Journal of Nanjing University (Natural Science), 2023, Vol. 59, Issue (4): 660–668. doi: 10.13232/j.cnki.jnju.2023.04.012

Correction of distorted documents based on image edge detection

Yuandong Xu1,2, Yongping Xiong1,2, Zheng Zhang1,2, Guibin Wu1,2, Xing Zhang3, Wei Wang3

  1. School of Computer Science and Technology (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing, 100876, China
  2. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China
  3. China Resources Digital Co., Ltd., Guangzhou, 518049, China
  • Received: 2023-06-13 Online: 2023-07-31 Published: 2023-08-18
  • Contact: Yongping Xiong E-mail: ypxiong@bupt.edu.cn
  • Supported by: Science and Technology Project of State Grid Shandong Electric Power Company (2023A-131)

Abstract:

Distorted document images interfere with optical character recognition (OCR). To rectify distorted document images and improve OCR accuracy on them, this paper proposes an edge detection method for document images built on an object detection and segmentation network. The edge curves of the document image are fitted with Bezier curves, and the control points of those curves are regressed by the object detection algorithm, so that edge detection becomes a regression of the Bezier control points of the edge curves. The edge points of the document are then used to compute the rectangular template of the rectified document, and the document image is remapped onto this template with the thin plate spline (TPS) interpolation algorithm to complete the rectification. Experimental results show that the proposed method extracts the edges of distorted documents accurately and that, compared with other algorithms, documents rectified by it achieve considerably higher OCR accuracy.

Key words: object detection, Bezier curve, document image correction, optical character recognition, thin plate spline

CLC number: TP391
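
The abstract's last step, remapping the document onto a rectangular template with thin plate spline interpolation, can be illustrated with a small coordinate-level sketch. This is not the authors' implementation: the function names `fit_tps` and `apply_tps`, the kernel convention U(r) = r² log r², and the sample points are all assumptions chosen for illustration. The sketch fits a spline that sends template coordinates to positions in the distorted image, which is the direction one would typically need in order to sample pixel values for each template location.

```python
import numpy as np


def _tps_kernel(r2):
    """Radial kernel U(r) = r^2 * log(r^2), taking squared distances; U(0) = 0."""
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out


def fit_tps(src, dst):
    """Fit a 2-D thin plate spline that maps each src[i] exactly onto dst[i]."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)
    K = _tps_kernel(d2)
    P = np.hstack([np.ones((n, 1)), src])  # affine basis: 1, x, y
    # Standard TPS linear system: [[K, P], [P^T, 0]] [w; a] = [dst; 0]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    params = np.linalg.solve(A, b)
    return src, params


def apply_tps(model, pts):
    """Map arbitrary points through a spline returned by fit_tps."""
    ctrl, params = model
    pts = np.asarray(pts, dtype=float)
    d2 = ((pts[:, None, :] - ctrl[None, :, :]) ** 2).sum(axis=-1)
    U = _tps_kernel(d2)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    n = len(ctrl)
    return U @ params[:n] + P @ params[n:]


# Hypothetical data: six points on a 100 x 60 template and the
# corresponding points on a curved document edge in the input image.
template_pts = np.array(
    [[0, 0], [50, 0], [100, 0], [0, 60], [50, 60], [100, 60]], dtype=float)
warped_pts = np.array(
    [[2, 6], [50, -3], [98, 6], [1, 66], [50, 58], [99, 66]], dtype=float)

model = fit_tps(template_pts, warped_pts)
mapped = apply_tps(model, template_pts)  # reproduces warped_pts
```

To rectify an image with such a map, one would evaluate `apply_tps` on every pixel coordinate of the template and sample the input image at the returned positions, for example with bilinear interpolation.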

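Fig. 1 Workflow of the proposed rectification algorithm

Fig. 2 Illustration of a third-order Bezier curve

Fig. 3 Annotating the edges of a document image with Bezier curves

Fig. 4 Structure of the Mask R-CNN network

Fig. 5 Structure of the region proposal network

Fig. 6 Illustration of control point regression

Fig. 7 Flowchart of the improved Mask R-CNN algorithm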
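
The figure captions above refer to a third-order Bezier curve whose control points describe a document edge. As a minimal sketch of what such a curve is (the function names and the example control points are hypothetical, not taken from the paper), a cubic Bezier curve can be evaluated from its four control points with the Bernstein weights (1−t)³, 3(1−t)²t, 3(1−t)t², t³:

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1].

    Each p is an (x, y) pair; the result is the weighted sum of the
    four control points using the cubic Bernstein polynomials.
    """
    s = 1.0 - t
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_edge(ctrl, n=20):
    """Return n points (n >= 2) spaced evenly in t along the curve."""
    return [cubic_bezier(*ctrl, i / (n - 1)) for i in range(n)]


# Hypothetical control points for the sagging top edge of a page.
top_edge = [(0.0, 10.0), (30.0, 0.0), (70.0, 0.0), (100.0, 10.0)]
edge_points = sample_edge(top_edge, n=5)
```

The curve starts at the first control point and ends at the last, so a network that regresses the four points fixes both document corners and the bend of the edge between them.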

Table 1 Comparison of image edge fitting results for different algorithms

Model                     RN
Mask                      0.8516
Mask+Bezier_2             0.9195
Mask+Bezier_4 (1:2)       0.9543
Mask+Bezier_4 (1:1)       0.9438
Mask+Bezier_4 (1:3)       0.9407
Mask+Bezier_4 (2:1)       0.9415
Mask+Bezier_4 (3:1)       0.9388
Mask+Bezier_6 (1:2)       0.9587
Mask+Bezier_8 (1:2)       0.9618

Fig. 8 Comparison of fitting results for Bezier curves of different dimensions

Fig. 9 Comparison of document image rectification results

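Table 2 Comparison of evaluation metrics for different rectification algorithms

Method                    CER       SSIM      LD
Traditional algorithm     0.4372    0.4214    11.2
DocUNet                   0.4216    0.4456    10.4
DewarpNet                 0.3248    0.4772    9.2
Grid Regularization       0.2278    0.4903    9.4
Ours                      0.3306    0.4527    10.1
Ours (Bezier)             0.2365    0.4939    8.9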
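
Table 2 reports a CER column without defining it on this page. Assuming CER stands for character error rate in its usual form (Levenshtein edit distance between the OCR output and the reference text, divided by the reference length), it could be computed as in the sketch below. The function names are hypothetical and the paper's exact evaluation protocol may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, rc in enumerate(ref, start=1):
        cur = [i]
        for j, hc in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,               # delete rc
                cur[j - 1] + 1,            # insert hc
                prev[j - 1] + (rc != hc),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of hypothesis hyp against reference ref."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of eight reference characters.
example = cer("document", "docunent")
```

Under this definition a lower CER means the OCR output is closer to the reference text.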