Computer Integrated Manufacturing System, 2022, Vol. 28, Issue (6): 1755-1766. DOI: 10.13196/j.cims.2022.06.014

Processing algorithm of irregular table based on image statistics clustering method

  

  • Online: 2022-06-30; Published: 2022-07-05
  • Supported by:
    Project supported by the Key Research and Development Program of the Shaanxi Provincial Science and Technology Department, China (No. 2019GY-065), the Science and Technology Program of Xi'an City, China (No. 2020KJRC0033), and the Science and Technology Program of the Weiyang District Science and Technology Department of Xi'an City, China (No. 201923).

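Lü Zhigang1,2, Li Liangliang1, Wang Hongxi1, Wang Peng2+, Li Xiaoyan2

  1. School of Mechatronic Engineering, Xi'an Technological University
  2. School of Electronic Information Engineering, Xi'an Technological University
  + Corresponding author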

Abstract: Mechanical table files, such as mechanical process cards and part test reports, are widespread in production, manufacturing and related fields. Digitizing these traditional printed files and extracting the tables, text and other information they contain is an important step toward managing mechanical products effectively. A substantial body of work exists on extracting table elements, but existing methods still perform poorly on irregular tables, for example tables with dashed or discontinuous longitudinal lines, dislocated frame lines, or content that runs across consecutive pages. For tables with discontinuous longitudinal lines and dislocated frame lines, a clustering-based table recognition and segmentation algorithm built on image statistics was proposed to improve adaptability and robustness. For incomplete tables that span pages, a cross-page stitching algorithm based on the percentage of pixels in a small local area was proposed. Together, these two steps achieve the digital reproduction of irregular tables. In iterative tests on 147 existing digitized mechanical part test reports exhibiting irregularities (resolution: 75 dpi to 400 dpi), the accuracy of positioning, segmentation and stitching for irregular tables reached 97.32%. The experimental results demonstrate the effectiveness of the proposed method.
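The abstract names the idea of clustering projection statistics to locate table frame lines but gives no details. As a rough illustration only, and not the authors' published algorithm, the Python sketch below assumes a binarized page image stored as a NumPy array (1 = ink). It finds horizontal frame lines from row-wise ink ratios, then searches each band between two horizontal lines for vertical frame lines using column-wise ink ratios. Neighbouring candidate positions are merged by a simple gap-based 1-D clustering. Searching band by band lets dislocated vertical lines differ between bands, and a low column threshold lets dashed lines pass. The function names and thresholds (`h_ratio`, `v_ratio`, `max_gap`) are hypothetical.

```python
import numpy as np


def cluster_positions(indices, max_gap=3):
    """Group sorted 1-D indices into clusters whose neighbours are at most
    max_gap apart, and return the integer centre of each cluster."""
    centres = []
    start = prev = None
    for i in indices:
        if start is None:
            start = prev = i
        elif i - prev <= max_gap:
            prev = i  # still inside the current cluster (e.g. a thick line)
        else:
            centres.append((start + prev) // 2)
            start = prev = i
    if start is not None:
        centres.append((start + prev) // 2)
    return centres


def find_lines(binary, axis, min_ratio, max_gap=3):
    """Projection statistics: fraction of ink pixels per row (axis=1) or per
    column (axis=0); keep positions above min_ratio and cluster them."""
    ratio = binary.mean(axis=axis)
    candidates = np.flatnonzero(ratio >= min_ratio)
    return cluster_positions(candidates.tolist(), max_gap)


def detect_grid(binary, h_ratio=0.6, v_ratio=0.3):
    """Return the rows of horizontal frame lines and, for each band between two
    of them, the columns of its vertical frame lines. Each band is searched on
    its own, so vertical lines need not align across bands; the low v_ratio
    lets dashed vertical lines through."""
    rows = find_lines(binary, axis=1, min_ratio=h_ratio)
    bands = []
    for top, bottom in zip(rows, rows[1:]):
        band = binary[top + 1:bottom]  # exclude the horizontal lines themselves
        if band.size == 0:
            continue
        cols = find_lines(band, axis=0, min_ratio=v_ratio)
        bands.append(((top, bottom), cols))
    return rows, bands
```

Cell boundaries then follow directly from each band's `(top, bottom)` pair and its consecutive column positions; a real implementation would also need binarization, skew correction and tuned thresholds.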

Key words: projection statistics, clustering, table-node extraction, cross-page splicing, digital reproduction
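The cross-page splicing step is also described only at a high level. One possible reading, sketched below under stated assumptions, measures the percentage of ink pixels in a small window around each known vertical frame line, both at the bottom of one page and at the top of the next. If every line continues on both sides, the two page images (assumed binarized NumPy arrays of equal width) are stacked vertically. The strip height, window width, the 0.15 threshold and the helper names are illustrative choices, not values from the paper.

```python
import numpy as np


def local_ink_ratio(binary, top, left, height, width):
    """Percentage of ink pixels in a small window, clipped to the image bounds."""
    bottom, right = top + height, left + width
    window = binary[max(top, 0):max(bottom, 0), max(left, 0):max(right, 0)]
    return float(window.mean()) if window.size else 0.0


def continues_across_pages(page_a, page_b, columns, strip=10, half_width=1,
                           min_ratio=0.15):
    """Assumed criterion: the table continues onto page_b only if every vertical
    frame line (given by its column) has enough ink in a small window at the
    bottom of page_a AND in a small window at the top of page_b."""
    width = 2 * half_width + 1
    for c in columns:
        left = c - half_width
        bottom = local_ink_ratio(page_a, page_a.shape[0] - strip, left, strip, width)
        top = local_ink_ratio(page_b, 0, left, strip, width)
        if bottom < min_ratio or top < min_ratio:
            return False
    return True


def stitch_pages(page_a, page_b, columns, **kwargs):
    """Stack the two page images vertically when the table continues;
    otherwise return None so they are kept as separate tables."""
    if page_a.shape[1] != page_b.shape[1]:
        raise ValueError("pages must have the same width to be stitched")
    if continues_across_pages(page_a, page_b, columns, **kwargs):
        return np.vstack((page_a, page_b))
    return None
```

With a 3-pixel window, a solid 1-pixel line gives a ratio of about 0.33 and a 50% dashed line about 0.17, which is why the sketch uses a threshold as low as 0.15.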
