Computer Integrated Manufacturing System, 2025, Vol. 31, Issue (10): 3872-3883. DOI: 10.13196/j.cims.2024.0549

Accurate matching method based on improved WRD under patent big data

LI Kunping1,LIU Jianhua1,2,PEI Fengque1,ZHANG Jin1,ZHUANG Cunbo1,2+   

  1. School of Mechanical Engineering, Beijing Institute of Technology
  2. Key Lab of Intelligent Assembly and Detection Technology of Hebei Province, Tangshan Research Institute, Beijing Institute of Technology
  • Online: 2025-10-31  Published: 2025-11-19

Accurate matching method based on improved WRD in a patent big data environment

李坤平1,刘检华1,2,裴凤雀1,张晋1,庄存波1,2+   

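  1. School of Mechanical Engineering, Beijing Institute of Technology
  2. Key Lab of Intelligent Assembly and Detection Technology of Hebei Province, Tangshan Research Institute, Beijing Institute of Technology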
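  • About the authors:
    LI Kunping (李坤平, 1975-), male, native of Chongqing, Ph.D. candidate. Research interests: text mining, knowledge graphs. E-mail: 3220205050@bit.edu.cn;

    LIU Jianhua (刘检华, 1977-), male, native of Pingxiang, Jiangxi, professor, Ph.D., doctoral supervisor. Research interests: digital assembly technology. E-mail: jeffliu@bit.edu.cn;

    PEI Fengque (裴凤雀, 1990-), male, native of Shijiazhuang, Hebei, postdoctoral researcher, Ph.D. Research interests: collaborative optimization of manufacturing systems. E-mail: fq_pei@163.com;

    ZHANG Jin (张晋, 1996-), male, native of Beijing, master's student. Research interests: assembly quality control. E-mail: 3120210404@bit.edu.cn;

    +ZHUANG Cunbo (庄存波, 1991-), male, native of Gao'an, Jiangxi, associate research fellow, Ph.D., master's supervisor. Research interests: assembly MES, digital twin technology. Corresponding author. E-mail: zhuangdavid@bit.edu.cn.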

Abstract: China holds a large stock of patents, but the lack of efficient and accurate patent matching technology has hindered further improvement of the patent conversion rate. To address this problem, natural language processing was introduced and an accurate matching method for the patent big data environment was proposed. Patent data were stored by province in the Hadoop Distributed File System (HDFS), and a distributed parallel processing architecture was used to improve processing performance. In addition, an improved Word Rotator's Distance (WRD) algorithm was adopted: by restricting the direction of the word-shift process, the traditional bidirectional movement was redefined as movement from the side with the smaller total weight to the side with the larger total weight. The objective function was modified by adding a penalty term, defined as the cosine similarity of the two total weights. With the improved WRD, the computational complexity of the total weight was reduced and the accuracy of natural language matching was improved, providing an effective method for accurate matching of patent big data.

Key words: patent big data, natural language processing, Word Rotator's Distance, accurate matching

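Abstract: China currently holds a very large number of patents, but the lack of an efficient and accurate patent matching method has hindered further improvement of the patent conversion rate. To address this, natural language processing technology is introduced and an accurate matching method for the patent big data environment is proposed. First, a Hadoop-based distributed storage and parallel computing architecture is adopted: patents nationwide are stored in a distributed manner by province, and the parallel processing architecture is used for batch computation to improve processing performance. Second, an improved Word Rotator's Distance (WRD) algorithm is adopted: by constraining the direction of the word-shift process, the traditional bidirectional movement is redefined as movement from the side with the smaller total weight to the side with the larger total weight, and cosine similarity is introduced as a penalty term to correct the objective function, reducing the computational complexity of the total weight. Finally, the computational performance and patent matching accuracy of the method are verified through a case study. The results show that the proposed method provides an effective approach to efficient and reliable matching of massive patent data.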

Key words: patent big data, natural language processing, Word Rotator's Distance algorithm, accurate matching
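
The one-directional word movement and the penalty term described in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm, whose exact objective function is not given on this page. Several points are assumptions made only for illustration: word weights are taken as embedding norms and costs as cosine distances (as in standard WRD); the optimal-transport solve is replaced by a greedy nearest-neighbour relaxation; the penalty is taken to be one minus the cosine similarity of the two summed document vectors; and the function name `directional_distance` and the trade-off parameter `lam` are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def directional_distance(doc_a, doc_b, lam=0.5):
    """Relaxed one-way transport cost plus a cosine-based penalty.

    doc_a and doc_b are non-empty lists of word vectors of equal dimension.
    The penalty form and `lam` are illustrative assumptions.
    """
    if not doc_a or not doc_b:
        raise ValueError("both documents need at least one word vector")

    # WRD-style weights: each word's mass is the norm of its embedding.
    mass_a = [norm(v) for v in doc_a]
    mass_b = [norm(v) for v in doc_b]

    # Direction constraint: move mass only from the side with the
    # smaller total weight to the side with the larger total weight.
    if sum(mass_a) <= sum(mass_b):
        src, src_mass, dst = doc_a, mass_a, doc_b
    else:
        src, src_mass, dst = doc_b, mass_b, doc_a

    total = sum(src_mass)
    if total == 0.0:
        raise ValueError("source document has zero total weight")

    # Greedy relaxation (assumption): every source word sends its whole
    # normalized mass to its closest target word by cosine distance.
    transport = sum(
        (m / total) * min(1.0 - cosine(w, t) for t in dst)
        for w, m in zip(src, src_mass)
    )

    # Hypothetical penalty: dissimilarity of the summed document vectors.
    sum_a = [sum(col) for col in zip(*doc_a)]
    sum_b = [sum(col) for col in zip(*doc_b)]
    penalty = 1.0 - cosine(sum_a, sum_b)

    return transport + lam * penalty
```

Calling `directional_distance(vectors_a, vectors_b)` on two lists of word embeddings returns a score near zero for identical texts that grows as the texts diverge. Because mass moves in one direction only and no full transport plan is solved, the sketch is cheaper than bidirectional WRD, but it only approximates the transport cost.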

CLC Number: