Computer Integrated Manufacturing Systems ›› 2018, Vol. 24 ›› Issue (7): 1680-1689. DOI: 10.13196/j.cims.2018.07.009

Case information extraction from natural procedure text

倪维健,韦振胜,曾庆田+,刘彤   

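  1. College of Computer Science and Engineering, Shandong University of Science and Technology
  • Publication date: 2018-07-31 Release date: 2018-07-31
  • Funding:
    Supported by the National Natural Science Foundation of China (Nos. 61602278, 71704096, 31671588); the China Postdoctoral Science Foundation (No. 2014M561949); and the Key Research and Development Program of Shandong Province (No. 2016ZDJS02A11).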

Case information extraction from natural procedure text

  • Online: 2018-07-31 Published: 2018-07-31
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 61602278, 71704096, 31671588), the China Postdoctoral Science Foundation, China (No. 2014M561949), and the Key Research and Development Foundation of Shandong Province, China (No. 2016ZDJS02A11).

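Abstract: For natural-language text that describes the case information of specific processes, an automatic case extraction method is proposed that converts unstructured procedure text into structured event logs, thereby providing data support for subsequent process mining research. The case extraction task for procedure text is first described formally, and three core tasks are abstracted from it: activity/property entity recognition, activity/property relation recognition, and activity order relation recognition. A solution to each task is then designed using semi-supervised statistical learning techniques. Large-scale experiments were conducted on Chinese recipe documents to evaluate the effectiveness of the proposed case information extraction method comprehensively. The results show that, starting from a small amount of manually labeled data, the proposed method can effectively exploit a large amount of unlabeled procedure text from the same domain to improve case extraction performance. It requires no complex hand-crafted rules and adapts well to different domains.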

Keywords: case, information extraction, procedure text, semi-supervised learning

Abstract: For natural text that describes specific process cases, a new approach to automatically extracting case information is introduced. It transforms unstructured procedure text into structured event logs and so provides a valuable data resource for downstream process mining tasks. Based on a formal definition of the key concepts in procedure text, case information extraction is divided into three subtasks: activity/property entity recognition, activity/property relation recognition, and activity sequence recognition, each of which is approached in a semi-supervised way. Experiments with Chinese recipe documents demonstrate that the proposed approach can build a case information extraction model from a small amount of labeled text, with the help of a large amount of unlabeled text from the same domain. Furthermore, the proposed approach can be easily adapted to real-world domains because it does not depend on complicated manually designed rules.

Key words: case, information extraction, procedure text, semi-supervised learning
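
To make the target of the extraction concrete, the following is a minimal sketch of how the outputs of the three subtasks might be assembled into event-log rows for one case. It is an illustration under stated assumptions, not the authors' implementation: the `Activity` class, the `build_event_log` function, the row fields, and the recipe data are all hypothetical, and the extraction results are written by hand rather than produced by a learned model.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Activity:
    # Hypothetical container: an activity entity (subtask 1) together with
    # the properties linked to it (subtask 2).
    name: str
    properties: dict = field(default_factory=dict)


def build_event_log(case_id, activities, order_pairs):
    """Turn activities and (before, after) pairs (subtask 3) into ordered rows."""
    by_name = {a.name: a for a in activities}
    # TopologicalSorter expects a mapping: node -> set of its predecessors.
    predecessors = {name: set() for name in by_name}
    for before, after in order_pairs:
        predecessors[after].add(before)
    ordered = TopologicalSorter(predecessors).static_order()
    return [
        {"case": case_id, "step": i, "activity": name, **by_name[name].properties}
        for i, name in enumerate(ordered, start=1)
    ]


# Hand-written stand-ins for what an extractor might return for one recipe.
activities = [
    Activity("serve"),
    Activity("chop", {"ingredient": "tomato"}),
    Activity("fry", {"ingredient": "egg", "duration": "2 min"}),
]
order_pairs = [("chop", "fry"), ("fry", "serve")]

log = build_event_log("recipe-001", activities, order_pairs)
for row in log:
    print(row)
```

Each row carries a case identifier, a position, an activity name, and the activity's properties, which is the general shape an event log needs for process mining. The topological sort is only one simple way to linearize pairwise order relations; the abstract does not state how the paper represents or resolves activity order.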

CLC number: