Computer Integrated Manufacturing System, 2024, Vol. 30, Issue (2): 623-634. DOI: 10.13196/j.cims.2021.0556

LogRank++: An efficient business process event log sampling approach

LIU Cong1,2, ZHANG Shuaipeng1, LI Huiling1, HE Hua1, ZENG Qingtian2+

  1. School of Computer Science and Technology, Shandong University of Technology
  2. College of Computer Science and Engineering, Shandong University of Science and Technology
  • Online: 2024-02-29 Published: 2024-03-07
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 61902222), the Taishan Scholars Program of Shandong Province, China (No. ts20190936, tsqn201909109), the Shandong Provincial Natural Science Foundation Outstanding Youth Fund, China (No. ZR2021YQ45), the Innovation Team Project of Qingchuang Science and Technology Plan of Colleges and Universities in Shandong Province, China (No. 2021KJ031), and the Program of Leading Talents and Excellent Research Team of Shandong University of Science and Technology, China (No. 2015TDJH102).

Abstract: Existing event log sampling methods suffer from low sampling efficiency when processing the large-scale event logs collected by information systems. To address this, an efficient log sampling approach called LogRank++ was proposed. First, the characteristics that determine the importance of a trace, such as its activities and directly-follows relations, were identified. Then an importance value was computed for each trace, the traces were ranked by this value, and the most important traces were selected to form the sample log. The approach was evaluated from two aspects: sampling quality and sampling efficiency. It has been implemented in the open-source process mining toolkit ProM. Experimental evaluation with both synthetic and real-life event logs demonstrated that, compared with existing sampling methods, LogRank++ substantially improves sampling efficiency while preserving the quality of the resulting sample logs from a process discovery perspective.

Key words: LogRank++, log sampling, process discovery, quality measurement
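
The rank-and-select idea summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the importance formula, so the scoring below is an assumption made only for illustration: a trace scores highly when its activities and directly-follows pairs occur in many traces of the log. The names `directly_follows`, `sample_log`, and `ratio` are hypothetical and are not taken from the paper or from ProM.

```python
from collections import Counter


def directly_follows(trace):
    """Return the directly-follows pairs (a, b) of a trace."""
    return list(zip(trace, trace[1:]))


def sample_log(log, ratio):
    """Keep the top `ratio` fraction of traces, ranked by an illustrative score.

    `log` is a list of traces; each trace is a tuple of activity names.
    The score is an assumption for this sketch, not the LogRank++ formula.
    """
    n = len(log)
    if n == 0:
        return []

    # Number of traces in which each activity / directly-follows pair occurs.
    act_support = Counter(a for t in log for a in set(t))
    df_support = Counter(p for t in log for p in set(directly_follows(t)))

    def importance(trace):
        acts = set(trace)
        pairs = set(directly_follows(trace))
        # Average share of traces that contain each activity of this trace.
        act_score = sum(act_support[a] for a in acts) / (n * len(acts)) if acts else 0.0
        # Average share of traces that contain each directly-follows pair.
        df_score = sum(df_support[p] for p in pairs) / (n * len(pairs)) if pairs else 0.0
        return (act_score + df_score) / 2

    k = max(1, round(ratio * n))  # size of the sample log, at least one trace
    ranked = sorted(log, key=importance, reverse=True)  # stable sort, best first
    return ranked[:k]
```

Calling `sample_log(log, 0.3)` on a list of activity tuples would return roughly 30% of the traces, favouring those whose behaviour is most widely shared across the log.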
