Computer Integrated Manufacturing System, 2024, Vol. 30, Issue (8): 2832-2843. DOI: 10.13196/j.cims.2023.BPM20

Event log sampling approach towards directly-follows relation rediscoverability and its application

SU Xuan1, LIU Cong1,2+, WEN Lijie3, MENG Xiaoliang1, LI Caihong1, ZENG Qingtian2

  1. School of Computer Science and Technology, Shandong University of Technology
  2. College of Computer Science and Engineering, Shandong University of Science and Technology
  3. School of Software, Tsinghua University
  • Online: 2024-08-31  Published: 2024-09-05
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 62472264), the Taishan Scholars Program of Shandong Province, China (No. ts20190936, tsqn201909109), the Natural Science Excellent Youth Foundation of Shandong Province, China (No. ZR2021YQ45), and the Youth Innovation Science and Technology Team Foundation of Shandong Provincial Universities, China (No. 2021KJ031).

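  • About the authors:
    SU Xuan (1999-), female, from Qingdao, Shandong, master's student; research interests: process mining, etc.; E-mail: 15715325632@163.com;

    +LIU Cong (1990-), male, from Zibo, Shandong, professor, Ph.D., doctoral supervisor; research interests: process mining, business process management, artificial intelligence, etc.; corresponding author; E-mail: liucongchina@163.com;

    WEN Lijie (1977-), male, from Tangshan, Hebei, associate professor, doctoral supervisor; research interests: process mining, business process management, workflow technology, etc.; E-mail: wenlj@163.com;

    MENG Xiaoliang (1988-), male, from Weifang, Shandong, lecturer, Ph.D.; research interests: visual inspection and image processing, deep learning, etc.; E-mail: xiaoliang@sdut.edu.cn;

    LI Caihong (1970-), female, from Zhaoyuan, Shandong, professor, Ph.D.; research interests: computer application technology, intelligent mobile robot control technology, artificial intelligence information processing; E-mail: lich@sdut.edu.cn;

    ZENG Qingtian (1976-), male, from Gaomi, Shandong, professor, Ph.D., doctoral supervisor; research interests: process mining, business process management, Petri nets, etc.; E-mail: qtzeng@163.com.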

Abstract: Event log sampling, a recent research hotspot in process mining, aims to improve the efficiency of process mining tasks such as model discovery, conformance checking, and process prediction. However, existing sampling methods cannot adequately guarantee the quality of the mined model, and their sampling efficiency on large-scale event logs is low. The directly-follows relation between tasks is the basic unit of behavior description in event logs and plays a key role in various process mining tasks. A general event log sampling method oriented towards directly-follows relation rediscoverability is therefore proposed, which guarantees the rediscoverability of the directly-follows relations. To verify its effectiveness, the sampling method is applied to improve the efficiency of existing model mining algorithms. To quantitatively evaluate the quality of the mined models, a model similarity measure based on process trees is proposed. The sampling method has been implemented in the open-source process mining platforms ProM6 and PM4PY. Based on 12 public event log datasets, the proposed sampling method is compared quantitatively with existing sampling methods in terms of model mining quality. The experiments show that the proposed method can greatly improve the efficiency of model discovery while preserving model quality.

Key words: event log sampling, directly-follows relation rediscoverability, quality measure, model similarity
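
To make the notion of directly-follows relation rediscoverability concrete, the following is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes a log is a list of traces, each a list of activity names, and uses a simple greedy pass over trace variants (most frequent first) that keeps a variant only if it contributes a directly-follows pair not yet covered. The function names `directly_follows` and `sample_covering_df` are hypothetical and chosen for illustration; start and end activities are ignored for simplicity.

```python
from collections import Counter


def directly_follows(log):
    """Return the set of pairs (a, b) such that b directly follows a in some trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}


def sample_covering_df(log):
    """Greedy sketch (an assumption, not the paper's method): select trace
    variants until every directly-follows pair of the full log is covered."""
    target = directly_follows(log)
    variants = Counter(tuple(t) for t in log)
    covered, sample = set(), []
    # Visit the most frequent variants first.
    for variant, _count in variants.most_common():
        pairs = directly_follows([variant])
        if not pairs <= covered:  # variant adds at least one new pair
            sample.append(list(variant))
            covered |= pairs
        if covered == target:  # all pairs rediscoverable from the sample
            break
    return sample
```

Under these assumptions, calling `sample_covering_df(log)` returns a sublog whose directly-follows set equals that of `log`, so any discovery algorithm that depends only on the set of directly-follows pairs receives the same input relations from the sample as from the full log.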

