Computer Integrated Manufacturing System, 2024, Vol. 30, Issue (9): 3199-3207. DOI: 10.13196/j.cims.2023.0085


Trace clustering sampling framework for heterogeneous event logs

ZHANG Shuaipeng1, LIU Cong1,2+, SU Xuan1, GUO Na1, GAO Qingxin1, LI Caihong1, ZENG Qingtian2

  1. School of Computer Science and Technology, Shandong University of Technology
  2. College of Computer Science and Engineering, Shandong University of Science and Technology
  • Online: 2024-09-30 Published: 2024-10-09
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 62472264, 52374221), the Taishan Scholars Program of Shandong Province, China (No. tsqn201909109, ts20190936), the Natural Science Excellent Youth Foundation of Shandong Province, China (No. ZR2021YQ45), and the Youth Innovation Science and Technology Team Foundation of Shandong Provincial Higher School, China (No. 2021KJ031).

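  • About the authors:
    ZHANG Shuaipeng (1997-), male, born in Xuchang, Henan; master's student; research interests: process mining, etc.; E-mail: 15994069715@163.com;

    +LIU Cong (1990-), male, born in Zibo, Shandong; professor, Ph.D., doctoral supervisor; research interests: business process management, process mining, Petri net theory and applications, etc.; corresponding author; E-mail: liucongchina@163.com;

    SU Xuan (1999-), female, born in Qingdao, Shandong; master's student; research interests: process mining, etc.; E-mail: su_xuan2021@163.com;

    GUO Na (1996-), female, born in Zibo, Shandong; doctoral student; research interests: business process management, etc.; E-mail: guona_7@163.com;

    GAO Qingxin (1999-), male, born in Dezhou, Shandong; master's student; research interests: process mining, etc.; E-mail: gaoqingxin@163.com;

    LI Caihong (1970-), female, born in Zhaoyuan, Shandong; professor, Ph.D., doctoral supervisor; research interests: computer application technology, intelligent mobile robot control technology, artificial intelligence information processing, etc.; E-mail: lich@sdut.edu.cn;

    ZENG Qingtian (1976-), male, born in Gaomi, Shandong; professor, Ph.D., doctoral supervisor; research interests: process mining, business process management, Petri nets, etc.; E-mail: qtzeng@sdust.edu.cn.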

Abstract: Information systems collect large volumes of business process event logs, and process discovery aims to derive process models from these logs to provide evidence for business process improvement. Existing process discovery approaches hit performance bottlenecks when handling large-scale event logs, and event log sampling offers an effective way to improve the efficiency of process discovery. Existing event log sampling techniques usually assume that the log is homogeneous, that is, that it comes from or corresponds to a single business process. However, because business processes are complex and change dynamically, the traces in a single event log are often heterogeneous, that is, the log comes from or corresponds to multiple business processes with different behaviors. On heterogeneous event logs, the sample logs obtained by existing sampling techniques suffer from problems such as low accuracy, whereas trace clustering handles this heterogeneity well. To address this challenge, a trace clustering sampling framework for heterogeneous event logs was proposed. First, the event log was decomposed into a set of homogeneous sub-logs by a trace clustering method. Second, each sub-log was sampled separately with existing sampling methods. Third, the sampled sub-logs were merged to obtain the final sample log. Finally, the quality of the sample log was evaluated from the perspective of process model mining. Experimental evaluation on 6 public datasets demonstrated that the proposed method provides an effective solution for high-quality sampling of heterogeneous event logs.
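
The cluster, sample, and merge steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the clustering rule (grouping traces by their set of activities), the uniform random sampler, and all function names are hypothetical stand-ins for whichever trace clustering and log sampling methods the framework plugs in. The quality evaluation step is omitted.

```python
import random
from collections import defaultdict

Trace = tuple[str, ...]  # a trace is an ordered sequence of activity names


def cluster_traces(log: list[Trace]) -> list[list[Trace]]:
    """Stand-in clustering (assumption): traces that share the same
    set of activities are placed in the same sub-log."""
    clusters: dict[frozenset, list[Trace]] = defaultdict(list)
    for trace in log:
        clusters[frozenset(trace)].append(trace)
    return list(clusters.values())


def sample_sublog(sublog: list[Trace], ratio: float,
                  rng: random.Random) -> list[Trace]:
    """Stand-in sampler (assumption): uniform random sampling without
    replacement, keeping at least one trace per sub-log."""
    k = max(1, round(len(sublog) * ratio))
    return rng.sample(sublog, k)


def cluster_then_sample(log: list[Trace], ratio: float = 0.5,
                        seed: int = 0) -> list[Trace]:
    """Decompose the log into sub-logs, sample each one separately,
    and merge the per-cluster samples into one sample log."""
    rng = random.Random(seed)
    sample: list[Trace] = []
    for sublog in cluster_traces(log):
        sample.extend(sample_sublog(sublog, ratio, rng))
    return sample


# Toy heterogeneous log: two unrelated processes recorded together.
log = [("a", "b", "c")] * 6 + [("a", "c", "b")] * 2 + [("x", "y")] * 4
sample = cluster_then_sample(log, ratio=0.5)
```

Because sampling happens per sub-log, the smaller process (`("x", "y")`) contributes exactly half of its traces to the sample, whereas sampling the whole log at once could under-represent or miss it.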

Key words: heterogeneity, trace clustering, log sampling, process discovery, quality measurement

