Computer Integrated Manufacturing System, 2022, Vol. 28, Issue (10): 3175-3186. DOI: 10.13196/j.cims.2022.10.014

iBelt: An interpretable cluster analysis method for event logs

LIU Wen1,2, WANG Guiling1,2+

  1. School of Information, North China University of Technology
  2. Beijing Municipal Key Laboratory on Integration and Analysis of Large-Scale Stream Data, North China University of Technology
  • Online: 2022-10-31 Published: 2022-11-10
  • Supported by:
    Project supported by the National Key Research and Development Program, China (No. 2018YFB1402500), the Key Program of National Natural Science Foundation, China (No. 61832004), and the International Cooperation and Exchange Program of National Natural Science Foundation, China (No. 62061136006).

Abstract: When process mining is carried out on complex event logs, event traces often need to be clustered to simplify the structure of the discovered process. However, most current trace clustering methods produce results that lack interpretability, which greatly limits their application potential. To address this, an interpretable cluster analysis method for event logs, called iBelt, was proposed. A “process connection belt” was defined to describe the analysis results of event logs. Based on the idea of the clustering tree, a Clustering through Boosting Decision Tree (CLBDT) model was designed, and an unsupervised feature selection method based on variance and discriminant feature analysis was adopted to improve the clustering quality and fitness of existing methods and to overcome the loss of interpretability that high-dimensional data causes in the “process connection belt”. Experimental results on a public dataset showed that the resulting process connection belt had simple, easy-to-understand interpretable rules, and that the quality of the associated process models was improved.

Key words: process mining, trace clustering, interpretable clustering, decision tree
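
The variance-based part of the unsupervised feature selection mentioned in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the activity-count encoding of traces, the toy log, the threshold value, and the function names `encode_traces` and `select_by_variance`. The discriminant feature analysis and the CLBDT model themselves are not reproduced.

```python
from statistics import pvariance


def encode_traces(traces):
    """Encode each trace (a list of activity names) as an activity-count vector.

    This encoding is an assumption; the paper may use a different trace profile.
    """
    activities = sorted({a for t in traces for a in t})
    matrix = [[t.count(a) for a in activities] for t in traces]
    return activities, matrix


def select_by_variance(activities, matrix, threshold):
    """Keep only the features whose population variance exceeds the threshold."""
    keep = [
        j for j in range(len(activities))
        if pvariance([row[j] for row in matrix]) > threshold
    ]
    kept_names = [activities[j] for j in keep]
    reduced = [[row[j] for j in keep] for row in matrix]
    return kept_names, reduced


# Hypothetical toy event log: every trace starts with "register",
# so that feature has zero variance and cannot separate clusters.
traces = [
    ["register", "check", "approve", "pay"],
    ["register", "check", "approve", "pay"],
    ["register", "check", "reject"],
    ["register", "check", "check", "reject"],
]

activities, matrix = encode_traces(traces)
kept, reduced = select_by_variance(activities, matrix, threshold=0.1)
print(kept)     # names of the retained features
print(reduced)  # trace vectors restricted to the retained features
```

Dropping features that barely vary across traces shrinks the feature space before clustering, which is one plausible way to keep any tree-derived rules short enough to read.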
