Computer Integrated Manufacturing Systems ›› 2020, Vol. 26 ›› Issue (6): 1483-1491. DOI: 10.13196/j.cims.2020.06.005

Evaluation method for log partition without ground truth based on information entropy

LIN Leilei1, YANG Liang2, WEN Lijie1+, ZHOU Hua3, WANG Jianmin1

  1. School of Software, Tsinghua University
  2. Inspur General Software Co., Ltd.
  3. College of Big Data and Intelligent Engineering, Southwest Forestry University
  • Online: 2020-06-30 Published: 2020-06-30
  • Supported by:
    the National Key Research and Development Program of China (No. 2017YFA0700605), the National Natural Science Foundation of China (Nos. 61472207, 71690231), and the Beijing National Research Center for Information Science and Technology (BNRist).

Abstract: To improve the quality of process model discovery, log partitioning can be used to divide raw log data into multiple sub-logs. Existing methods for evaluating a log partition mostly rely on ground truth (labeled data) to measure partition quality, but labeled log data are hard to obtain in practice. For this reason, partition entropy is proposed as a measure for evaluating log partitions without ground truth. First, trace variants are defined to characterize the distribution of each sub-log. Second, internal entropy and external entropy are proposed to describe, respectively, the cohesion within sub-logs and the divergence among them. Third, a penalty factor is used to penalize partitioning methods that blindly cater to the evaluation criterion of high cohesion and low coupling. Finally, internal entropy, external entropy, and the penalty factor are combined into the expression for partition entropy. Experimental results demonstrate the feasibility of the proposed method.

Key words: process mining, log partition, information entropy, trace clustering
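
The abstract does not state the formulas for internal entropy, external entropy, or the penalty factor, so the following is only a minimal sketch of the underlying idea: the Shannon entropy of the trace-variant distribution of a single sub-log. The function name `variant_entropy` and the two sample sub-logs are hypothetical, and the paper's actual definitions may differ.

```python
from collections import Counter
from math import log2


def variant_entropy(sub_log):
    """Shannon entropy (in bits) of the trace-variant distribution of one sub-log.

    Assumption for illustration: a trace is a sequence of activity names,
    and a trace variant is a distinct sequence. This is plain Shannon
    entropy, not the paper's internal or external entropy.
    """
    counts = Counter(tuple(trace) for trace in sub_log)
    total = sum(counts.values())
    # Sum p * log2(1/p) over the relative frequency p of each variant.
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical sub-logs: one with a single variant, one with two equally frequent variants.
homogeneous = [["a", "b", "c"]] * 4
mixed = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"], ["a", "c", "b"]]

print(variant_entropy(homogeneous))  # single variant: no uncertainty
print(variant_entropy(mixed))        # two equally likely variants: one bit
```

Under this reading, a sub-log whose traces all follow one variant has zero entropy, while a sub-log that mixes variants scores higher. That matches the intuition that a cohesive sub-log should have low entropy, but how the paper combines such terms with the penalty factor is not given in the abstract.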

CLC number: