Computer Integrated Manufacturing Systems ›› 2020, Vol. 26 ›› Issue (6): 1510-1524. DOI: 10.13196/j.cims.2020.06.008


Genetic process hybrid mining algorithm based on trace clustering population

Tang Yahui1, Zhu Rui2,3+, Li Tong2,4, Nan Fengtao3, Zheng Ming1, Ma Zifei3

  1. School of Information, Yunnan University
  2. Key Laboratory of Software Engineering of Yunnan Province
  3. School of Software, Yunnan University
  4. School of Big Data, Yunnan Agricultural University
  • Online: 2020-06-30 Published: 2020-06-30
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 61662085, 61662065), the Yunnan Provincial Natural Science Foundation Fundamental Research Project, China (No. 2019FB135), the Yunnan University Data-Driven Software Engineering Provincial Science and Technology Innovation Team Project, China (No. 2017HC012), the Yunnan University “Dong Lu Young-backbone Teacher” Training Program, China, the Yunnan University Education Department Science Research Fund Graduate Program, China (No. 2019Y008), and the Yunnan University Research Innovation Fund for Graduate Students, China (No. 2019158).

Abstract: Genetic process mining algorithms use model quality to guide model discovery and keep correcting the search while mining, so they tend to produce higher-quality process models than other discovery algorithms. Because discovery is iterative, however, mining efficiency on large event logs is often low and the quality of the resulting model suffers. To address these problems, a Genetic process hybrid Mining algorithm based on Trace Clustering population (GMTC) is proposed. First, GMTC partitions the event log by trace clustering to simplify the mining environment, then pre-mines the partitioned log with the Inductive Miner (IM) algorithm to prepare a high-quality initial population for the genetic mining algorithm. Second, it optimizes the genetic operators: model deviation information obtained by aligning the log with the model guides the mutation operation, so that mutation changes from random to directed, which effectively improves the overall quality of the population and accelerates convergence. Experiments on simulated logs generated by the process log generator PLG, a real log of a municipal government's building permit application process, and six public data sets show that, compared with other mining algorithms, GMTC substantially improves mining efficiency while still producing models of high quality.

Key words: process mining, Inductive Miner algorithm, trace clustering, genetic process mining algorithm

CLC number:
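The two ideas summarized in the abstract, seeding the population from clustered sub-logs and steering mutation with deviation information, can be illustrated with a deliberately small sketch. This is not the GMTC implementation. Everything in it is a hypothetical stand-in chosen for brevity: a "model" is a set of directly-follows pairs rather than a process tree, traces are clustered by their activity set, the per-cluster seed replaces Inductive Miner pre-mining, "deviations" are plain set differences rather than alignments, and the quality score is a Jaccard index. All function names are invented for illustration.

```python
import random
from collections import defaultdict

# Toy stand-ins (all hypothetical, not taken from the paper):
#  - a "model" is a frozenset of directly-follows pairs, not a process tree;
#  - clustering groups traces by their activity set;
#  - "deviations" are set differences, not real alignments.

def df_pairs(traces):
    """Directly-follows pairs observed in a collection of traces."""
    return frozenset((a, b) for t in traces for a, b in zip(t, t[1:]))

def cluster_traces(log):
    """Partition the log: traces with the same activity set share a cluster."""
    clusters = defaultdict(list)
    for trace in log:
        clusters[frozenset(trace)].append(trace)
    return list(clusters.values())

def seed_population(log):
    """One seed model per cluster (stand-in for pre-mining each sub-log)."""
    return [df_pairs(cluster) for cluster in cluster_traces(log)]

def deviations(model, log):
    """Return (missing, extra): pairs only in the log, and pairs only in the model."""
    target = df_pairs(log)
    return sorted(target - model), sorted(model - target)

def quality(model, log):
    """Jaccard score in [0, 1]; 1.0 means no missing and no extra pairs."""
    target = df_pairs(log)
    union = model | target
    return len(model & target) / len(union) if union else 1.0

def directed_mutation(model, log, rng):
    """Repair one randomly chosen deviation instead of making a blind change."""
    missing, extra = deviations(model, log)
    fixes = [("add", p) for p in missing] + [("drop", p) for p in extra]
    if not fixes:
        return model  # nothing left to repair
    op, pair = rng.choice(fixes)
    return model | {pair} if op == "add" else model - {pair}

def evolve(log, generations=20, seed=0):
    """Seed from clusters, then repeat directed mutation with elitist selection."""
    rng = random.Random(seed)
    population = seed_population(log)
    for _ in range(generations):
        children = [directed_mutation(m, log, rng) for m in population]
        candidates = set(population) | set(children)
        # Keep the best models; the population never grows.
        population = sorted(candidates, key=lambda m: quality(m, log),
                            reverse=True)[:len(population)]
    return population[0]
```

In this toy setting each directed mutation removes exactly one deviation, so the score of a mutated model is strictly higher than that of its parent until it reaches 1.0. A random mutation offers no such guarantee, which is the intuition behind the faster convergence the abstract reports; the sketch says nothing about the actual speed-up measured in the paper.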