Computer Integrated Manufacturing System, 2022, Vol. 28, Issue (10): 3256-3264. DOI: 10.13196/j.cims.2022.10.021

Privacy protection method for process mining based on genetic algorithm

GAO Juntao, YAN Shenyi

  1. School of Computer and Information Technology, Northeast Petroleum University
  • Online: 2022-10-31 Published: 2022-11-10
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 51774090, 61902222), the Guidance Science and Technology Planning of Daqing City, China (No. zd-2019-22), and the Excellent Young and Middle-aged Innovative Team Cultivation Foundation of Northeast Petroleum University, China (No. KYCXTDQ202101).

基于遗传算法的过程挖掘隐私保护方法

高俊涛,闫駪艺   

  1. 东北石油大学计算机与信息技术学院
  • 基金资助:
    国家自然科学基金资助项目(51774090,61902222);大庆市指导性科技计划资助项目(zd-2019-22);东北石油大学优秀中青年科研创新团队培育基金资助项目(KYCXTDQ202101)。

Abstract: To reduce the utility loss that privacy protection introduces in process mining, an anonymization-based privacy protection method for event logs is proposed, built on a genetic algorithm. Necessary conditions for the feasibility of an activity suppression set, together with reduction rules, are proved from the minimal violating traces, which shrinks the algorithm's search space on large event logs. A fitness function is designed from the log-based ordering relations, the trace variants, and the maximal frequent traces to guide the population toward high utility. The crossover and mutation operators are improved to maintain population diversity, so that the genetic algorithm avoids becoming trapped in local optima. Comparative experiments on real event logs show that, at the same privacy protection strength, the event logs produced by the proposed method incur lower utility loss than those produced by TLKC and the baseline method.

Key words: process mining, privacy protection, genetic algorithms, event logs

摘要: 为减少过程挖掘中隐私保护带来的效用损失,提出一种基于遗传算法的事件日志匿名化隐私保护方法。基于最小违规轨迹证明活动抑制集可行性的必要条件和约简规则,缩小大型日志集的搜索空间;根据基于日志的次序关系、轨迹变体和最大频繁轨迹设计适应度函数,引导种群向高效用性进化;改进交叉变异算子,保证种群多样性,避免遗传算法陷入局部最优。在真实事件日志上进行对比实验,结果表明在隐私保护强度相同的条件下,所提方法的效用损失比TLKC和基线方法生成的事件日志更低。

关键词: 过程挖掘, 隐私保护, 遗传算法, 事件日志
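
The general idea summarized in the abstract (search for a set of activities to suppress so that the event log satisfies a privacy constraint while losing as little utility as possible) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the privacy test (every subsequence of length at most max_len must be supported by at least k traces), the fitness (share of original directly-follows pairs preserved), and the plain uniform-crossover and bit-flip operators are all simplifying assumptions chosen for illustration, and every function and parameter name is hypothetical.

```python
import random
from itertools import combinations

def suppress(log, removed):
    """Drop every occurrence of the suppressed activities from each trace."""
    return [tuple(a for a in trace if a not in removed) for trace in log]

def violations(log, k, max_len):
    """Assumed privacy test: subsequences (length <= max_len) seen in fewer than k traces."""
    support = {}
    for trace in log:
        seen = set()
        for n in range(1, max_len + 1):
            seen.update(combinations(trace, n))  # order-preserving subsequences
        for s in seen:
            support[s] = support.get(s, 0) + 1
    return {s for s, count in support.items() if count < k}

def directly_follows(log):
    """Set of (a, b) pairs where b immediately follows a in some trace."""
    return {(a, b) for trace in log for a, b in zip(trace, trace[1:])}

def fitness(log, removed, k, max_len):
    """Assumed utility: 0 if infeasible, else share of original pairs preserved."""
    anonymized = suppress(log, removed)
    if violations(anonymized, k, max_len):
        return 0.0
    original = directly_follows(log)
    kept = directly_follows(anonymized) & original
    return (1 + len(kept)) / (1 + len(original))

def evolve(log, k=2, max_len=2, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Plain elitist GA over bit vectors; bit i set means activity i is suppressed."""
    rng = random.Random(seed)
    acts = sorted({a for trace in log for a in trace})

    def decode(genes):
        return {a for a, g in zip(acts, genes) if g}

    def score(genes):
        return fitness(log, decode(genes), k, max_len)

    pop = [[rng.random() < 0.5 for _ in acts] for _ in range(pop_size)]
    pop.append([True] * len(acts))  # suppressing everything is always feasible
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p, q)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit flip
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return decode(best), score(best)

# Toy log: activity "e" occurs in a single trace, so it violates k = 2.
log = [("a", "b", "c"), ("a", "b", "c"), ("a", "b", "d"),
       ("a", "b", "c"), ("a", "b", "d"), ("a", "e", "c")]
removed, utility = evolve(log)
print(removed, utility)
```

In this toy log, any feasible suppression set has to contain "e", because "e" on its own is supported by only one trace. Checking every candidate against the full log, as the sketch does, is the cost that the necessary conditions and reduction rules mentioned in the abstract are meant to cut down on large logs.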
