Computer Integrated Manufacturing System, 2022, Vol. 28, Issue 10: 3166-3174. DOI: 10.13196/j.cims.2022.10.013

Local log sampling method for process discovery

NI Ke1, YU Dongjin1+, SUN Xiaoxiao1, HU Hua2

  1. School of Computer Science and Technology, Hangzhou Dianzi University
  2. Hangzhou Normal University
  • Online: 2022-10-31 Published: 2022-11-10
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 61702144), the Industrial Internet Innovation and Development Project of Ministry of Industry and Information Technology, China (No. TC200802G, TC2008033), the Zhejiang Provincial Key Research and Development Program, China (No. 2020C01165), and the Natural Science Foundation of Zhejiang Province, China (No. LQ20F020017).

Abstract: To address the performance bottleneck of traditional process discovery algorithms when processing large-scale event logs, a log sampling method based on the incremental information of traces was proposed. The method quantified the directly-follows relationships between events and the feature information of traces, used whether a trace carries new process behavior as the sampling criterion, and determined the minimum number of consecutive traces to traverse based on statistical theory. To further improve preprocessing speed, a binary exponential skip algorithm was proposed to avoid scanning duplicate traces. Experiments on four real-life event logs showed that the proposed sampling method could quickly and effectively reduce the size of event logs while retaining critical control-flow and frequency information, and that it improved the running speed of process discovery algorithms.
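
The sampling criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the artificial start/end markers, and the `patience` parameter are assumptions introduced here for illustration. In particular, `patience` stands in for the statistically derived minimum number of consecutive traces, whose formula the abstract does not give, and the binary exponential skip step and the retention of frequency information are not modeled.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Trace = Sequence[str]
Pair = Tuple[str, str]


def directly_follows(trace: Trace) -> Set[Pair]:
    """Return the directly-follows pairs of a trace.

    Artificial start/end markers (an assumption of this sketch) make
    the first and last activities of a trace count as behavior too.
    """
    padded = ["<start>", *trace, "<end>"]
    return set(zip(padded, padded[1:]))


def sample_log(log: Iterable[Trace], patience: int) -> List[List[str]]:
    """Keep only traces that contribute unseen directly-follows pairs.

    Scanning stops once `patience` consecutive traces add nothing new.
    `patience` is a hypothetical stand-in for the paper's statistically
    determined threshold.
    """
    seen: Set[Pair] = set()
    sample: List[List[str]] = []
    quiet = 0  # consecutive traces without new behavior
    for trace in log:
        new_pairs = directly_follows(trace) - seen
        if new_pairs:
            seen |= new_pairs
            sample.append(list(trace))
            quiet = 0
        else:
            quiet += 1
            if quiet >= patience:
                break
    return sample


log = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "d"],
]
sampled = sample_log(log, patience=2)
```

In this toy log the scan stops after the fifth trace, so the rare trace `["a", "d"]` is never examined. A larger `patience` lowers the chance of missing such behavior at the cost of scanning more of the log, which is the trade-off a statistically chosen threshold is meant to control.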

Key words: process discovery, log sampling, event log, incremental information, process model
