Computer Integrated Manufacturing Systems, 2017, Vol. 23, Issue 5: 1069-1079. DOI: 10.13196/j.cims.2017.05.018

• Product Innovation and Development Technology •

ProBench: a benchmark dataset for evaluating process similarity search methods

曹斌,王佳星,安卫士,范菁+,程时伟

  1. College of Computer Science and Technology, Zhejiang University of Technology
  • Online: 2017-05-31 Published: 2017-05-31
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 61602411, 61272308), the Natural Science Foundation of Zhejiang Province, China (No. LY15F020030), the Key Research and Development Project of Zhejiang Province, China (No. 2015C01034, 2015C01029), and the Major Science and Technology Innovation Project of Hangzhou City, China (No. 20152011A03).

Abstract: To address the lack of a benchmark dataset for evaluating the performance of existing process similarity algorithms, a benchmark dataset for evaluating process similarity search methods is presented. The dataset is built on the existing public IBM dataset, with the process models represented as Petri nets. It consists of 100 process models, in which 10 search models, the 9 relevant models of each search model, and the order of these relevant models are manually labeled using business domain knowledge. The relevant models are manually synthesized by adding, deleting, or combining relevant nodes and fragments. For each search model, the relevance order of its 9 relevant models is determined by the results of a user survey, and this order serves as the baseline against which algorithm results are evaluated. Based on this dataset, three structure-based and one behavior-based process similarity search methods are evaluated in terms of precision and efficiency, and the experimental results show the scenarios to which each method is suited. The dataset and the corresponding similarity search algorithm code are publicly available on a website for researchers to download and use.

Key words: benchmark dataset, business process, similarity, Petri net
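
The abstract says that algorithm results are scored against a surveyed ranking of relevant models, but it does not state which accuracy measures are used. As an illustration only, the sketch below scores one search result with two common choices: precision at k and Kendall rank correlation. Both measures, the function names, and the model identifiers are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def precision_at_k(retrieved: Sequence[str], relevant: Sequence[str], k: int) -> float:
    """Fraction of the top-k retrieved models that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    top_k = list(retrieved)[:k]
    return sum(1 for model in top_k if model in relevant_set) / k


def kendall_tau(benchmark_order: Sequence[str], algorithm_order: Sequence[str]) -> float:
    """Kendall rank correlation over models found in both rankings (no ties)."""
    position = {model: i for i, model in enumerate(algorithm_order)}
    common = [model for model in benchmark_order if model in position]
    n = len(common)
    if n < 2:
        raise ValueError("need at least two models present in both rankings")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # common[i] precedes common[j] in the benchmark ranking;
            # the pair is concordant if the algorithm agrees.
            if position[common[i]] < position[common[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: one search model with its 9 relevant models in surveyed order.
benchmark = [f"rel{i}" for i in range(1, 10)]
# Hypothetical algorithm output: one irrelevant model and one swapped pair.
result = ["rel1", "rel3", "rel2", "other7", "rel4", "rel5", "rel6", "rel7", "rel8", "rel9"]

p_at_9 = precision_at_k(result, benchmark, k=9)
tau = kendall_tau(benchmark, result)
```

Precision at k only checks whether the relevant models are retrieved, while the rank correlation also checks whether they come back in the surveyed order, so the two numbers answer different questions about the same result list.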
