Computer Integrated Manufacturing Systems ›› 2013, Vol. 19 ›› Issue (08): 1784-1793.

• Product Innovation and Development Technology •

Parallel algorithm for converting massive event logs based on MapReduce

DOU Meng1,2,3, WEN Lijie1,2,3+, WANG Jianmin1,2,3, YAN Zhiqiang1,4

  1. School of Software, Tsinghua University
  2. Key Laboratory for Information System Security, Ministry of Education, Tsinghua University
  3. Tsinghua National Laboratory for Information Science and Technology, Tsinghua University
  4. Department of Information Management and Information Systems, Capital University of Economics and Business
  • Online: 2013-08-31 Published: 2013-08-31
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 61003099), the National High-Tech. R&D Program, China (No. 2012AA040904), and the Ministry of Education and China Mobile Research Foundation, China (No. MCM20123011).

Abstract: With the arrival of the big data era, a distributed algorithm based on the MapReduce framework is proposed for converting massive distributed event logs on a cloud platform with high performance. First, an improved algorithm based on case splitting is presented, which makes the conversion of logs on a single machine feasible. A parallel conversion algorithm based on MapReduce is then proposed. This is the first time in the field of process mining that massive raw logs have been converted in parallel into eXtensible Event Stream (XES) event logs, and the conversion performance is greatly improved.

Key words: big data, event log, process mining, MapReduce, eXtensible Event Stream, information system
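
As a rough illustration of the map, group, and reduce pattern that the abstract describes, the following Python sketch simulates the three phases in a single process. It is not the authors' algorithm. The comma-separated raw log format, the function names (`map_line`, `shuffle`, `reduce_case`, `convert`), and the choice of XES attributes are all assumptions made for illustration, and a real deployment would run the map and reduce steps on a MapReduce cluster rather than in one process.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def map_line(line):
    """Map step: parse one raw log line into a (case_id, event) pair.

    Assumed (hypothetical) raw format: case_id,activity,timestamp
    """
    case_id, activity, timestamp = line.strip().split(",")
    return case_id, {"concept:name": activity, "time:timestamp": timestamp}


def shuffle(pairs):
    """Group mapped events by case id, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for case_id, event in pairs:
        groups[case_id].append(event)
    return groups


def reduce_case(case_id, events):
    """Reduce step: build one XES <trace> element from all events of a case."""
    trace = ET.Element("trace")
    ET.SubElement(trace, "string", key="concept:name", value=case_id)
    # Timestamps in one uniform ISO 8601 format sort correctly as plain strings.
    for event in sorted(events, key=lambda e: e["time:timestamp"]):
        node = ET.SubElement(trace, "event")
        ET.SubElement(node, "string", key="concept:name", value=event["concept:name"])
        ET.SubElement(node, "date", key="time:timestamp", value=event["time:timestamp"])
    return trace


def convert(lines):
    """Run map, shuffle, and reduce over raw lines and return an XES <log> element."""
    log = ET.Element("log", {"xes.version": "1.0"})
    groups = shuffle(map_line(line) for line in lines if line.strip())
    for case_id in sorted(groups):
        log.append(reduce_case(case_id, groups[case_id]))
    return log


if __name__ == "__main__":
    # Hypothetical sample data: events arrive interleaved across cases.
    raw_log = [
        "c2,ship,2013-01-02T10:00:00",
        "c1,pay,2013-01-01T09:30:00",
        "c1,order,2013-01-01T09:00:00",
    ]
    print(ET.tostring(convert(raw_log), encoding="unicode"))
```

Because each case is reduced independently of every other case, the reduce calls could in principle be distributed across machines, which is the property a case-based split relies on.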

CLC number: