• 论文 •    

基于FP-tree的快速挖掘全局最大频繁项集算法

何波   

  1. 重庆理工大学 计算机科学与工程学院,重庆400054
  • 出版日期:2011-07-15 发布日期:2011-07-25

Fast algorithm for mining global maximum frequent itemsets based on FP-tree

HE Bo   

  1. School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Online:2011-07-15 Published:2011-07-25

摘要: 挖掘最大频繁项集的算法多基于局部数据库,为此提出了一种基于频繁模式树的快速挖掘全局最大频繁项集算法。该算法首先挖掘出所有全局频繁项目组成集合,然后各个节点根据该集合构建局部频繁模式树,最后将该集合作为全局最大频繁候选项集,采用自顶向下策略挖掘出所有的全局最大频繁项集。与类Apriori算法相比,该算法采用的频繁模式树结构能大幅度降低数据库扫描次数和运行时间;自顶向下的策略能大幅度减少候选项集数和通信量。实验结果表明,该算法是快速和高效的。

关键词: 数据挖掘, 频繁模式树, 全局最大频繁项集, 算法

Abstract: Most mining maximum frequent itemsets algorithm based on local data base, so a fast algorithm for Mining Global Maximum Frequent Itemsets based on Frequent pattern tree(MGMFIF) was proposed. MGMFIF mined all global frequent items and made itemset, then local Frequent-Pattern tree (FP-tree) of each node was constructed based on this itemset. Finally, this itemset was chose as global maximum frequent itemsets, and all the global maximum frequent itemsets were obtained by top-down strategy. By adopting FP-tree structure, MGMFIF greatly reduced database scanning times and runtime comparing to Apriori-like algorithms. MGMFIF remarkably lessened candidate itemsets and communication traffic by using top-down strategy. Experimental results suggested that MGMFIF was fast and effective.

Key words: data mining, frequent-pattern tree, global maximum frequent itemsets, algorithms

中图分类号: