Computer Integrated Manufacturing Systems ›› 2019, Vol. 25 ›› Issue (9): 2314-2323. DOI: 10.13196/j.cims.2019.09.018


Outlier detection algorithm based on cluster outlier factor and mutual density

Zhang Zhongping1,2, Qiu Jingyang1, Liu Cong1, Zhu Mengfan1, Zhang Debin3

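  1. School of Information Science and Engineering, Yanshan University
  2. Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Yanshan University
  3. Hebei Education Examinations Authority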
  • Publication date: 2019-09-30 Release date: 2019-09-30

Outlier detection based on cluster outlier factor and mutual density

  • Online: 2019-09-30 Published: 2019-09-30

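Abstract: Most clustering-based outlier detection algorithms require manually specified parameters, and a suitable value is hard to choose for different datasets. To address this problem, the natural neighbor search of the parameter-free, natural-neighbor-based outlier detection algorithm is combined with the density peaks clustering algorithm, and an outlier detection algorithm based on the cluster outlier factor and mutual density is proposed. The algorithm uses mutual density and γ density to construct a decision graph, takes the sample points with abnormally large γ density as cluster centers for clustering, and finally uses the cluster outlier factor to find the boundary of the outlier clusters and detect the outliers; it requires no manually input parameters. Experiments on synthetic and real datasets show that the proposed algorithm performs well in both clustering and the mining of outlying data.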

Keywords: outlier, data mining, cluster outlier factor, mutual density, γ density

Abstract: Most clustering-based outlier detection algorithms require parameters to be set manually, and it is difficult to select a suitable parameter for different datasets. To solve this problem, an outlier detection algorithm based on cluster outlier factor and mutual density was proposed by combining the natural neighbor search of the natural-neighbor-based outlier detection (NOF) algorithm with the density peaks clustering (DPC) algorithm. The mutual density and γ density were used to construct a decision graph, and the data points with anomalously large γ density in the decision graph were treated as cluster centers. According to the Cluster Outlier Factor (COF), the boundary of the outlier clusters was detected to identify the outliers, with no manually input parameters. The experiments showed that the proposed method could achieve good performance in clustering and outlier detection.

Key words: outlier, data mining, cluster outlier factor, mutual density, γ density
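
The first two stages named in the abstract (a parameter-free natural neighbor search, then a density-peaks-style decision graph) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas for mutual density, γ density, or COF, so the density below (mutual-neighbor count divided by the total distance to those neighbors) is a stand-in chosen for illustration, γ is taken as the usual DPC product of density and separation distance, and the function names `natural_neighbor_search` and `decision_values` are hypothetical. The COF stage is omitted.

```python
import numpy as np


def natural_neighbor_search(X):
    """Grow the neighborhood size r until the number of points that no one
    lists as a neighbor stops changing (no user-supplied k). Assumes >= 2 points."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    keyed = dist.copy()
    np.fill_diagonal(keyed, -1.0)               # each point sorts first in its own row
    order = np.argsort(keyed, axis=1)[:, 1:]    # neighbors by distance, self dropped
    reverse_count = np.zeros(n, dtype=int)
    prev_orphans = -1
    r = 0
    for r in range(1, n):
        np.add.at(reverse_count, order[:, r - 1], 1)   # vote for the r-th neighbor
        orphans = int(np.sum(reverse_count == 0))
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    return r, order[:, :r], reverse_count, dist


def decision_values(X):
    """Return r plus per-point density, separation and their product (gamma)."""
    r, knn, reverse_count, dist = natural_neighbor_search(X)
    n = len(X)
    is_nb = np.zeros((n, n), dtype=bool)
    is_nb[np.arange(n)[:, None], knn] = True    # is_nb[i, j]: j is among i's r nearest
    mutual = is_nb & is_nb.T                    # keep pairs that list each other
    # Stand-in density (assumption, not the paper's definition):
    # mutual-neighbor count / total distance to them; 0 if there are none.
    total = np.where(mutual, dist, 0.0).sum(axis=1)
    count = mutual.sum(axis=1)
    rho = np.divide(count, total, out=np.zeros(n), where=total > 0)
    # DPC-style delta: distance to the nearest point of strictly higher density.
    higher = rho[None, :] > rho[:, None]
    delta = np.where(higher, dist, np.inf).min(axis=1)
    delta[np.isinf(delta)] = dist.max()         # densest point(s) get the largest distance
    gamma = rho * delta
    return r, rho, delta, gamma, reverse_count
```

In this sketch, points with unusually large `gamma` would be the candidates for cluster centers, and points that never receive a reverse neighbor (`reverse_count == 0`) are natural candidates for outliers; how the paper turns such quantities into a cluster outlier factor is not stated in the abstract.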

CLC number: