Computer Integrated Manufacturing System, 2022, Vol. 28, Issue 12: 3869-3878. DOI: 10.13196/j.cims.2022.12.014

Outlier detection algorithm based on fluctuation of centroid projection

ZHANG Zhongping(1,2,3), ZHANG Yuting(1), LIU Weixiong(1), DENG Yu(1)

  1. College of Information Science and Engineering, Yanshan University
  2. The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Yanshan University
  3. The Key Laboratory of Software Engineering of Hebei Province
  • Online: 2022-12-31  Published: 2023-01-12
  • Supported by:
    Project supported by the Hebei Provincial Innovation Capability Improvement Plan, China (No. 20557640D).

基于质心投影波动的离群点检测算法

张忠平(1,2,3),张玉停(1),刘伟雄(1),邓禹(1)

  1. 燕山大学信息科学与工程学院
  2. 河北省计算机虚拟技术与系统集成重点实验室
  3. 河北省软件工程重点实验室
  • 基金资助:
    河北省创新能力提升计划资助项目(20557640D)。

Abstract: Outlier detection is an important area of data mining research. Traditional nearest-neighbor-based outlier detection methods rely heavily on the k-nearest-neighbor relationship. However, as data distributions become more diverse and data dimensionality increases, detection based on the k-nearest-neighbor relationship is easily affected by the presence of different clusters, and its results are unsatisfactory. To address these problems, a nearest neighbor tree is introduced in place of the k-nearest-neighbor relationship to generate a new neighborhood set, and the concept of centroid projection is proposed to describe how a data object is distributed relative to its neighbors. Because the centroid projections of outliers and interior points (inliers) change differently as the number of neighbors of a data object gradually increases, the fluctuation of the centroid projection is used to measure the degree to which each data object is an outlier. On this basis, an outlier detection algorithm based on the fluctuation of centroid projection is proposed. Experiments on synthetic and real data sets show that the proposed algorithm detects outliers effectively and fairly comprehensively.

Key words: data mining, outlier detection, k-nearest neighbors, neighbor tree, centroid projection fluctuation
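
As a rough illustration of the idea summarized in the abstract, the Python sketch below scores each point by how much its neighborhood centroid drifts as neighbors are added one at a time. It is not the authors' algorithm: the abstract does not define the nearest neighbor tree or the centroid projection, so this sketch assumes plain k-nearest neighbors and a simple drift measure (the total variation of the distance between a point and the centroid of its growing neighborhood). The function name centroid_fluctuation_scores and the parameter k are hypothetical.

```python
import numpy as np


def centroid_fluctuation_scores(X, k=5):
    """Illustrative outlier score (an assumption, not the paper's definition).

    For each point, neighbors are added one by one in order of distance and
    the centroid of {point, first j neighbors} is tracked. The score is the
    total variation of the point-to-centroid distance over j = 0..k.
    Plain k-nearest neighbors stand in for the paper's nearest neighbor tree.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    scores = np.empty(n)
    for i in range(n):
        # Indices of the k nearest neighbors, excluding the point itself.
        order = np.argsort(dist[i], kind="stable")
        order = order[order != i][:k]

        # Neighbor positions relative to point i.
        offsets = X[order] - X[i]

        # Centroid of {point i, first j neighbors}, relative to point i,
        # for j = 1..k. The point itself contributes a zero offset.
        counts = np.arange(2, k + 2)[:, None]
        centroids = np.cumsum(offsets, axis=0) / counts

        # Distance from the point to each centroid; j = 0 gives distance 0.
        drift = np.concatenate(([0.0], np.linalg.norm(centroids, axis=1)))

        # Total variation of the drift sequence as the neighborhood grows.
        scores[i] = np.abs(np.diff(drift)).sum()
    return scores


# Toy data: a 5 x 5 grid cluster plus one isolated point (made-up values).
grid = np.array([[x, y] for x in np.linspace(0, 1, 5)
                 for y in np.linspace(0, 1, 5)])
X = np.vstack([grid, [[10.0, 10.0]]])
scores = centroid_fluctuation_scores(X, k=5)
print(int(np.argmax(scores)))  # index of the highest-scoring point
```

In this toy setting the centroid of an interior point stays close to the point, while the centroid of the isolated point is pulled far toward the cluster, so the isolated point receives the largest score. How the nearest neighbor tree changes the neighborhood sets is specified in the full paper, not in this sketch.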

摘要: 离群点检测是数据挖掘研究的一个重要领域。在传统基于近邻的离群点检测方法中,k近邻关系被广泛使用。然而,随着数据分布的多样化和数据维度的增加,基于k近邻关系算法检测离群点的过程中易受不同类簇影响而检测效果不佳。针对以上问题,首先通过引入近邻树代替k近邻关系生成新的邻域集合,提出质心投影的概念用来刻画数据对象与其邻居点的分布特征,其次在数据对象邻居点逐渐增多的过程中,离群点和内部点质心投影变化不同,采用质心投影波动来衡量每个数据对象的离群程度,最终提出了基于质心投影波动的离群点检测算法。通过在人工数据集和真实数据集下进行的实验表明,该算法能有效且较为全面地检测离群点。

关键词: 数据挖掘, 离群点检测, k近邻, 近邻树, 质心投影波动
