Computer Integrated Manufacturing System, 2024, Vol. 30, Issue (6): 2130-2138. DOI: 10.13196/j.cims.2021.0791

Community overlap discovery algorithm based on industrial big data

KANG Haiyan, JING Wu, ZHANG Yangsen

  1. School of Information Management, Beijing Information Science and Technology University
  • Online: 2024-06-30 Published: 2024-07-09
  • Supported by:
    Project supported by the National Social Science Foundation, China (No. 21BTQ079), and the Humanities and Social Sciences Foundation of the Ministry of Education, China (No. 20YJAZH046).

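  • About the authors:
    KANG Haiyan (b. 1971), male, born in Lingshou, Hebei; professor, Ph.D.; research interests: network security and privacy protection, among others; E-mail: kanghaiyan@126.com.

    JING Wu (b. 1996), male, born in Taiyuan, Shanxi; master's student; research interests: information dissemination and information security; E-mail: jingwu@iie.ac.cn.

    ZHANG Yangsen (b. 1962), male, born in Yuncheng, Shanxi; professor, Ph.D.; research interests: natural language processing and artificial intelligence, among others; E-mail: zys@bistu.edu.cn.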

Abstract: Industrial big data is large in scale, complex in structure, and high in value density. To mine and analyze its hidden relationships, trends, and patterns, and to give enterprises a better basis for decision-making, an overlapping community discovery algorithm for industrial big data is proposed that combines the ideas of random walk and label propagation. First, a seed node selection algorithm is designed: the importance of each node is computed by random walk, and seed nodes that are mutually unrelated and of high importance are selected. Second, an overlapping community discovery algorithm is proposed: each seed node is given a unique label, and labels are propagated iteratively until no node label changes; the final overlapping community division is then obtained from the node labels. Finally, comparative experiments on real and artificial data sets show that the algorithm can effectively find high-quality overlapping communities in a network. The algorithm can be applied to data analysis and information mining of industrial big data.

Key words: industrial big data, community detection, overlapping community, random walk, label propagation
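
The two stages named in the abstract (random-walk seed selection, then label propagation from the seeds) can be illustrated with a minimal Python sketch. This is not the authors' implementation; the abstract does not give the formulas, so several choices here are assumptions: importance is taken as a random walk with restart (PageRank-style), "unrelated" seeds are taken to mean non-adjacent nodes, and a node keeps every label whose accumulated weight is at least a fraction `keep` of its strongest label. All function and parameter names (`random_walk_importance`, `select_seeds`, `propagate_labels`, `restart`, `keep`, `k`) are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def random_walk_importance(adj, restart=0.15, iters=100):
    """Assumed importance measure: random walk with uniform restart."""
    n = len(adj)
    score = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        nxt = {v: restart / n for v in adj}
        for u, nbrs in adj.items():
            share = (1.0 - restart) * score[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        score = nxt
    return score


def select_seeds(adj, score, k):
    """Greedily pick up to k high-score nodes, skipping neighbors of seeds."""
    seeds = []
    for v in sorted(adj, key=lambda x: (-score[x], x)):
        if all(v not in adj[s] for s in seeds):
            seeds.append(v)
        if len(seeds) == k:
            break
    return seeds


def propagate_labels(adj, seeds, keep=0.5, max_iters=100):
    """Synchronous propagation; stops when no node's label set changes."""
    labels = {v: {} for v in adj}
    for s in seeds:
        labels[s] = {s: 1.0}  # each seed carries its own unique label
    seed_set = set(seeds)
    for _ in range(max_iters):
        new = {}
        for v in adj:
            if v in seed_set:
                new[v] = labels[v]  # seed labels stay fixed
                continue
            acc = defaultdict(float)
            for u in adj[v]:
                for lab, w in labels[u].items():
                    acc[lab] += w
            if not acc:
                new[v] = {}
                continue
            top = max(acc.values())
            kept = {lab: w for lab, w in acc.items() if w >= keep * top}
            total = sum(kept.values())
            new[v] = {lab: w / total for lab, w in kept.items()}
        changed = any(set(new[v]) != set(labels[v]) for v in adj)
        labels = new
        if not changed:
            break
    return labels


def communities_from_labels(labels):
    """Group nodes by label; a node with several labels overlaps."""
    comms = defaultdict(set)
    for v, labs in labels.items():
        for lab in labs:
            comms[lab].add(v)
    return dict(comms)


# Toy graph: two 4-cliques joined through bridge node 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8),
         (3, 4), (4, 5)]
adj = build_adjacency(edges)
seeds = select_seeds(adj, random_walk_importance(adj), k=2)
communities = communities_from_labels(propagate_labels(adj, seeds))
```

On this toy graph the bridge node 4 receives both seed labels and so belongs to both communities, which is the overlapping behavior the abstract describes. The `keep` threshold controls how readily a node joins more than one community; the paper may use a different rule.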
