Computer Integrated Manufacturing System ›› 2025, Vol. 31 ›› Issue (2): 567-578. DOI: 10.13196/j.cims.2022.0626


Detection method of industrial Internet malicious code based on code visualization

LONG Molan, KANG Haiyan+

  1. Computer School, Beijing Information Science and Technology University
  • Online: 2025-02-28 Published: 2025-03-06
  • Supported by:
    Project supported by the National Social Science Foundation, China (No. 21BTQ079), and the Humanities and Social Sciences Research Foundation of the Ministry of Education, China (No. 20YJAZH046).

基于代码可视化的工业互联网恶意代码检测方法

龙墨澜,康海燕+   

  1. 北京信息科技大学计算机学院
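  • About the authors:
    LONG Molan (龙墨澜, born 1997), male, from Zhengzhou, Henan; master's student. Research interests: intrusion detection and malicious code. E-mail: longmolan9@163.com

    +KANG Haiyan (康海燕, born 1971), male, from Lingshou, Hebei; professor, Ph.D.; corresponding author. Research interests: network security and privacy protection. E-mail: kanghaiyan@126.com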
  • 基金资助:
    国家社会科学基金资助项目(21BTQ079);教育部人文社会科学基金资助项目(20YJAZH046)。

Abstract: Malicious software targeting the industrial Internet is growing in both number and variety, and traditional malicious code detection methods suffer from low accuracy, high time cost, and complex data preprocessing. Building on the mature application of neural networks to image classification, a malicious code detection method for the industrial Internet based on code visualization is proposed. A visualization algorithm transforms the original malicious code file into a color image, and an improved GoogLeNet detects and identifies the malicious code family. Data augmentation is used to expand the original sample set, and a weighted focal loss function suitable for multi-classification tasks is proposed: the expected volume of samples is used to adjust the weight of each malicious code family during training, which alleviates the influence of model overfitting. Experiments on the Malimg and Leopard Mobile datasets show that color malicious code images yield higher accuracy than grayscale malicious code images. The proposed method achieves accuracies of 98.26% on Malimg and 97.19% on Leopard Mobile, indicating that it can effectively detect malicious code in the industrial Internet.

Key words: malicious code classification, code visualization, deep learning, data augmentation, weighted loss function
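
The abstract does not spell out the visualization algorithm, so the following is only a minimal sketch of one common byte-to-image scheme, under stated assumptions: consecutive byte triples are read as (R, G, B) pixels, rows have a fixed width, and the tail is zero-padded. The function name `bytes_to_rgb_rows` and the `width` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def bytes_to_rgb_rows(data: bytes, width: int = 64) -> List[List[Pixel]]:
    """Map raw file bytes to rows of (R, G, B) pixels, three bytes per pixel.

    Assumed scheme for illustration only: fixed row width, zero padding at
    the end so the last row is complete.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    row_bytes = width * 3
    # Ceiling division; an empty input still yields one all-zero row.
    height = max(1, -(-len(data) // row_bytes))
    padded = data + bytes(height * row_bytes - len(data))
    rows: List[List[Pixel]] = []
    for r in range(height):
        row = padded[r * row_bytes:(r + 1) * row_bytes]
        rows.append([(row[i], row[i + 1], row[i + 2])
                     for i in range(0, row_bytes, 3)])
    return rows


# Stand-in input: 1024 bytes in place of a real sample file.
sample = bytes(range(256)) * 4
image = bytes_to_rgb_rows(sample, width=16)
print(len(image), len(image[0]))  # image height and width in pixels
```

The resulting rows can be handed to any image library or converted to a tensor before being resized to the input size of the classifier; that step is omitted here.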

摘要: 针对工业互联网中不断增加的恶意软件数量和种类,传统恶意代码检测方法存在准确率低、时间开销大、数据预处理过程复杂等问题,结合神经网络在图像分类方向的成熟应用,提出一种基于代码可视化的工业互联网恶意代码检测方法。通过可视化算法将恶意代码原始文件转化为彩色图像,采用改进GoogLenet检测并识别恶意代码家族;用数据增强扩充原始样本集,并提出适用于多分类任务的带权Focal loss损失函数,通过样本期望体积调整不同恶意代码家族在模型训练过程中的权重参数,缓解模型过拟合的影响。最后在Malimg和Leopard Mobile两个数据集上的实验表明,彩色恶意代码图像在准确性方面优于恶意代码灰度图,该方法在Malimg和Leopard Mobile数据集的准确率分别达到98.26%和97.19%,验证了该方法的优越性。

关键词: 恶意代码分类, 代码可视化, 深度学习, 数据增强, 带权损失函数
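
The weighted focal loss mentioned in the abstract can be illustrated with a small sketch. It assumes that the per-family weight is the inverse of an "effective number" of samples, E_n = (1 - beta^n) / (1 - beta), as in class-balanced losses; whether the paper uses exactly this formula is an assumption, and the names `class_weights`, `weighted_focal_loss`, `beta`, and `gamma` as well as the sample counts are hypothetical.

```python
import math
from typing import List, Sequence


def class_weights(counts: Sequence[int], beta: float = 0.999) -> List[float]:
    """Per-class weights proportional to 1 / E_n, E_n = (1 - beta**n) / (1 - beta).

    Assumes every count is >= 1 and 0 <= beta < 1. Weights are rescaled so
    that they sum to the number of classes.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_focal_loss(logits: Sequence[float], target: int,
                        weights: Sequence[float], gamma: float = 2.0) -> float:
    """Multi-class focal loss for one sample: -w_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical family sizes: one large family and two small ones.
weights = class_weights([2000, 100, 10])
loss = weighted_focal_loss([0.2, 1.5, -0.3], target=2, weights=weights)
print(weights, loss)
```

With `gamma = 0` and uniform weights the expression reduces to ordinary cross-entropy; a larger `gamma` down-weights samples the model already classifies confidently, and the class weights raise the contribution of small families.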
