Computer Integrated Manufacturing Systems ›› 2023, Vol. 29 ›› Issue (5): 1590-1601. DOI: 10.13196/j.cims.2023.05.016


MARL-GPN: A game Petri net based on multi-agent reinforcement learning

LIU Yuzhou1,2, FANG Xianwen1,2

  1. College of Mathematics and Big Data, Anhui University of Science and Technology
  2. Anhui Province Engineering Laboratory for Big Data Analysis and Early Warning Technology of Coal Mine Safety
  • Online: 2023-05-31 Published: 2023-06-14
  • Supported by:
    the National Natural Science Foundation of China (No. 61572035, 61402011), the Key Research and Development Program of Anhui Province, China (No. 2022a05020005), and the Leading Backbone Talent Project in Anhui Province, China (No. 2020-1-12).


Abstract: As an important research branch of process mining, change mining can be applied in scenarios such as system security analysis and IPS security construction, where possible unknown vulnerabilities are judged from the generated change regions. Traditional change mining algorithms focus on the security of the business process itself and cannot mine the effective change regions that arise when the security benefits of multiple parties must be considered in a dynamic game scenario. To address this problem, a game Petri net based on multi-agent reinforcement learning (MARL-GPN) is proposed, dedicated to the study of effective change regions in dynamic security models. This new game Petri net first constructs a benefit matrix from attack-defense events; each agent then generates an optimal response strategy according to the opponent's current state and predicted next move; a Nash equilibrium is reached within a specified number of learning cycles, yielding the optimal activity traces of both parties; comparing these traces against traditional change mining then yields the effective change regions. Finally, the feasibility of the model is verified with a typical information security incident as an example.
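The learning loop the abstract describes (benefit matrix → per-agent strategy updates → equilibrium) can be sketched generically. The following is a minimal illustration, not the paper's MARL-GPN algorithm: it runs independent Q-learning for two agents (attacker and defender) on a hypothetical 2×2 benefit matrix; all payoff values and parameter names are invented for illustration.

```python
import random

# Hypothetical benefit matrices for a 2x2 attack-defense game
# (rows = attacker action, columns = defender action).
# Values are illustrative only; they are not taken from the paper.
ATTACKER_PAYOFF = [[3.0, 1.0],
                   [2.0, 0.0]]   # row 0 strictly dominates row 1
DEFENDER_PAYOFF = [[0.0, 2.0],
                   [1.0, 3.0]]   # column 1 strictly dominates column 0

def train(episodes=5000, alpha=0.1, eps=0.1, seed=0):
    """Independent Q-learning for a stateless two-player matrix game."""
    rng = random.Random(seed)
    q_att = [0.0, 0.0]  # attacker's estimated benefit per action
    q_def = [0.0, 0.0]  # defender's estimated benefit per action
    for _ in range(episodes):
        # epsilon-greedy action selection for each agent
        a = rng.randrange(2) if rng.random() < eps else max(range(2), key=q_att.__getitem__)
        d = rng.randrange(2) if rng.random() < eps else max(range(2), key=q_def.__getitem__)
        # one-step update toward the observed benefit (no next state in a matrix game)
        q_att[a] += alpha * (ATTACKER_PAYOFF[a][d] - q_att[a])
        q_def[d] += alpha * (DEFENDER_PAYOFF[a][d] - q_def[d])
    return q_att, q_def

if __name__ == "__main__":
    q_att, q_def = train()
    # With strictly dominant actions, both agents settle on the pure equilibrium (0, 1)
    print("attacker prefers action", max(range(2), key=q_att.__getitem__))
    print("defender prefers action", max(range(2), key=q_def.__getitem__))
```

In the paper's setting the agents additionally condition on the opponent's current state and predicted next move over a Petri net structure, and the equilibrium is reached within a specified learning cycle; this sketch keeps only the matrix-game core.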

Key words: change mining, reinforcement learning, multi-agent game, game Petri net, network attack-defense

