Computer Integrated Manufacturing System ›› 2024, Vol. 30 ›› Issue (6): 1972-1988.DOI: 10.13196/j.cims.2023.0382


Multi-ship collaborative collision avoidance strategy based on multi-agent deep reinforcement learning

HUANG Renxian1,2,3,LUO Liang1,2,3+   

  1. Key Laboratory of High Performance Ship Technology, Ministry of Education, Wuhan University of Technology
    2.School of Naval Architecture,Ocean and Energy Power Engineering,Wuhan University of Technology
    3.Sanya Science and Education Innovation Park of Wuhan University of Technology
  • Online:2024-06-30 Published:2024-07-08
  • Supported by:
    Project supported by the National Natural Science Foundation of China (No. 52101368).

  • About the authors:
    HUANG Renxian (1998-), male, from Zhangzhou, Fujian, is a master's student. His research interests include deep reinforcement learning and intelligent ships. E-mail: hrx751770645@163.com;

    +LUO Liang (1980-), male, from Wuhan, Hubei, holds a Ph.D. and is an associate professor and doctoral supervisor. His research interests include system simulation integration, ship-related digital technologies and high-performance computing. Corresponding author. E-mail: luoliang610@163.com.

Abstract: To improve the coordination, safety, practicability and energy efficiency of intelligent collision avoidance strategies for multi-ship encounters, a Prioritized Experience Replay-Multi-Agent Softmax Deep Double Deterministic Policy Gradient (PER-MASD3) algorithm was proposed for the multi-ship cooperative collision avoidance problem, combining a prioritized experience replay mechanism with the Centralized Training with Decentralized Execution (CTDE) framework. The algorithm not only corrected the value estimation bias of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, but also introduced an entropy regularization term during training to promote exploration and to control the stochasticity of the control policy. Adaptive noise was adopted to explore the tasks of different training stages effectively, further improving the learning effect and stability of the algorithm. Experiments showed that the proposed PER-MASD3 algorithm achieved better decision-making, faster convergence and more stable performance in solving the multi-ship collaborative collision avoidance problem.
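The value estimation fix mentioned in the abstract can be illustrated numerically. TD3 takes a hard minimum over twin critics, which tends to underestimate; softmax-based variants such as MASD3 instead weight sampled action values by a softmax over the Q-estimates, interpolating between the mean (small β) and the max (large β). The function below is a minimal sketch of that operator under these assumptions; it is not code from the paper.

```python
import numpy as np

def softmax_q_target(q_values, beta=5.0):
    """Softmax-weighted value estimate over a set of Q-values.

    Weighting by softmax(beta * Q) interpolates between the mean
    (beta -> 0) and the max (beta -> inf), softening the hard min/max
    operators that cause value estimation bias.
    """
    q = np.asarray(q_values, dtype=np.float64)
    z = beta * (q - q.max())           # subtract max for numerical stability
    w = np.exp(z) / np.exp(z).sum()    # softmax weights over actions
    return float((w * q).sum())
```

With β = 0 the operator reduces to a plain average, while a large β recovers a greedy (max) estimate, so β tunes how optimistic the bootstrapped target is.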

Key words: multi-agent deep reinforcement learning, coordinated collision avoidance, centralized training with decentralized execution, prioritized experience replay, multi-agent Softmax deep double deterministic policy gradient
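The prioritized experience replay mechanism named in the keywords can be sketched as a proportional-priority buffer: transitions are sampled with probability proportional to their TD error, and importance-sampling weights correct the resulting bias. The class below is a hypothetical simplification (list-based storage; a practical implementation uses a sum-tree for O(log n) sampling) and is not the paper's implementation.

```python
import numpy as np

class ProportionalReplay:
    """Minimal proportional prioritized experience replay buffer."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []

    def add(self, transition):
        # New transitions get the current max priority so they are replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size, beta=0.4):
        probs = np.array(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Priority is the magnitude of the TD error plus a small epsilon.
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(e) + self.eps
```

In training, the sampled batch's TD errors are fed back through `update_priorities` after each critic update, so transitions with large errors are revisited more often.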

