Computer Integrated Manufacturing System, 2024, Vol. 30, Issue (9): 3330-3340. DOI: 10.13196/j.cims.2022.0023


Movement strategy of multiple robots under grid environment constraints

LI Shuo1,2, ZHAO Yongting1,3, HE Pan1,3, GAO Peng1,3, WANG Xiaojun1,3, ZHAO Lijun4, ZHENG Bin1,5+

  1. Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences
  2. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications
  3. Vthink Intelligent Technology (Suzhou) Co., Ltd.
  4. School of Intelligent Manufacturing Engineering, Chongqing University of Arts and Science
  5. Chongqing Municipal Key Laboratory of Artificial Intelligence and Service Robot Control Technology
  • Online: 2024-09-30 Published: 2024-10-09
  • Supported by:
    Project supported by the Natural Science Foundation of Chongqing Municipality, China (No. cstc2019jcyj-msxmX0442), the Foundation for Key Technology Innovation and Application Demonstration Projects in Chongqing Municipality, China (No. cstc2018jszx-cyzdX0068), and the Key Project of Technology Innovation and Application Development in Chongqing Municipality, China (Nos. cstc2021jscx-gksbX0003, cstc2021jscx-gksbX0020).

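  • About the authors:
    LI Shuo (李硕, b. 1997), male, from Shijiazhuang, Hebei; master's student; research interests: reinforcement learning, intelligent obstacle avoidance, etc.; E-mail: 271192784@qq.com.

    ZHAO Yongting (赵永廷, b. 1986), male, from Chongqing; assistant researcher, master's degree; research interests: automation technology, etc.; E-mail: zhaoyongting@cigit.ac.cn.

    HE Pan (何盼, b. 1984), female, from Chongqing; associate researcher, Ph.D.; research interests: trusted computing, software reliability, etc.; E-mail: hepan@cigit.ac.cn.

    GAO Peng (高鹏, b. 1986), male, from Laiyang, Shandong; engineer, master's degree; research interests: automation technology, robotic intelligent manufacturing, etc.; E-mail: gpeng@cigit.ac.cn.

    WANG Xiaojun (王小军, b. 1985), male, from Mianyang, Sichuan; engineer, master's degree; research interests: robot manufacturing, etc.; E-mail: wangxiaojun@cigit.ac.cn.

    ZHAO Lijun (赵立军, b. 1980), male, from Chongqing; professor, Ph.D.; research interests: automation technology, light industry and handicrafts, etc.; E-mail: 20190005@cqwu.edu.cn.

    +ZHENG Bin (郑彬, b. 1972), male, from Weiyuan, Sichuan; researcher, Ph.D.; research interests: robotics, optimization algorithms, etc.; corresponding author; E-mail: zhengbin@cigit.ac.cn.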

Abstract: To address multi-agent pathfinding and obstacle avoidance planning in grid environments, a distributed multi-robot obstacle avoidance and navigation method based on deep reinforcement learning is proposed. A policy model is trained with a variant of the Proximal Policy Optimization (PPO) algorithm adapted to discrete decision-making. From each agent's own simulated lidar distance readings over several preceding frames, the model generates actions that conform to preset specifications, enabling pathfinding and obstacle avoidance for the multi-robot system in different environments. A density reward, a distance reward, and a step size penalty introduced during training improve the agents' obstacle avoidance and pathfinding ability, alleviate congestion, deadlock, and similar problems, and reduce the generation of invalid paths. In the experiments, the model was tested in simulation in random scenes, complex interaction scenes, and obstacle scenes. Compared with the centralized planning method, it greatly reduced planning time and improved generalization and stability. Compared with other distributed methods, the proposed density and distance reward settings helped agents complete their tasks safely and quickly, and narrowed the gap to the centralized planning method in planning quality.
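
The abstract names three shaping terms (density reward, distance reward, step size penalty) but does not give their formulas. The following is only a minimal sketch of how such terms might be combined for one agent on a grid. Everything in it is an assumption made for illustration, not the authors' definition: the function name `shaped_reward`, the Manhattan-distance progress term, the neighbour count within a square window, the reading of the step size penalty as a constant per-step cost, and all weights.

```python
from typing import Iterable, Tuple

Cell = Tuple[int, int]  # (row, col) on the grid


def manhattan(a: Cell, b: Cell) -> int:
    """Grid distance between two cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def shaped_reward(
    prev_pos: Cell,
    new_pos: Cell,
    goal: Cell,
    neighbours: Iterable[Cell],
    radius: int = 2,            # hypothetical sensing window
    w_distance: float = 0.1,    # hypothetical weight
    w_density: float = 0.05,    # hypothetical weight
    step_penalty: float = 0.01, # hypothetical per-step cost
) -> float:
    """Per-step shaped reward for one agent (illustrative only)."""
    # Distance term: positive when the move brings the agent closer to its goal.
    progress = manhattan(prev_pos, goal) - manhattan(new_pos, goal)

    # Density term: number of other robots inside a square window around
    # the agent; crowded cells are discouraged to limit congestion.
    crowd = sum(
        1
        for n in neighbours
        if max(abs(n[0] - new_pos[0]), abs(n[1] - new_pos[1])) <= radius
    )

    # Step term: constant cost per action, discouraging long or idle paths.
    return w_distance * progress - w_density * crowd - step_penalty
```

With these assumed weights, a step toward the goal in an empty neighbourhood scores higher than the same step next to another robot, and a step away from the goal scores negative. In a PPO training loop this value would be added to whatever terminal rewards (goal reached, collision) the environment defines.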

Key words: multi-robot system, deep reinforcement learning, grid workspace, path finding and obstacle avoidance

