Computer Integrated Manufacturing System, 2023, Vol. 29, Issue (1): 236-245. DOI: 10.13196/j.cims.2023.01.020

Collision avoidance for AGV based on deep reinforcement learning in complex dynamic environment

CAI Ze, HU Yaoguang+, WEN Jingqian, ZHANG Lixiang

  1. Laboratory of Industrial and Intelligent System Engineering, Beijing Institute of Technology
  • Online: 2023-01-31 Published: 2023-02-15
  • Supported by:
    Project supported by the National Key Research and Development Program, China (No. 2021YFB1715700), and the National Natural Science Foundation, China (No. 52175451).

Abstract: To improve the collision avoidance capability of Automated Guided Vehicles (AGVs) in the complex dynamic environment of smart factories, and to enable them to carry out material handling tasks more safely and efficiently while following the global path, a local collision avoidance method based on deep reinforcement learning was proposed. The AGV collision avoidance problem was formulated as a Partially Observable Markov Decision Process (POMDP), for which the observation space, action space and reward function were described in detail. Tracking of the global path was achieved by setting different reward values. A Deep Deterministic Policy Gradient (DDPG) method was then used to train the collision avoidance policy. The trained policy was validated in various simulated scenarios, which demonstrated its effectiveness. The experimental results showed that the proposed approach could cope with the complex dynamic environment and reduce the time and distance required for collision avoidance.

Key words: dynamic collision avoidance, deep reinforcement learning, tracking of global path, smart factory

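Abstract (translated from the Chinese): To improve the obstacle avoidance capability of automated guided vehicles (AGVs) in the complex dynamic environment of smart factories, so that they can complete obstacle avoidance tasks safely and efficiently under the guidance of a global path, a local obstacle avoidance method based on deep reinforcement learning is proposed. First, the obstacle avoidance problem is formulated as a partially observable Markov decision process, and the observation space, action space, reward function and optimal avoidance policy are described in detail; local avoidance planning is guided by the global path through the setting of different rewards. Then, on this basis, the deep deterministic policy gradient algorithm is used to train the avoidance policy. Finally, a simulation environment is built and several experimental scenarios are designed to verify the effectiveness of the proposed method. The experimental results show that the proposed method can cope with complex dynamic environments, reduce avoidance time and distance, and improve operating efficiency.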

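Key words (translated from the Chinese): dynamic obstacle avoidance, deep reinforcement learning, global path guidance, smart factory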
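
The abstract states that tracking of the global path is achieved by setting different reward values, but it does not give the reward function itself. The following is a minimal sketch of how such a path-guided reward might look. It is not the authors' formulation: the function names, the reward terms and every constant are assumptions chosen for illustration, and the global path is assumed to be a polyline of at least two 2-D waypoints.

```python
import math

# All constants are illustrative assumptions, not values from the paper.
R_GOAL = 10.0        # terminal reward for reaching the goal
R_COLLISION = -10.0  # terminal penalty for getting too close to an obstacle
W_PROGRESS = 1.0     # weight on progress made toward the goal in one step
W_PATH = 0.5         # weight on deviation from the global path
GOAL_RADIUS = 0.2    # distance (m) at which the goal counts as reached
SAFE_DIST = 0.3      # distance (m) below which a collision is assumed


def point_to_segment(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def path_deviation(p, path):
    """Distance from p to the nearest segment of the global path polyline."""
    return min(point_to_segment(p, a, b) for a, b in zip(path, path[1:]))


def reward(prev_pos, pos, goal, path, min_obstacle_dist):
    """Hypothetical step reward: terminal terms first, then shaped terms."""
    if min_obstacle_dist < SAFE_DIST:
        return R_COLLISION
    if math.dist(pos, goal) < GOAL_RADIUS:
        return R_GOAL
    progress = math.dist(prev_pos, goal) - math.dist(pos, goal)
    return W_PROGRESS * progress - W_PATH * path_deviation(pos, path)
```

In this sketch, a step that stays on the global path earns more than a step of similar length that drifts away from it, so a policy trained against such a reward would be nudged back toward the path once an obstacle has been cleared. The relative sizes of the weights control how strongly path tracking competes with avoidance; the values used in the paper are not given in the abstract.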
