Computer Integrated Manufacturing System ›› 2025, Vol. 31 ›› Issue (6): 2059-2070.DOI: 10.13196/j.cims.2024.0042

AGV path planning for transmission assembly line based on IRPA-PDERL algorithm

WANG Chen1,2,3, SUN Nan1+, ZOU Chunlong1,2, HUANG Yuchun1, WANG Shenghuai1,2

  1. School of Mechanical Engineering, Hubei University of Automotive Technology
  2. Shiyan Industrial Technology Research Institute of China Engineering Science and Technology
  3. Shanghai Key Laboratory of Intelligent Manufacturing and Robotics, Shanghai University
  • Online: 2025-06-30 Published: 2025-07-07
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 51475150), the Hubei Provincial Key R&D Program, China (No. 2021BAA056), the Hubei Provincial Higher Education Institutions Young and Middle-Aged Science and Technology Innovation Team Program, China (No. T20200018), and the Hubei University of Automotive Technology Doctoral Foundation, China (No. BK201905).

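AGV path planning for a transmission assembly line based on an adaptive proximal distillation evolutionary reinforcement learning algorithm guided by an intrinsic reward policy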

WANG Chen1,2,3, SUN Nan1+, ZOU Chunlong1,2, HUANG Yuchun1, WANG Shenghuai1,2

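  1. School of Mechanical Engineering, Hubei University of Automotive Technology
  2. Shiyan Industrial Technology Research Institute of China Engineering Science and Technology
  3. Shanghai Key Laboratory of Intelligent Manufacturing and Robotics, Shanghai University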
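  • About the authors:
    WANG Chen (1983-), male, from Shiyan, Hubei; professor at the School of Mechanical Engineering, Hubei University of Automotive Technology; Ph.D., Shanghai University; research interest: intelligent manufacturing. E-mail: 893468804@qq.com;

    +SUN Nan (1999-), male, from Shiyan, Hubei; master's student; research interests: reinforcement learning and path planning; corresponding author. E-mail: 1293373890@qq.com;

    ZOU Chunlong (1988-), male, from Xiangyang, Hubei; lecturer at the School of Mechanical Engineering, Hubei University of Automotive Technology; master's degree; research interest: AGV path planning. E-mail: 471791844@qq.com;

    HUANG Yuchun (1998-), male, from Yancheng, Jiangsu; master's student; research interests: machine vision and intelligent manufacturing. E-mail: 427720352@qq.com;

    WANG Shenghuai (1979-), male, from Shiyan, Hubei; professor at the School of Mechanical Engineering, Hubei University of Automotive Technology; Ph.D.; research interests: precision measurement and intelligent manufacturing. E-mail: 115649144@qq.com.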
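  • Funding:
    National Natural Science Foundation of China (51475150); Hubei Provincial Key R&D Program (2021BAA056); Hubei Provincial Higher Education Institutions Young and Middle-Aged Science and Technology Innovation Team Program (T20200018); Hubei University of Automotive Technology Doctoral Foundation (BK201905).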

Abstract: To address the low path planning efficiency of Automated Guided Vehicles (AGVs) facing mixed U-shaped and dynamic obstacles, an adaptive Proximal Distillation Evolutionary Reinforcement Learning algorithm guided by an Intrinsic Reward Policy (IRPA-PDERL) was proposed. First, Random Network Distillation (RND) was introduced into the fitness function as an intrinsic reward to increase the diversity of the elite policies evaluated by the algorithm. An adaptive intrinsic reward weight factor α was designed to balance the algorithm's capacity for exploration and exploitation, helping the AGV select the optimal policy in highly uncertain dynamic obstacle environments. The covariance parameters of the mutation operator were optimized with the covariance matrix adaptation evolution strategy to reduce parameter sensitivity. Comparisons with multiple algorithms in three different environments showed that the improved algorithm shortened paths by 12.65%, 13.44% and 12.87% in the U-shaped sub-assembly line, dynamic obstacle and transmission assembly line environments, respectively, converged faster than the original algorithm, and was strongly robust in complex environments.

Key words: U-shaped obstacles, evolutionary reinforcement learning, intrinsic reward, adaptive reward weights, path planning
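
As a rough illustration of the intrinsic-reward idea described in the abstract, the sketch below adds an RND-style novelty bonus, scaled by a weight α, to an extrinsic return when scoring a policy. It is not the authors' implementation: the linear networks, the class `LinearRND`, the functions `adaptive_alpha` and `fitness`, the linear decay schedule for α, and all numeric defaults are assumptions made for illustration only, since the abstract does not specify them.

```python
import numpy as np


class LinearRND:
    """Toy random network distillation with linear maps (an assumption;
    real RND normally uses neural networks)."""

    def __init__(self, obs_dim, feat_dim=8, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen, randomly initialised target map.
        self.target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor map, trained to imitate the target on visited states.
        self.pred = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def intrinsic_reward(self, obs):
        # Prediction error is large for rarely visited observations.
        err = obs @ self.pred - obs @ self.target
        return float(np.mean(err ** 2))

    def update(self, obs):
        # One gradient step on 0.5 * ||obs @ pred - obs @ target||^2.
        err = obs @ self.pred - obs @ self.target
        self.pred -= self.lr * np.outer(obs, err)


def adaptive_alpha(generation, total_generations, alpha_max=1.0, alpha_min=0.1):
    """Hypothetical schedule: weight exploration heavily early, then decay."""
    frac = min(max(generation / total_generations, 0.0), 1.0)
    return alpha_min + (alpha_max - alpha_min) * (1.0 - frac)


def fitness(extrinsic_return, intrinsic_rewards, alpha):
    """Fitness = extrinsic return + alpha * summed intrinsic reward."""
    return extrinsic_return + alpha * float(np.sum(intrinsic_rewards))


if __name__ == "__main__":
    rnd = LinearRND(obs_dim=2)
    obs = np.array([0.5, 0.5])
    before = rnd.intrinsic_reward(obs)
    for _ in range(50):
        rnd.update(obs)  # repeated visits make the state less novel
    after = rnd.intrinsic_reward(obs)
    print(before > after)
    print(fitness(10.0, [before, after], adaptive_alpha(0, 100)))
```

In this sketch a state visited repeatedly yields a shrinking bonus, so policies that reach unfamiliar states score higher early in the search, while the decaying α shifts the ranking back toward extrinsic return later on.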

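Abstract: To address the low path planning efficiency of automated guided vehicles (AGVs) facing mixed U-shaped and dynamic obstacles, an adaptive proximal distillation evolutionary reinforcement learning algorithm guided by an intrinsic reward policy is proposed. A random network distillation model is introduced into the fitness function as an intrinsic reward to increase the diversity of the elite policies evaluated by the algorithm; an adaptive intrinsic reward weight factor α is designed to balance the algorithm's exploration and exploitation, helping the AGV select the optimal policy in dynamic obstacle environments with strong uncertainty. The covariance parameters of the mutation operator are optimized with the covariance matrix adaptation evolution strategy, reducing parameter sensitivity. Comparisons with multiple algorithms in three different environments show that the improved algorithm shortens paths by 12.65%, 13.44% and 12.87% in the U-shaped sub-assembly line, dynamic obstacle and transmission assembly line environments, respectively, converges faster than the original algorithm, and is strongly robust in complex environments.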

Key words: U-shaped obstacles, evolutionary reinforcement learning, intrinsic reward, adaptive reward weights, path planning