Computer Integrated Manufacturing System, 2025, Vol. 31, Issue (3): 955-964. DOI: 10.13196/j.cims.2023.0552


AGV path planning and task scheduling based on improved proximal policy optimization algorithm

QI Xuan1, ZHOU Tong2, WANG Cunsong2, PENG Xiaotian1, PENG Hao1+

  1. School of Mechanical and Power Engineering, Nanjing Tech University
  2. Institute of Intelligent Manufacturing, Nanjing Tech University
  • Online: 2025-03-31 Published: 2025-04-02
  • Supported by:
    Project supported by the National Key R&D Program, China (No. 2021YFB3301300), the National Natural Science Foundation, China (No. 62203213), and the Jiangsu Provincial Funding Program for Excellent Postdoctoral Talent, China (No. 2023ZB756).

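  • About the authors:
    QI Xuan (祁璇, born 1999), male, from Wuxi, Jiangsu; master's student; research interests: AGV path planning and task scheduling; E-mail: 465279793@qq.com;

    ZHOU Tong (周通, born 1991), male, from Xuzhou, Jiangsu; lecturer, Ph.D.; research interests: intelligent manufacturing systems, production scheduling, etc.; E-mail: t.zh@njtech.edu.cn;

    WANG Cunsong (王村松, born 1990), male, from Yichun, Heilongjiang; lecturer, Ph.D., master's supervisor; research interests: intelligent manufacturing, prognostics and health management, path planning, etc.; E-mail: wangcunsong@njtech.edu.cn;

    PENG Xiaotian (彭孝天, born 1994), male, from Suzhou, Anhui; Ph.D.; research interests: intelligent production lines; E-mail: pengxiaotian@njtech.edu.cn;

    +PENG Hao (彭浩, born 1981), male, from Changsha, Hunan; professor, Ph.D., doctoral supervisor; research interests: advanced equipment manufacturing, industrial energy saving, energy storage, high-efficiency heat and mass transfer, etc.; corresponding author; E-mail: phsight1@hotmail.com.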

Abstract: An Automated Guided Vehicle (AGV) is a type of automated material handling equipment with high flexibility and adaptability that can perform path planning, task scheduling, and intelligent allocation. Current research on optimal path and scheduling algorithms for AGVs still faces problems such as poor generalization, low convergence efficiency, and long routing time. Therefore, an improved Proximal Policy Optimization (PPO) algorithm was proposed. First, a multi-step action selection strategy was adopted to increase the step length of AGV movement, extending the original 4-direction action set with 8 additional directions to shorten the optimal path. Second, the dynamic reward function was improved to adjust the reward value in real time according to the current state of the AGV, enhancing its learning ability. Then, the reward curves of the different improvement methods were compared to validate the convergence efficiency of the algorithm and the length of the optimal path. Finally, a continuous task scheduling optimization algorithm for a single AGV was designed to improve transportation efficiency. The results showed that the improved algorithm shortened the optimal path by 28.6% and increased convergence efficiency by 78.5% compared with the PPO algorithm. It performed better on more complex tasks that require high-level policies and exhibited stronger generalization capability. Compared with the Q-Learning, Deep Q-Network (DQN), and Soft Actor-Critic (SAC) algorithms, the improved algorithm improved efficiency by 84.4%, 83.7%, and 77.9%, respectively. After continuous task scheduling optimization for a single AGV, the average path was reduced by 47.6%.
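
The expanded action set and state-dependent reward described in the abstract can be illustrated with a minimal grid-world sketch. It is not the authors' implementation. The abstract does not say which 8 directions were added, so the choice below (4 diagonal moves plus 4 two-cell straight moves) is an assumption, as are the reward constants and the helper names `cells_crossed` and `step`.

```python
import math

# Original 4-direction action set: unit moves along the grid axes.
BASE_ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

# ASSUMPTION: the abstract does not list the 8 added directions. For
# illustration we use 4 diagonal unit moves and 4 two-cell axis moves.
EXTRA_ACTIONS = [(1, 1), (1, -1), (-1, 1), (-1, -1),
                 (0, 2), (0, -2), (2, 0), (-2, 0)]

ACTIONS = BASE_ACTIONS + EXTRA_ACTIONS  # 12 actions in total


def cells_crossed(pos, action):
    """Return the grid cells entered, in order, when taking `action` from `pos`."""
    dx, dy = action
    steps = max(abs(dx), abs(dy))
    sx, sy = dx // steps, dy // steps  # unit step direction
    return [(pos[0] + sx * k, pos[1] + sy * k) for k in range(1, steps + 1)]


def step(pos, action, goal, obstacles, size):
    """Apply one action on a size x size grid; return (new_pos, reward, done).

    ASSUMPTION: hypothetical dynamic reward that depends on the current
    state, here the progress made toward the goal minus a small step cost.
    """
    path = cells_crossed(pos, action)
    for cell in path:
        x, y = cell
        if not (0 <= x < size and 0 <= y < size) or cell in obstacles:
            return pos, -1.0, False  # blocked: stay in place, take a penalty
    new_pos = path[-1]
    if new_pos == goal:
        return new_pos, 10.0, True  # reaching the goal ends the episode
    progress = math.dist(pos, goal) - math.dist(new_pos, goal)
    return new_pos, progress - 0.1, False
```

In a PPO setup, a function like `step` would sit inside the environment, and the policy network would output a distribution over the 12 entries of `ACTIONS`. The longer moves let the agent cover the same distance in fewer decisions, which is the intuition behind the multi-step strategy.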

Key words: automated guided vehicle, path planning, task scheduling, proximal policy optimization algorithm, reinforcement learning

