Computer Integrated Manufacturing Systems ›› 2024, Vol. 30 ›› Issue (2): 553-568. DOI: 10.13196/j.cims.2022.0733


Adaptive Q-learning path planning algorithm based on virtual target guidance

LI Ziyi, HU Xiangtao+, ZHANG Yongle, XU Jianjun

  1. School of Electrical Engineering and Automation, Anhui University
  • Online: 2024-02-29  Published: 2024-03-07
  • Supported by:
    Project supported by the National Natural Science Foundation of China (No. 52175210).

Abstract: When classical reinforcement learning algorithms are applied to robot path planning in unknown environments, they suffer from low exploration efficiency, slow convergence, a tendency to fall into terrain traps, and a lack of intermediate states in the learning process, which makes exploration blind. To address these problems, a dual memory mechanism, a virtual target guidance method, and an adaptive greedy factor were designed, and an adaptive Q-learning algorithm based on virtual target guidance (VTGA-Q-Learning) was proposed. To verify the performance of the new algorithm, four kinds of environment maps were designed and comparative simulation experiments were conducted against other improved algorithms. Furthermore, a virtual simulation experiment with a four-wheel-drive Mecanum wheel robot was carried out to emulate a real environment and verify the algorithm's performance. Experimental results showed that the proposed algorithm significantly reduced the number of iterations, accelerated the convergence of reinforcement learning, and was robust to complex environments; it could effectively avoid terrain traps, improve the performance of mobile robot navigation systems, and provide a reference for autonomous path planning of mobile robots.
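
The abstract names the algorithm's ingredients (dual memory mechanism, virtual target guidance, adaptive greedy factor) but does not give update rules. For orientation only, the sketch below shows plain tabular Q-learning with a linearly decaying greedy factor on a hypothetical grid map; the map, rewards, decay schedule, and all identifiers (GRID, adaptive_epsilon, etc.) are illustrative assumptions, not the authors' VTGA-Q-Learning method.

```python
import numpy as np

# Hypothetical 10x10 grid map: 0 = free cell, 1 = obstacle.
GRID = np.zeros((10, 10), dtype=int)
GRID[4, 2:8] = 1                                  # a wall forming a simple trap
START, GOAL = (0, 0), (9, 9)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right

ALPHA, GAMMA = 0.1, 0.9                           # assumed learning rate / discount
EPS_MAX, EPS_MIN = 0.9, 0.05                      # bounds of the greedy factor

def adaptive_epsilon(episode, n_episodes):
    """Linearly decay exploration over training (illustrative schedule only)."""
    return EPS_MIN + (EPS_MAX - EPS_MIN) * (1.0 - episode / n_episodes)

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < GRID.shape[0] and 0 <= c < GRID.shape[1]) or GRID[r, c]:
        return state, -5.0, False                 # collision: penalty, stay put
    if (r, c) == GOAL:
        return (r, c), 100.0, True                # goal reached
    return (r, c), -1.0, False                    # step cost favors short paths

def train(n_episodes=500, max_steps=400):
    Q = np.zeros((*GRID.shape, len(ACTIONS)))     # tabular action-value function
    for ep in range(n_episodes):
        state, eps = START, adaptive_epsilon(ep, n_episodes)
        for _ in range(max_steps):
            if np.random.rand() < eps:            # epsilon-greedy action choice
                a = np.random.randint(len(ACTIONS))
            else:
                a = int(np.argmax(Q[state]))
            nxt, reward, done = step(state, ACTIONS[a])
            # Standard update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[state][a] += ALPHA * (reward + GAMMA * np.max(Q[nxt]) - Q[state][a])
            state = nxt
            if done:
                break
    return Q

Q = train()
print("Greedy action at start:", int(np.argmax(Q[START])))
```

In the paper's algorithm the greedy factor adapts during learning and virtual targets supply intermediate goals to reduce blind exploration; the sketch shows only the generic epsilon decay and reward shaping that such schemes build on.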

Key words: Q-learning, path planning, reinforcement learning, mobile robots
