Computer Integrated Manufacturing Systems ›› 2024, Vol. 30 ›› Issue (2): 553-568. DOI: 10.13196/j.cims.2022.0733


Adaptive Q-learning path planning algorithm based on virtual target guidance

LI Ziyi, HU Xiangtao+, ZHANG Yongle, XU Jianjun

  1. School of Electrical Engineering and Automation, Anhui University
  • Online: 2024-02-29  Published: 2024-03-07
  • Supported by:
    Project supported by the National Natural Science Foundation of China (No. 52175210).

Abstract: When classical reinforcement learning algorithms are applied to robot path planning in unknown environments, they suffer from low exploration efficiency, slow convergence, a tendency to fall into terrain traps, and a lack of intermediate states in the learning process, which makes exploration blind. To address these problems, a dual memory mechanism, a virtual target guidance method, and an adaptive greedy factor were designed, and an adaptive Q-learning algorithm based on virtual target guidance (VTGA-Q-Learning) was proposed. To verify the performance of the new algorithm, four kinds of environment maps were designed and comparative simulation experiments were conducted against other improved algorithms. Furthermore, a virtual simulation experiment with a four-wheel-drive Mecanum wheel robot was carried out to emulate a real environment and verify the algorithm's performance. Experimental results showed that the proposed algorithm significantly reduced the number of iterations, accelerated the convergence of reinforcement learning, and was robust to complex environments; it could effectively avoid terrain traps, improve the performance of mobile robot navigation systems, and provide a reference for autonomous path planning of mobile robots.
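
The abstract names the algorithm's ingredients (dual memory mechanism, virtual target guidance, adaptive greedy factor) but does not give update rules. For orientation only, the sketch below shows plain tabular Q-learning with a linearly decaying greedy factor on a hypothetical grid map; the map, rewards, decay schedule, and all identifiers (GRID, adaptive_epsilon, etc.) are illustrative assumptions, not the authors' VTGA-Q-Learning method.

```python
import numpy as np

# Hypothetical 10x10 grid map: 0 = free cell, 1 = obstacle.
GRID = np.zeros((10, 10), dtype=int)
GRID[4, 2:8] = 1                                  # a wall forming a simple trap
START, GOAL = (0, 0), (9, 9)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right

ALPHA, GAMMA = 0.1, 0.9                           # assumed learning rate / discount
EPS_MAX, EPS_MIN = 0.9, 0.05                      # bounds of the greedy factor

def adaptive_epsilon(episode, n_episodes):
    """Linearly decay exploration over training (illustrative schedule only)."""
    return EPS_MIN + (EPS_MAX - EPS_MIN) * (1.0 - episode / n_episodes)

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < GRID.shape[0] and 0 <= c < GRID.shape[1]) or GRID[r, c]:
        return state, -5.0, False                 # collision: penalty, stay put
    if (r, c) == GOAL:
        return (r, c), 100.0, True                # goal reached
    return (r, c), -1.0, False                    # step cost favors short paths

def train(n_episodes=500, max_steps=400):
    Q = np.zeros((*GRID.shape, len(ACTIONS)))     # tabular action-value function
    for ep in range(n_episodes):
        state, eps = START, adaptive_epsilon(ep, n_episodes)
        for _ in range(max_steps):
            if np.random.rand() < eps:            # epsilon-greedy action choice
                a = np.random.randint(len(ACTIONS))
            else:
                a = int(np.argmax(Q[state]))
            nxt, reward, done = step(state, ACTIONS[a])
            # Standard update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[state][a] += ALPHA * (reward + GAMMA * np.max(Q[nxt]) - Q[state][a])
            state = nxt
            if done:
                break
    return Q

Q = train()
print("Greedy action at start:", int(np.argmax(Q[START])))
```

In the paper's algorithm the greedy factor adapts during learning and virtual targets supply intermediate goals to reduce blind exploration; the sketch shows only the generic epsilon decay and reward shaping that such schemes build on.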

Key words: Q-learning, path planning, reinforcement learning, mobile robots
