Computer Integrated Manufacturing Systems ›› 2021, Vol. 27 ›› Issue (8): 2341-2349. DOI: 10.13196/j.cims.2021.08.016

Deep reinforcement learning method for biped robot gait control

Feng Chun (冯春), Zhang Yiwei (张祎伟), Huang Cheng (黄成), Jiang Wenbiao (姜文彪), Wu Zhiwei (武之炜)

  1. School of Aeronautical and Mechanical Engineering, Changzhou Institute of Technology
  • Online: 2021-08-31 Published: 2021-08-31
  • Supported by:
    the Youth Fund of the National Natural Science Foundation of China (No. 11802040) and the 2018 Jiangsu Province "Qing Lan Project" for Outstanding Young Key Teachers (No. A1-5501-19-003).

Abstract: To address stable gait control during biped robot walking, a deep reinforcement learning method based on an improved Deep Q-Network (DQN) is proposed. First, the DQN algorithm is combined with the deterministic policy gradient, and the critic network of the actor-critic network is replaced with a clipped Double-Q network, yielding an improved DQN. Then, a link model of the biped robot is established, and the improved DQN is used to train the biped robot, acting as the agent, for gait control on a conventional flat road surface. MATLAB simulation results show that, compared with the DQN and Deep Deterministic Policy Gradient (DDPG) algorithms, the proposed algorithm trains faster and its average reward curve is smoother. With CPU training, agent training completes after about 20 hours of deep reinforcement learning. The biped robot achieves stable, fast walking (average speed about 0.5 m/s) with small torques over a long distance (about 5 meters).

Key words: biped robot, gait control, deep reinforcement learning, agent, actor-critic, improved deep Q-network algorithm
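
As a rough illustration of the clipped Double-Q idea named in the abstract, the sketch below computes a critic training target by taking the smaller of two target Q-value estimates before bootstrapping. It is a minimal sketch under stated assumptions, not the authors' implementation: the paper's experiments use MATLAB, and the names `clipped_double_q_target`, `target_actor`, `target_q1`, `target_q2`, the toy functions, and the discount factor are all hypothetical choices made here for illustration.

```python
from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]
QFunction = Callable[[State, Action], float]


def clipped_double_q_target(
    reward: float,
    next_state: State,
    done: bool,
    target_actor: Callable[[State], Action],
    target_q1: QFunction,
    target_q2: QFunction,
    gamma: float = 0.99,  # assumed discount factor, not from the paper
) -> float:
    """Return r + gamma * min(Q1'(s', a'), Q2'(s', a')) for one transition."""
    if done:
        # Terminal transition: there is no next state to bootstrap from.
        return reward
    # The deterministic target policy picks the next action.
    next_action = target_actor(next_state)
    # Taking the minimum of two critics limits overestimation of Q-values.
    q_min = min(
        target_q1(next_state, next_action),
        target_q2(next_state, next_action),
    )
    return reward + gamma * q_min


# Toy stand-ins (hypothetical) used only to exercise the function.
def toy_actor(state: State) -> Action:
    return [0.5 * s for s in state]


def toy_q1(state: State, action: Action) -> float:
    return sum(state) + sum(action)


def toy_q2(state: State, action: Action) -> float:
    return sum(state) + sum(action) + 1.0


if __name__ == "__main__":
    target = clipped_double_q_target(
        reward=1.0,
        next_state=[1.0, 2.0],
        done=False,
        target_actor=toy_actor,
        target_q1=toy_q1,
        target_q2=toy_q2,
        gamma=0.5,
    )
    print(target)
```

In a full actor-critic training loop, both critics would typically be regressed toward this shared target, while the actor is updated with the deterministic policy gradient through one of the critics.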

CLC number: