Computer Integrated Manufacturing System ›› 2025, Vol. 31 ›› Issue (3): 815-827. DOI: 10.13196/j.cims.2024.0004


Reinforcement learning strategy for robotic multiple peg-in-hole assembly considering variable rotation parameters

YAN Zhichao1, ZHOU Yong1, HU Kaixiong1, LI Weidong2+

  1. School of Transportation and Logistics Engineering, Wuhan University of Technology
  2. School of Mechanical Engineering, University of Shanghai for Science and Technology
  • Online: 2025-03-31 Published: 2025-04-02
  • Supported by:
    Project supported by the National Natural Science Foundation of China (No. 51975444).

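  • About the authors:
    YAN Zhichao (鄢智超, born 1998), male, from Wuhan, Hubei; master's student; research interest: robotics. E-mail: yanzhichao@whut.edu.cn

    ZHOU Yong (周勇, born 1973), male, from Hanchuan, Hubei; associate professor, Ph.D., doctoral supervisor; research interests: robotics and its applications, cooperative operation scheduling and intelligentization of logistics equipment, and 3D printing/scanning technology, among others. E-mail: zhouyong@whut.edu.cn

    HU Kaixiong (胡楷雄, born 1985), male, from Wuhan, Hubei; associate professor, Ph.D., master's supervisor; research interests: intelligent manufacturing, materials analysis and testing, and materials preparation and processing. E-mail: kaixiong.hu@whut.edu.cn

    +LI Weidong (李卫东, born 1969), male, from Xi'an, Shaanxi; professor, Ph.D., doctoral supervisor; research interests: robot applications, sustainable remanufacturing, and artificial intelligence, among others; corresponding author. E-mail: weidongli@usst.edu.cn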

Abstract: Current learning strategies for robotic multiple peg-in-hole assembly rely heavily on manual teaching data, which leads to low training efficiency and poor scene adaptability. To address these problems, a reinforcement learning strategy considering variable rotation parameters was proposed for robotic multiple peg-in-hole assembly. First, an attitude adjustment model with variable rotation parameters was presented and used to collect data relating the contact mechanics information of the multiple peg-hole pairs to the attitude adjustment actions; these data served as pre-training data for assembly skills. Then, an improved Deep Deterministic Policy Gradient (DDPG) reinforcement learning algorithm was proposed, which used a multi-factor sparse reward function to evaluate assembly actions and thereby improve learning efficiency and success rate. Finally, a case study of multiple peg-in-hole electronic component assembly was carried out on simulation and experimental platforms. The results showed that the proposed strategy had good scene adaptability and, compared with the classical reinforcement learning method, effectively improved the learning efficiency and success rate of assembly while significantly reducing the assembly contact force/torque.

Key words: collaborative robot, multiple peg-in-hole assembly, attitude adjustment model, improved deep deterministic policy gradient algorithm
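
To make the phrase "multi-factor sparse reward function" concrete, the following is a minimal sketch of what such a reward could look like. It is not the authors' function: the abstract does not specify the factors, so the three used here (contact overload, successful insertion, timeout), the names `StepOutcome` and `sparse_reward`, and every threshold and reward value are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical summary of one assembly step (all fields are assumptions)."""
    inserted: bool      # True once the pegs have reached the target depth
    max_force: float    # largest contact force magnitude in this step, N
    max_torque: float   # largest contact torque magnitude in this step, N*m
    steps_used: int     # number of steps taken so far in the episode


def sparse_reward(o: StepOutcome,
                  force_limit: float = 20.0,   # assumed safety limit, N
                  torque_limit: float = 2.0,   # assumed safety limit, N*m
                  max_steps: int = 200) -> float:  # assumed episode length
    """Illustrative multi-factor sparse reward; constants are placeholders."""
    # Factor 1: penalize excessive contact load, checked before anything else.
    if o.max_force > force_limit or o.max_torque > torque_limit:
        return -1.0
    # Factor 2: reward success, with a bonus for finishing in fewer steps.
    if o.inserted:
        return 1.0 + (1.0 - o.steps_used / max_steps)
    # Factor 3: penalize running out of steps without inserting.
    if o.steps_used >= max_steps:
        return -0.5
    # Sparse: ordinary intermediate steps carry no reward signal.
    return 0.0
```

In this sketch the reward stays at zero for ordinary intermediate steps and is nonzero only at the three terminal events, which is what makes it sparse; a DDPG critic would consume this scalar in place of a dense shaped reward.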

