Deep reinforcement learning algorithm for the type I two-sided assembly line balancing problem

Computer Integrated Manufacturing Systems, 2024, Vol. 30, Issue 2: 508-519. DOI: 10.13196/j.cims.2021.0597

CHENG Wei¹, ZHANG Yahui², CAO Xianfeng³, JIN Zengzhi³, HU Xiaofeng¹⁺

1. School of Mechanical Engineering, Shanghai Jiao Tong University
2. Institute of Marine Equipment, Shanghai Jiao Tong University
3. Process Research Institution, China National Heavy Duty Truck Group Co., Ltd.

Publication date: 2024-02-29; Release date: 2024-03-06

Supported by: the National Natural Science Foundation of China (No. 51975373) and the New Faculty Start-up Program of Shanghai Jiao Tong University (No. 22X010503668).

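Abstract

Traditional optimization algorithms for the type I two-sided assembly line balancing problem cannot effectively reuse historical solving experience and often fail to reach the optimal solution. To address this, a deep reinforcement learning algorithm named Proximal Policy Optimization with Convolutional Neural Networks (CNN-PPO) is proposed. The CNN-PPO agent is built on Proximal Policy Optimization (PPO), and a convolutional neural network (CNN) is introduced to strengthen the agent's ability to extract features from the data. Based on the characteristics of the two-sided assembly line problem, a state matrix is defined to describe the problem, and a mask layer is introduced to assist the agent in task decision-making. A reward function is designed according to the optimization objective. Combined with the online execution-evaluation (Actor-Critic) mechanism of reinforcement learning, it selects the best task to assign at each decision step. The effectiveness and stability of the algorithm are verified on multiple test instances. Experimental results show that the proposed method yields superior solutions: 57 of the 59 test instances reach the lower bound.

Key words: two-sided assembly line; type I balancing problem; deep reinforcement learning; convolutional neural networks; proximal policy optimization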
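The abstract says a mask layer helps the agent decide which task to assign next. A common way to realize such a layer is to give infeasible actions zero probability before sampling. The sketch below is an illustration under assumptions, not the authors' implementation:

- The feasibility rules are hypothetical and simplified: precedence, operation side `L`/`R`/`E`, and remaining cycle time of the current station.
- The helper names `feasible_mask`, `masked_policy` and `sample_task` are invented for this example.
- The logits stand in for the CNN actor's output.
- It ignores the cross-side timing interference that a full two-sided model must handle.

```python
import math
import random


def feasible_mask(times, preds, sides, assigned, side, remaining):
    """One boolean per task: True if the task may be chosen now.

    Hypothetical, simplified rules (not the paper's exact constraints):
    the task is unassigned, all immediate predecessors are assigned,
    its side is 'E' (either) or matches the station side, and it fits
    in the station's remaining cycle time.
    """
    mask = []
    for j, t in enumerate(times):
        ok = (
            j not in assigned
            and all(p in assigned for p in preds.get(j, ()))
            and sides[j] in ("E", side)
            and t <= remaining
        )
        mask.append(ok)
    return mask


def masked_policy(logits, mask):
    """Softmax restricted to feasible tasks; masked tasks get probability 0."""
    if not any(mask):
        raise ValueError("no feasible task: a new station must be opened")
    # Subtract the largest feasible logit for numerical stability.
    top = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_task(logits, mask, rng=random):
    """Sample a task index from the masked policy."""
    probs = masked_policy(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage: task 2 needs tasks 0 and 1; task 3 needs task 2.
times = [4, 5, 3, 6]
preds = {2: [0, 1], 3: [2]}
sides = ["L", "R", "E", "E"]
mask = feasible_mask(times, preds, sides, assigned={0}, side="R", remaining=10)
print(mask)  # → [False, True, False, False]
print(sample_task([0.5, -1.0, 3.0, 2.0], mask))  # → 1 (the only feasible task)
```

Masking at the probability level means the agent never has to learn to avoid infeasible assignments through negative rewards. Every sampled action is a valid task.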

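CNN-PPO builds on PPO and an Actor-Critic mechanism. As a generic illustration, not the authors' code, the sketch below computes two losses on plain Python lists instead of network outputs:

- the PPO clipped surrogate loss for the actor;
- a mean-squared-error loss for the critic.

Several choices are assumptions, because the abstract does not give the paper's estimator or hyperparameters:

- the advantage estimate (Monte Carlo return minus critic value);
- the discount `gamma`;
- the clipping range `eps=0.2`.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1} for one finished episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def advantages(returns, values):
    """Critic-based advantage estimate A_t = G_t - V(s_t) (an assumed choice)."""
    return [g - v for g, v in zip(returns, values)]


def ppo_clip_loss(new_logp, old_logp, adv, eps=0.2):
    """Negative PPO clipped surrogate objective, to be minimized for the actor."""
    total = 0.0
    for nl, ol, a in zip(new_logp, old_logp, adv):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * a, clipped * a)
    return -total / len(adv)


def critic_loss(returns, values):
    """Mean squared error used to fit the value (critic) estimate V(s)."""
    return sum((g - v) ** 2 for g, v in zip(returns, values)) / len(values)


# Usage with toy numbers: two decision steps, each rewarded 1.0.
G = discounted_returns([1.0, 1.0], gamma=0.5)  # → [1.5, 1.0]
A = advantages(G, [0.5, 2.0])                   # → [1.0, -1.0]
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], A))  # → -0.0 (ratio 1, mean advantage 0)
print(critic_loss(G, [0.5, 2.0]))               # → 1.0
```

In an actual training loop, `new_logp` would come from the current actor on stored trajectories and `old_logp` from the policy that collected them. The clipping keeps each update close to the data-collecting policy.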

CLC number: TP18

Cite this article: CHENG Wei, ZHANG Yahui, CAO Xianfeng, JIN Zengzhi, HU Xiaofeng. Deep reinforcement learning algorithm for the type I two-sided assembly line balancing problem[J]. Computer Integrated Manufacturing Systems, 2024, 30(2): 508-519.
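The abstract measures solution quality by how many instances reach the lower bound. A type I problem fixes the cycle time C and minimizes the number of (mated-)stations, so simple capacity arguments give such bounds:

- a mated-station offers at most 2C of work time and holds at most one left and one right station;
- tasks restricted to one side must fit into that side's stations.

The sketch below computes these illustrative bounds. It is an assumption, not necessarily the bound the authors compare against, and the function name `lower_bounds` is hypothetical.

```python
import math


def lower_bounds(times, sides, cycle_time):
    """Illustrative lower bounds for a type I two-sided line (assumed, not the paper's).

    times: processing time per task; sides: 'L', 'R' or 'E' per task.
    Returns (bound on mated-stations, bound on stations).
    """
    if any(t > cycle_time for t in times):
        raise ValueError("a task exceeds the cycle time")
    total = sum(times)
    t_left = sum(t for t, s in zip(times, sides) if s == "L")
    t_right = sum(t for t, s in zip(times, sides) if s == "R")
    # Side-restricted work needs at least this many stations on each side.
    left_min = math.ceil(t_left / cycle_time)
    right_min = math.ceil(t_right / cycle_time)
    # A mated-station has capacity 2 * C and at most one station per side.
    lb_mated = max(math.ceil(total / (2 * cycle_time)), left_min, right_min)
    # Each station has capacity C; left and right stations are counted separately.
    lb_stations = max(math.ceil(total / cycle_time), left_min + right_min)
    return lb_mated, lb_stations


# Usage: three left-only tasks dominate the bound.
print(lower_bounds([5, 5, 5, 1], ["L", "L", "L", "R"], 10))  # → (2, 3)
```

With these bounds, a solution that uses exactly `lb_mated` mated-stations is provably optimal for that objective. This is the sense in which reaching the lower bound certifies optimality.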
