Wafer batch processing equipment scheduling method combining inverse reinforcement learning and reinforcement learning

Computer Integrated Manufacturing Systems, 2023, Vol. 29, Issue 11: 3738-3749. DOI: 10.13196/j.cims.2022.0646

WANG Zhuojun¹, ZHANG Peng²⁺, ZHANG Jie²

1. School of Mechanical Engineering, Donghua University
2. Artificial Intelligence Research Institute, Donghua University

Online: 2023-11-30  Published: 2023-12-04
Supported by: the National Natural Science Foundation of China (No. 52005099) and the Fundamental Research Funds for the Central Universities, China (No. 223202100044).

Abstract: Aiming at the scheduling problem of wafer batch processing equipment, with minimizing the production cycle time as the optimization objective and taking into account dynamic wafer arrivals, reentrant processing, and incompatibility constraints, a scheduling optimization method combining Inverse Reinforcement Learning and Reinforcement Learning (IRL-RL) was proposed. According to the processing characteristics of batch processing equipment, the problem was decomposed into two subproblems: batch formation and batch assignment. Because the complex internal correlations of the subproblems leave the internal mechanism of wafer batch scheduling unclear and make a global reward function difficult to design, inverse reinforcement learning was introduced to guide the design of the reward function. To reflect the reentrant processing of wafer lots, expected flow time and remaining waiting time were designed as key state variables. The batch assignment agent made comprehensive decisions that weighed both the urgency of each job and the differing setup times caused by process type changeovers, while satisfying the process-type incompatibility constraints of the batch processing equipment. By designing nonlinear characteristics for the key parameters of the reward function, the complex, changing relationship between the number of remaining processing layers of a wafer lot and its expected flow time was explained. Experimental results on 24 sets of benchmark instances showed that the IRL-RL algorithm outperformed general reinforcement learning algorithms and well-performing scheduling rules in both optimization results and computational efficiency. Validation on enterprise data showed that the wafer production cycle time was shortened by 15%.
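As a rough illustration of the two-stage decomposition described in the abstract, the Python sketch below forms batches only from lots that share a process type (the incompatibility constraint) and then assigns one batch to a machine by trading off urgency against changeover setup time. It is a hand-weighted heuristic, not the authors' IRL-RL method: in the paper these decisions are made by reinforcement learning agents whose reward design is guided by inverse reinforcement learning. The names `Lot`, `Machine`, `due_slack`, the urgency formula, the weights `w_urgency` and `w_setup`, and the fixed `changeover` time are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lot:
    lot_id: str
    process_type: str  # lots of different process types may not share a batch
    due_slack: float   # hypothetical urgency proxy: time left to due date (smaller = more urgent)


@dataclass
class Machine:
    machine_id: str
    current_type: Optional[str]  # process type last run; None = no setup history


Batch = Tuple[str, List[Lot]]


def form_batches(lots: List[Lot], capacity: int) -> List[Batch]:
    """Batch formation: group lots by process type (incompatibility constraint),
    then cut each group, most urgent first, into batches of at most `capacity` lots."""
    by_type = defaultdict(list)
    for lot in lots:
        by_type[lot.process_type].append(lot)
    batches: List[Batch] = []
    for ptype, group in by_type.items():
        group.sort(key=lambda lot: lot.due_slack)
        for i in range(0, len(group), capacity):
            batches.append((ptype, group[i:i + capacity]))
    return batches


def setup_time(machine: Machine, ptype: str, changeover: float) -> float:
    """Setup time incurred to switch `machine` to `ptype`; zero if no switch is needed."""
    if machine.current_type is None or machine.current_type == ptype:
        return 0.0
    return changeover


def assign(batches: List[Batch], machine: Machine,
           w_urgency: float = 1.0, w_setup: float = 0.5,
           changeover: float = 30.0) -> Batch:
    """Batch assignment (hypothetical fixed weights): pick the batch that
    maximizes weighted urgency minus weighted setup time."""
    def score(batch: Batch) -> float:
        ptype, members = batch
        urgency = sum(1.0 / (1.0 + max(lot.due_slack, 0.0)) for lot in members)
        return w_urgency * urgency - w_setup * setup_time(machine, ptype, changeover)
    return max(batches, key=score)


if __name__ == "__main__":
    lots = [Lot("A1", "oxide", 10), Lot("A2", "oxide", 50),
            Lot("B1", "anneal", 5), Lot("A3", "oxide", 20)]
    batches = form_batches(lots, capacity=2)
    for m in (Machine("M1", "oxide"), Machine("M2", None)):
        ptype, members = assign(batches, m)
        print(m.machine_id, ptype, [lot.lot_id for lot in members])
```

With a machine that last ran `oxide`, the changeover penalty makes the oxide batch `['A1', 'A3']` win; on a machine with no setup history, the single more urgent `anneal` lot is chosen instead. Fixed weights like these are the part that, in the paper's approach, a learned policy and an IRL-guided reward would replace.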

Key words: wafer batch scheduling, parallel batch processing machine, dynamic scheduling, inverse reinforcement learning, reinforcement learning, production cycle, reentrant processing

CLC number: TH166

WANG Zhuojun, ZHANG Peng, ZHANG Jie. Wafer batch device scheduling method combining reverse reinforcement learning and reinforcement learning[J]. Computer Integrated Manufacturing Systems, 2023, 29(11): 3738-3749.
