Computer Integrated Manufacturing Systems, 2023, Vol. 29, Issue (11): 3738-3749. DOI: 10.13196/j.cims.2022.0646

Wafer batch device scheduling method combining reverse reinforcement learning and reinforcement learning

WANG Zhuojun1, ZHANG Peng2+, ZHANG Jie2

  1. School of Mechanical Engineering, Donghua University
  2. Artificial Intelligence Research Institute, Donghua University
  • Online: 2023-11-30 Published: 2023-12-04
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 52005099), and the Special Fund for Basic Scientific Research of Central Universities, China (No. 223202100044).

Abstract: To address the scheduling of wafer batch-processing equipment with the objective of minimizing production cycle time, and considering dynamic wafer arrivals, re-entrant processing, and incompatibility constraints, a scheduling optimization method combining Inverse Reinforcement Learning and Reinforcement Learning (IRL-RL) was proposed. Based on the processing characteristics of batch-processing equipment, the problem was decomposed into two sub-problems: batch formation and batch assignment. Because the complex coupling inside these sub-problems leaves the internal mechanism of wafer batch scheduling unclear and makes a global reward function difficult to design, inverse reinforcement learning was introduced to guide the design of the reward function. To reflect the re-entrant processing of wafer lots, expected flow time and remaining waiting time were designed as key state variables. The batch assignment agent made decisions that jointly considered task urgency and the differing setup times caused by switching process types, thereby satisfying the process-type incompatibility constraint of the batch equipment. By designing nonlinear characteristics for key parameters of the reward function, the complex, changing relationship between a wafer lot's number of remaining process layers and its expected flow time was captured. Experimental results on 24 sets of benchmark instances showed that IRL-RL outperformed standard reinforcement learning algorithms and well-performing dispatching rules in both solution quality and computational efficiency. Validation with real enterprise data showed that the wafer production cycle time was shortened by 15%.

Key words: wafer batch scheduling, parallel batch processor, dynamic scheduling, inverse reinforcement learning, reinforcement learning, production cycle, reentrant processing
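The abstract does not give the authors' state encoding, reward formula, or IRL variant, so the following is only a minimal Python sketch, under our own assumptions, of how the described pieces could fit together: (1) batch formation that never mixes incompatible process families and respects machine capacity; (2) a linear reward over assignment features built from remaining waiting time, expected flow time, batch fill ratio, and a setup penalty for switching process type; and (3) a maximum-entropy-style IRL update, a swapped-in technique that is not necessarily the paper's, which fits reward weights to expert assignment choices. All names (`Lot`, `form_batches`, `features`, `maxent_irl`, `assign`), the per-layer flow-time factor, the queue-time allowance, the exponential urgency shape, and the demonstration numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass
class Lot:
    lot_id: str
    family: str            # process type; lots of different families cannot share a batch
    remaining_layers: int  # re-entrant layers still to be processed
    waited: float          # time already spent waiting in the queue
    max_wait: float        # hypothetical queue-time allowance

    def expected_flow_time(self, per_layer: float = 10.0) -> float:
        # Hypothetical: remaining flow time taken as proportional to remaining layers.
        return per_layer * self.remaining_layers

    def remaining_wait(self) -> float:
        # Slack left before the hypothetical queue-time allowance is used up.
        return self.max_wait - self.waited


def form_batches(lots, capacity):
    """Sub-problem 1 (batch formation): single-family batches of at most `capacity` lots."""
    by_family = attrgetter("family")
    batches = []
    for family, group in groupby(sorted(lots, key=by_family), key=by_family):
        members = sorted(group, key=lambda lot: lot.remaining_wait())  # least slack first
        for i in range(0, len(members), capacity):
            batches.append((family, members[i:i + capacity]))
    return batches


def features(batch, machine_family, capacity, setup_times, setup_scale=60.0):
    """Hypothetical features [urgency, fill ratio, -normalized setup] of one assignment."""
    family, members = batch
    # Per-lot urgency decays exponentially with slack, scaled by expected flow time.
    urgency = sum(math.exp(-max(lot.remaining_wait(), 0.0) / max(lot.expected_flow_time(), 1.0))
                  for lot in members) / capacity
    fill = len(members) / capacity
    if machine_family is None or machine_family == family:
        setup = 0.0  # no process-type switch, so no setup
    else:
        setup = setup_times.get((machine_family, family), setup_scale)
    return [urgency, fill, -setup / setup_scale]


def linear_scores(w, candidates):
    # Linear reward r(phi) = w . phi for every candidate assignment.
    return [sum(wk * fk for wk, fk in zip(w, phi)) for phi in candidates]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def maxent_irl(demos, n_features, lr=0.5, epochs=200):
    """Fit w so expert choices are likely under a softmax policy over candidates.

    demos: list of (candidate feature vectors, index chosen by the expert).
    Log-likelihood gradient per decision: phi(expert) - E_softmax[phi].
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for candidates, expert in demos:
            probs = softmax(linear_scores(w, candidates))
            for k in range(n_features):
                expected = sum(p * phi[k] for p, phi in zip(probs, candidates))
                grad[k] += candidates[expert][k] - expected
        w = [wk + lr * gk / len(demos) for wk, gk in zip(w, grad)]  # gradient ascent
    return w


def assign(w, batches, machine_family, capacity, setup_times):
    """Sub-problem 2 (batch assignment): choose the batch with the highest learned reward."""
    candidates = [features(b, machine_family, capacity, setup_times) for b in batches]
    scores = linear_scores(w, candidates)
    return batches[scores.index(max(scores))]


if __name__ == "__main__":
    lots = [Lot("A1", "A", 3, 5, 20), Lot("A2", "A", 8, 18, 20),
            Lot("B1", "B", 2, 1, 20), Lot("A3", "A", 5, 0, 20)]
    # Toy expert demonstrations: avoid setups and prefer more urgent batches.
    demos = [([[0.5, 1.0, 0.0], [0.5, 1.0, -1.0]], 0),
             ([[0.8, 0.5, 0.0], [0.3, 0.5, 0.0]], 0)]
    w = maxent_irl(demos, n_features=3)
    family, chosen = assign(w, form_batches(lots, capacity=2), "A", 2,
                            {("A", "B"): 30.0, ("B", "A"): 30.0})
    print(family, [lot.lot_id for lot in chosen], [round(x, 3) for x in w])
```

With these toy demonstrations, both the urgency weight and the setup-avoidance weight come out positive. A machine last running family A therefore picks the full, more urgent family-A batch [A2, A1]. Treating each dispatch decision as a single-step softmax choice is the simplest form of maximum-entropy IRL, equivalent to a conditional logit model. In the dynamic, multi-step setting the paper addresses, the learned reward would then be handed to an RL agent for policy training, a step this sketch omits.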

中图分类号: