Computer Integrated Manufacturing Systems, 2023, Vol. 29, Issue (1): 61-74. DOI: 10.13196/j.cims.2023.01.006


Multi-agent reinforcement learning based textile dyeing workshop dynamic scheduling method

HE Junjie1, ZHANG Jie1+, ZHANG Peng1, ZHENG Peng2, WANG Ming1

  1. School of Mechanical Engineering, Donghua University
  2. School of Mechanical Engineering, Shanghai Jiao Tong University
  • Online: 2023-01-31 Published: 2023-02-14
  • Supported by:
    the National Key Research and Development Program of China (No. 2019YFB1706300) and the Initial Research Funds for Young Teachers of Donghua University, China.

Abstract: To address dynamic scheduling in a textile dyeing workshop where tasks arrive dynamically with orders, a fully reactive scheduling method based on Multi-Agent Recurrent Proximal Policy Optimization (MA-RPPO) reinforcement learning (RL) is proposed, with minimizing total tardiness as the optimization objective. First, for the two sub-problems of dyeing workshop scheduling, batch formation and vat scheduling, a batching agent and a vat scheduling agent are designed. Second, to cope with the dynamics of workshop tasks, a Long Short-Term Memory (LSTM) network is introduced to extract dynamic workshop information and improve the agents' adaptability to the changing environment. Third, an interaction mechanism between the batching agent and the scheduling agent is proposed so that batching and vat scheduling are optimized globally. Finally, features related to the problem constraints and the optimization objective are extracted and a reward function is designed; through a dynamic scheduling mechanism, the agents interact with the dyeing workshop environment to learn the optimal scheduling policy. A case study from a printing and dyeing enterprise shows that the proposed method outperforms several commonly used high-performance heuristic rules on problems of different scales, reducing the total tardiness of products and improving the enterprise's on-time order delivery.

Key words: dyeing workshop scheduling, parallel batch processing machine, dynamic scheduling, multi-agent reinforcement learning, long short-term memory network, total tardiness
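
To make the setting in the abstract concrete, the following is a minimal sketch of a fully reactive batching and vat scheduling loop with a total tardiness objective. It is not the MA-RPPO method: both decisions are made here by a simple earliest-due-date (EDD) rule, the kind of heuristic baseline the abstract compares against, standing in where the learned batching and scheduling agents would act. The `Task` fields, the `family` compatibility key, the vat capacities and all function names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    release: float   # time the task arrives with its order
    due: float       # due date
    weight: float    # fabric weight (hypothetical unit)
    family: str      # compatibility key, e.g. colour (assumption)
    duration: float  # dyeing time (assumption)


def total_tardiness(completion, tasks):
    """Sum of max(0, C_j - d_j) over all tasks."""
    return sum(max(0.0, completion[t.name] - t.due) for t in tasks)


def edd_batch(waiting, capacity):
    """Stand-in for a batching agent: seed with the earliest-due task,
    then add same-family tasks in due-date order while capacity allows."""
    ordered = sorted(waiting, key=lambda t: t.due)
    seed = ordered[0]
    batch, load = [], 0.0
    for t in ordered:
        if t.family == seed.family and load + t.weight <= capacity:
            batch.append(t)
            load += t.weight
    return batch


def simulate(tasks, capacities):
    """Fully reactive loop: a decision is made only when a vat is idle
    and at least one released task is waiting."""
    free_at = [0.0] * len(capacities)            # time each vat becomes idle
    pending = sorted(tasks, key=lambda t: t.release)
    completion = {}
    while pending:
        # Stand-in for a vat scheduling agent: pick the earliest-idle vat.
        vat = min(range(len(capacities)), key=lambda i: free_at[i])
        now = max(free_at[vat], pending[0].release)
        waiting = [t for t in pending if t.release <= now]
        batch = edd_batch(waiting, capacities[vat])
        if not batch:
            # Assumption of this sketch: every task fits into every vat.
            raise ValueError("seed task exceeds vat capacity")
        finish = now + max(t.duration for t in batch)
        for t in batch:
            completion[t.name] = finish
        free_at[vat] = finish
        done = {t.name for t in batch}
        pending = [t for t in pending if t.name not in done]
    return completion


tasks = [
    Task("A", 0.0, 3.0, 40.0, "red", 4.0),
    Task("B", 0.0, 6.0, 50.0, "red", 4.0),
    Task("C", 1.0, 4.0, 30.0, "blue", 3.0),
    Task("D", 2.0, 7.0, 60.0, "red", 4.0),
]
completion = simulate(tasks, capacities=[100.0, 60.0])
print(total_tardiness(completion, tasks))  # 2.0
```

In a learned variant, `edd_batch` and the vat choice inside `simulate` would be the two places where policy outputs replace the fixed rules, and the tardiness computed by `total_tardiness` is the kind of quantity a reward signal would be built around.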
