Computer Integrated Manufacturing Systems ›› 2024, Vol. 30 ›› Issue (10): 3566-3577.DOI: 10.13196/j.cims.2022.0155


Day-off scheduling approach based on reinforcement learning

LI Tiantian,CHEN Desheng,CAO Bin   

  1. College of Computer Science and Technology/College of Software,Zhejiang University of Technology
  • Online:2024-10-31 Published:2024-11-07
  • Supported by:
    Project supported by the Natural Science Foundation of Zhejiang Province, China (No. LQ21F020019), and the Key R&D Program of Zhejiang Province, China (No. 2022C01145).

  • About the authors:
    LI Tiantian (1989-), female, born in Nanyang, Henan, lecturer, Ph.D. Her research interests include real-time task scheduling optimization and big data query optimization. E-mail: ttli89@zjut.edu.cn;

    CHEN Desheng (1998-), male, born in Wenzhou, Zhejiang, master's student. His research interest is personnel scheduling and management. E-mail: deshengchen@zjut.edu.cn;

    CAO Bin (1985-), male, born in Yuxian, Shanxi, associate professor, Ph.D. His research interests include business process management and big data. E-mail: bincao@zjut.edu.cn.

Abstract: Aiming at the problems of poor solution quality, low efficiency, and inaccurate expression of day-off constraints in traditional scheduling approaches, a day-off scheduling approach based on reinforcement learning was proposed. In this approach, the day-off scheduling process was modeled as a Markov Decision Process (MDP), and an action-mask method was used to express the scheduling constraints. A Deep Q-Network (DQN) was trained to learn scheduling policies from the MDP. Finally, the learned policies were used to efficiently generate schedules that matched the daily workload while satisfying the constraints. Experimental results showed that, compared with the traditional Genetic Algorithm (GA) based method, the proposed method produced smaller deviation from daily staffing demand and was more efficient.

Key words: day-off scheduling, reinforcement learning, Markov decision process, deep Q-network, action mask
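The action-mask idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' exact formulation: the constraint rule (a maximum number of consecutive working days), the two-action space, and the stand-in Q-values are all illustrative assumptions.

```python
import numpy as np

MAX_CONSECUTIVE_DAYS = 5  # assumed hard day-off constraint, for illustration only

def feasible_action_mask(consecutive_days_worked: int) -> np.ndarray:
    """Return a 0/1 mask over the two actions [work, day-off].

    An employee who has already worked the maximum number of
    consecutive days may only take a day off.
    """
    mask = np.ones(2)
    if consecutive_days_worked >= MAX_CONSECUTIVE_DAYS:
        mask[0] = 0.0  # 'work' is infeasible
    return mask

def masked_greedy_action(q_values: np.ndarray, mask: np.ndarray) -> int:
    """Pick the highest-Q feasible action by pushing masked-out
    actions to -inf before taking the argmax."""
    masked_q = np.where(mask > 0, q_values, -np.inf)
    return int(np.argmax(masked_q))

# Example: the Q-network prefers 'work' (index 0), but after 5
# consecutive working days the mask forbids it, so 'day off' (index 1)
# is selected instead.
q = np.array([1.3, 0.4])           # stand-in for a DQN's output Q(s, .)
print(masked_greedy_action(q, feasible_action_mask(5)))  # 1
print(masked_greedy_action(q, feasible_action_mask(2)))  # 0
```

Because infeasible actions are assigned Q-values of negative infinity rather than being penalized through the reward, the learned policy can never emit a schedule that violates a hard constraint, which is the practical appeal of action masking over reward shaping.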

