• Paper •

Research on Parallel Machine Scheduling Based on Reinforcement Learning

ZHANG Zhi-cong, ZHENG Li, WENG Xiao-hua

  1. Department of Industrial Engineering, Tsinghua University, Beijing 100084, China; 2. Department of Industrial & Management Systems Engineering, University of South Florida, Tampa, FL 33620, USA
  • Online: 2007-01-15 Published: 2007-01-25

Parallel machines scheduling with reinforcement learning

ZHANG Zhi-cong, ZHENG Li, WENG Michael X.   

  1. Dep. of Industrial Eng., Tsinghua Univ., Beijing 100084, China; 2. Dep. of Industrial & Management Systems Eng., Univ. of South Florida, Tampa, FL 33620, USA
  • Online:2007-01-15 Published:2007-01-25

Abstract: This paper applies reinforcement learning to the parallel machine scheduling problem. By defining the system state, actions, and reward function, the scheduling problem is converted into an average-reward semi-Markov decision process, which is solved with an R-Learning algorithm combined with a function approximator. A Ranking Algorithm is proposed and used, together with two commonly used dispatching rules (the shortest expected processing time rule and the first-in-first-out rule), to define the reinforcement learning actions. Experimental results show that R-Learning learns a near-optimal scheduling policy through simulation, selecting optimal or sub-optimal actions in different decision states, and outperforms each of the above dispatching rules on every test problem.

Keywords: scheduling, parallel machines, reinforcement learning, Markov decision process

Abstract: A Reinforcement Learning (RL) method, R-Learning, was applied to parallel machine scheduling problems aimed at minimizing the mean flow time of jobs. The scheduling problem was converted into a Semi-Markov Decision Process (SMDP) by defining the system state, actions, and reward function, and was solved with an R-Learning algorithm combined with a function approximator. A heuristic, the Ranking Algorithm (RA), was proposed and used, together with two commonly used dispatching rules, Shortest Expected Processing Time (SEPT) and First In First Out (FIFO), to define the RL actions. Experimental results demonstrated that R-Learning learned a near-optimal scheduling policy through simulation, i.e., it selected optimal or sub-optimal actions at different states and outperformed each of the above heuristic rules on all test problems.

Key words: scheduling, parallel machines, reinforcement learning, Markov decision process
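The average-reward R-Learning update underlying the approach described above can be sketched as follows. This is a minimal tabular illustration only (the paper pairs R-Learning with a function approximator over SMDP states); the class name, state encoding, reward convention, and parameter values are assumptions for illustration, with the dispatching rules (SEPT, FIFO, RA) serving as the RL actions as in the abstract:

```python
import random
from collections import defaultdict

# Dispatching rules act as the RL actions, as in the paper's formulation.
ACTIONS = ["SEPT", "FIFO", "RA"]

class RLearningAgent:
    """Tabular R-Learning (average-reward RL) sketch; names are illustrative."""

    def __init__(self, alpha=0.1, beta=0.01, epsilon=0.1):
        self.q = defaultdict(float)   # Q(s, a): relative action values
        self.rho = 0.0                # running estimate of the average reward
        self.alpha, self.beta, self.epsilon = alpha, beta, epsilon

    def act(self, state):
        """Epsilon-greedy choice among the dispatching-rule actions."""
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One R-Learning step after observing (s, a, r, s')."""
        # Was the taken action greedy in the current value table?
        was_greedy = self.q[(state, action)] == max(
            self.q[(state, a)] for a in ACTIONS)
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        # Average-reward TD error: r - rho + max_a' Q(s',a') - Q(s,a).
        td = reward - self.rho + best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td
        # The average-reward estimate is updated only on greedy actions.
        if was_greedy:
            self.rho += self.beta * td
```

In a scheduling simulation, `state` would encode queue and machine status at a decision epoch, and `reward` would be defined so that maximizing average reward minimizes mean flow time (e.g., a negative holding cost per unit time).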
