Computer Integrated Manufacturing Systems ›› 2022, Vol. 28 ›› Issue (5): 1472-1481. DOI: 10.13196/j.cims.2022.05.018

Q-learning based hyper-heuristic model and algorithm for solving the multi-mode resource-constrained project scheduling problem

CUI Jianshuang (崔建双), LYU Yue (吕玥), XU Zihan (徐子涵)

  1. School of Economics and Management, University of Science and Technology Beijing
  • Online: 2022-05-30 Published: 2022-06-08
  • Supported by:
    National Natural Science Foundation of China (No. 71871017); Social Science Foundation of Beijing Municipal Education Commission (No. SM201910037004).

Abstract: To address the limitations of traditional meta-heuristic algorithms, such as their single search mechanism and insufficient problem-oriented customization, and to improve overall generality, a Q-learning based hyper-heuristic model is proposed, and a hyper-heuristic algorithm is designed and implemented on this basis to solve the multi-mode resource-constrained project scheduling problem (MRCPSP). The model architecture is divided into an upper and a lower layer: the lower layer consists of low-level heuristic (LLH) operators, namely meta-heuristic operators with multiple heterogeneous mechanisms and different parameters, while the upper layer automatically selects among the lower-layer operators according to a Q-learning strategy. The model organically integrates several well-performing meta-heuristic algorithms with a feedback-learning reinforcement mechanism and offers flexible extensibility. To evaluate the algorithm, more than a thousand instances of varying sizes were selected from the MRCPSP benchmark sets of the Project Scheduling Problem Library (PSPLIB), equivalent comparison experiments were designed, and the results were compared with those reported in the latest published literature. The results show that the Q-learning based hyper-heuristic algorithm performs well on multiple performance indicators, including objective value, computation time, generality, and robustness, and the approach can be carried over to various other combinatorial optimization problems. Notably, on the J30 instance set, as many as 41 instances obtained results better than the best known solutions reported in the current literature.
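
As a rough illustration of the selection scheme the abstract describes, the sketch below pairs an epsilon-greedy Q-learning upper layer with interchangeable lower-layer operators. It is a minimal sketch under stated assumptions, not the authors' implementation: the state encoding (index of the last operator applied), the reward (improvement in the evaluated objective, e.g. project makespan), the parameter values, and the names `QLearningHyperHeuristic`, `operators`, and `evaluate` are all illustrative assumptions.

```python
import random

class QLearningHyperHeuristic:
    """Upper layer: Q-learning over which lower-layer operator to apply next.

    State  = index of the operator applied last (an illustrative choice).
    Action = index of the operator to apply next.
    Reward = improvement in the objective value (e.g. project makespan).
    """

    def __init__(self, operators, alpha=0.1, gamma=0.9, epsilon=0.2):
        self.operators = operators                      # lower-layer meta-heuristic operators
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        n = len(operators)
        self.q = [[0.0] * n for _ in range(n)]          # Q[state][action]

    def select(self, state):
        """Epsilon-greedy choice among the lower-layer operators."""
        if random.random() < self.epsilon:
            return random.randrange(len(self.operators))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])

    def run(self, solution, evaluate, iterations=1000):
        state = 0
        best_cost, best_solution = evaluate(solution), solution
        for _ in range(iterations):
            action = self.select(state)
            solution = self.operators[action](solution)  # apply the chosen operator
            cost = evaluate(solution)
            self.update(state, action, best_cost - cost, action)
            if cost < best_cost:
                best_cost, best_solution = cost, solution
            state = action                               # applied operator becomes the new state
        return best_solution, best_cost
```

In a full MRCPSP setting, `solution` would be an activity list plus a mode assignment, `evaluate` a schedule-generation scheme returning the makespan, and each operator a move borrowed from a different meta-heuristic (e.g. a genetic crossover step, a simulated-annealing perturbation, a tabu move), matching the heterogeneous lower layer the abstract describes.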

Key words: hyper-heuristic model, reinforcement learning, Q-learning, multi-mode resource-constrained project scheduling problem, meta-heuristic algorithm, feedback-learning reinforcement mechanism

CLC Number: