Computer Integrated Manufacturing Systems ›› 2018, Vol. 24 ›› Issue (1): 80-88. DOI: 10.13196/j.cims.2018.01.008

• Product Innovation and Development Technology •

Integrated optimization of scheduling and maintenance in multi-state production systems with deterioration effects

Yang Hongbing (杨宏兵)1,2, Shen Lu (沈露)1, Cheng Ming (成明)3, Tao Laifa (陶来发)4

  1. School of Mechanical and Electrical Engineering, Soochow University
  2. Guangdong Provincial Key Laboratory of Computer Integrated Manufacturing, Guangdong University of Technology
  3. School of Urban Rail Transportation, Soochow University
  4. School of Reliability and Systems Engineering, Beihang University
  • Online: 2018-01-31 Published: 2018-01-31
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 51005160), the Natural Science Foundation of Jiangsu Province, China (No. BK20141517, BK20150344), the Suzhou Industrial Technology Innovation Special Project (People's Livelihood Science and Technology), China (No. SS201704), and the China Postdoctoral Science Foundation, China (No. 2016M601885).

Abstract: To explore how equipment deterioration affects the coupling between maintenance and production scheduling, this paper studies the integrated optimization of production scheduling and preventive maintenance in a multi-state single-machine production system. Based on the preventive maintenance cost, the job processing cost, and the reward for completed jobs, the problem is formulated as an infinite-horizon Markov decision process (MDP) under the expected average reward criterion. The existence of an optimal stationary policy for the integrated model is analyzed and proved, and the optimality equation of the model is derived. Building on this equation, a model-free reinforcement learning algorithm is designed to solve the MDP, which avoids the curse of dimensionality and the curse of modeling that traditional dynamic programming methods encounter. To evaluate the performance of the reinforcement learning algorithm, a heuristic algorithm based on the infinite-horizon expected average reward is also designed, and the experimental comparison verifies the effectiveness of the reinforcement learning algorithm. Finally, a sensitivity analysis of the reinforcement learning parameters examines how each parameter affects performance, providing experimental evidence for the design and improvement of the algorithm.

Key words: multi-state production systems, integrated optimization, reinforcement learning, deterioration effects, production scheduling, preventive maintenance

CLC number:
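
For orientation on the optimality equation mentioned in the abstract, the standard textbook form for the average-reward criterion is sketched below. It holds under standard conditions such as a finite unichain model. The notation is an assumption and is not taken from the paper: $S$ and $A(s)$ are finite state and action sets, $r(s,a)$ is the expected one-step reward (for example, completed-job reward minus processing and maintenance cost), $p(s' \mid s,a)$ is the transition probability, $\rho^{*}$ is the optimal average reward, and $h$ is the bias function. The last two lines show R-learning, one well-known model-free update for this criterion; the abstract does not state which algorithm the authors use, so it is given only as an illustration.

```latex
% Average-reward optimality equation (standard form; notation assumed, not the paper's)
\rho^{*} + h(s) = \max_{a \in A(s)} \Big[ r(s,a) + \sum_{s' \in S} p(s' \mid s,a)\, h(s') \Big],
\qquad s \in S

% R-learning update after observing a transition (s, a, r, s');
% \alpha and \beta are step sizes (assumed symbols)
R(s,a) \leftarrow R(s,a) + \alpha \Big[ r - \rho + \max_{a'} R(s',a') - R(s,a) \Big]

% The average-reward estimate \rho is updated only when a was greedy in s
\rho \leftarrow \rho + \beta \Big[ r - \rho + \max_{a'} R(s',a') - \max_{b} R(s,b) \Big]
```

Because the update uses only sampled transitions, it needs neither the transition probabilities nor a sweep over the full state space, which is the sense in which a model-free method sidesteps the curse of modeling named in the abstract.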