Computer Integrated Manufacturing Systems ›› 2017, Vol. 23 ›› Issue (1): 144-153. DOI: 10.13196/j.cims.2017.01.016

• Product Innovation and Development Technology •

Joint decisions for perishable products based on a Markov decision process

Zheng Jiangbo (郑江波), Cheng Fuyang (程福阳), Yang Liu (杨柳)

  1. School of Management, Jinan University

  • Online: 2017-01-31 Published: 2017-01-31
  • Supported by:
    Project supported by the Natural Science Foundation of Guangdong Province, China (No. 2016Z00052).

Abstract: To address a retailer's joint decisions on ordering, pricing, and disposal of old products when selling perishable products with a multi-period shelf life over an infinite horizon, a model based on a Markov decision process was established and the optimal policy was computed with the Q-learning algorithm. The optimal policy specifies the action to take in each state so as to maximize the expected total discounted profit from the current period onward. By continually interacting with the environment and receiving feedback, the algorithm's iterative update formula keeps refining the estimated optimal policy. With finite, discrete state and action sets, the algorithm obtains a stationary optimal policy after sufficient learning, even when the state transition probabilities and the expected one-period profit are unknown. The results show that the parameters have different and significant effects on the characteristics of each decision in the joint policy, which provides theoretical support and ideas for research on heuristic policies.

Key words: perishable product, Markov decision process, Q-learning algorithm, ordering decisions, pricing decisions
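
The abstract describes learning a policy from interaction alone, without knowing transition probabilities or expected profits. A minimal sketch of that idea with standard tabular Q-learning follows; it is not the authors' model. The toy retailer (the state as old stock carried over, the demand distribution, `CAP`, `PRICES`, `MAX_DEMAND`, `OLD_DISCOUNT`, `UNIT_COST`) and the helper names `decode`, `step`, and `q_learning` are all hypothetical and chosen only for illustration. The update used is the textbook rule Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)].

```python
import random

# Hypothetical toy retailer; every number below is an illustrative assumption.
# state  = units of old stock carried over from the previous period (0..CAP)
# action = (order quantity, price level, dispose of old stock?)
CAP = 2
PRICES = [4.0, 6.0]      # low / high selling price for fresh units
MAX_DEMAND = [3, 2]      # demand ~ Uniform{0..max}; smaller at the high price
OLD_DISCOUNT = 0.5       # old units sell at half price
UNIT_COST = 2.0
N_STATES = CAP + 1
N_ACTIONS = (CAP + 1) * len(PRICES) * 2


def decode(action):
    """Map an action index to (order quantity, price index, dispose flag)."""
    order, rest = divmod(action, len(PRICES) * 2)
    price_idx, dispose = divmod(rest, 2)
    return order, price_idx, bool(dispose)


def step(old, action, rng):
    """Simulate one period and return (next state, profit)."""
    order, price_idx, dispose = decode(action)
    if dispose:
        old = 0
    demand = rng.randint(0, MAX_DEMAND[price_idx])
    sold_old = min(old, demand)                 # assume customers take old units first
    sold_fresh = min(order, demand - sold_old)
    price = PRICES[price_idx]
    profit = price * (OLD_DISCOUNT * sold_old + sold_fresh) - UNIT_COST * order
    # Unsold fresh units become next period's old stock; unsold old units expire.
    return order - sold_fresh, profit


def q_learning(n_states, n_actions, step_fn, episodes=500, horizon=40,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning that only uses sampled (state, action, reward, next state)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(horizon):
            # Epsilon-greedy exploration.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=q[s].__getitem__)
            s_next, reward = step_fn(s, a, rng)
            # Move Q(s, a) toward the one-step bootstrapped target.
            q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    policy = [max(range(n_actions), key=q[s].__getitem__) for s in range(n_states)]
    return q, policy


q_table, policy = q_learning(N_STATES, N_ACTIONS, step)
for state, action in enumerate(policy):
    print("old stock", state, "->", decode(action))
```

The greedy policy printed at the end assigns one joint action (order quantity, price level, disposal flag) to each state, mirroring the stationary policy the abstract refers to. With so few training steps the result is only an estimate, and a different seed or parameter setting may change it.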
