Computer Integrated Manufacturing System, 2022, Vol. 28, Issue 11: 3433-3442. DOI: 10.13196/j.cims.2022.11.009


Optimal repositioning of driverless taxi under uncertain demand

ZHOU Xiaoting1, WU Lubin1, ZHANG Yu2, JIANG Shancheng1+

  1. School of Intelligent Systems Engineering, Sun Yat-sen University
  2. School of Business Administration, Southwestern University of Finance and Economics
  • Online: 2022-11-30 Published: 2022-12-08
  • Supported by:
    Project supported by the National Key Research and Development Program, China (No. 2020YFB1713800), the National Natural Science Foundation, China (No. 71901180, 71801031), and the Guangdong Provincial Basic and Applied Basic Research Foundation, China (No. 2019A1515011962).

Chinese title: 基于不确定需求的无人驾驶出租车优化调度 (Optimized dispatching of driverless taxis based on uncertain demand)

周晓婷1,吴禄彬1,章宇2,姜善成1+   

  1. 中山大学智能工程学院
  2. 西南财经大学工商管理学院

Abstract: To reduce the number of empty taxis and make it easier for passengers to get a taxi during peak hours, a model-free deep reinforcement learning framework was proposed for dispatching driverless taxis under uncertain demand. The framework jointly considered the benefit to service providers and the waiting cost of customers. A Twin Delayed Deep Deterministic policy gradient (TD3) algorithm was used to solve the problem and allocate resources. The simulator was built from real taxi trip data from New York. To improve the robustness of the algorithm, uncertain demand was added to the training process. The experimental results showed that the algorithm could produce far-sighted and effective strategies under uncertain demand.
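
For readers unfamiliar with TD3, the sketch below illustrates how the algorithm forms its critic target in its standard published form: the target action is smoothed with clipped noise, and the smaller of two target critics is used to limit overestimation. It is a minimal NumPy illustration, not the authors' implementation. The function name `td3_target`, the callables `actor_target`, `q1_target`, and `q2_target`, and all default hyperparameters are assumptions chosen for the example.

```python
import numpy as np

def td3_target(reward, done, next_state, actor_target, q1_target, q2_target,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0):
    """Return y = r + gamma * (1 - done) * min(Q1'(s', a'), Q2'(s', a')).

    actor_target, q1_target, q2_target are hypothetical callables standing in
    for the target networks; the hyperparameter defaults are assumptions.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    next_action = actor_target(next_state)
    noise = np.clip(rng.normal(0.0, noise_std, size=next_action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, action_low, action_high)

    # Clipped double-Q: take the element-wise minimum of the two target critics.
    q_min = np.minimum(q1_target(next_state, next_action),
                       q2_target(next_state, next_action))

    # Bootstrapped target; terminal transitions (done = 1) keep only the reward.
    return reward + gamma * (1.0 - done) * q_min
```

The third ingredient of TD3, updating the actor and the target networks less often than the critics, lives in the training loop and is not shown here.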

Key words: reinforcement learning, driverless taxi, vehicle repositioning, policy gradient

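Abstract (Chinese version): To alleviate the difficulty passengers face in hailing taxis during peak hours and to reduce empty taxi trips, a model-free deep reinforcement learning framework is proposed for dispatching driverless taxis under uncertain travel demand. The framework models the problem as a Markov decision process and jointly considers operator revenue and customer waiting cost. A policy-based deep reinforcement learning algorithm, Twin Delayed Deep Deterministic policy gradient (TD3), dispatches the driverless taxis so that idle vehicles are allocated sensibly. An environment simulator was built from real taxi trip data from New York City, and uncertain demand was added during training to make the algorithm more robust. Experimental results demonstrate the effectiveness of the method for dispatching driverless taxis under uncertain demand.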
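
The abstract does not give the reward function or the demand noise model, so the following is only a hypothetical sketch of how a single decision step in such a Markov decision process might trade operator revenue against customer waiting cost while demand is perturbed during training. The names `sample_demand` and `step_reward`, the multiplicative Gaussian noise, the Poisson arrivals, and the `fare` and `wait_cost` values are all assumptions made for illustration.

```python
import numpy as np

def sample_demand(mean_demand, rng, scale=0.2):
    """Draw per-zone request counts around a historical mean (assumed noise model).

    The mean is first perturbed by multiplicative Gaussian noise, clipped at
    zero, and then used as the rate of a Poisson draw.
    """
    noisy_mean = np.maximum(
        mean_demand * (1.0 + scale * rng.standard_normal(mean_demand.shape)), 0.0
    )
    return rng.poisson(noisy_mean)

def step_reward(idle_taxis, demand, fare=10.0, wait_cost=4.0):
    """Revenue from served requests minus a penalty for requests left waiting.

    fare and wait_cost are placeholder constants, not values from the paper.
    """
    served = np.minimum(idle_taxis, demand)   # each taxi serves at most one request
    unserved = demand - served                # requests with no taxi in their zone
    return fare * served.sum() - wait_cost * unserved.sum()

# Usage: three zones, with idle taxis already repositioned by some policy.
rng = np.random.default_rng(42)
demand = sample_demand(np.array([2.0, 1.0, 5.0]), rng)
reward = step_reward(np.array([3, 0, 2]), demand)
```

Resampling demand at every training episode exposes the policy to conditions that differ from the historical average, which is the general idea behind training with uncertain demand.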

Key words (Chinese version): reinforcement learning, driverless taxi, vehicle dispatching, policy gradient
