Constraint-aware Policy Optimization to Solve the Vehicle Routing Problem with Time Windows

4Citations
Citations of this article
10Readers
Mendeley users who have this article in their library.

Abstract

The vehicle routing problem with time windows (VRPTW) as one of the most known combinatorial operations (CO) problem is considered to be a tough issue in practice and the main challenge of that is to find the approximate solutions within a reasonable time. In recent years, reinforcement learning (RL) based methods have gained increasing attention in many CO problems, such as the vehicle routing problems (VRP), due to their enormous potential to efficiently generate high-quality solutions. However, neglecting the information between the constraints and the solutions makes previous approaches performance unideal in some strongly constrained problems, like VRPTW. We present the constraint-aware policy optimization (CPO) for VRPTW that can let the agent learn the constraints as a representation of the whole environment to improve the generalization of RL methods. Extensive experiments on both the Solomon benchmark and the generated datasets demonstrate that our approach significantly outperforms other competition methods.

Cite

CITATION STYLE

APA

Zhang, R., Yu, R., & Xia, W. (2022). Constraint-aware Policy Optimization to Solve the Vehicle Routing Problem with Time Windows. Information Technology and Control, 51(1), 126–138. https://doi.org/10.5755/j01.itc.51.1.29924

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free