Learning to Solve a Stochastic Orienteering Problem with Time Windows

Abstract

Reinforcement learning (RL) has seen increasing success at solving a variety of combinatorial optimization problems. These techniques have generally been applied to deterministic optimization problems with few side constraints, such as the traveling salesperson problem (TSP) or the capacitated vehicle routing problem (CVRP). With this in mind, the recent IJCAI AI for TSP competition challenged participants to apply RL to a difficult routing problem involving optimization under uncertainty and time windows. We present the winning submission to the challenge, which combines the policy optimization with multiple optima (POMO) approach with efficient active search and Monte Carlo roll-outs. Experimental results show that our approach outperforms the second-place approach by 1.7%. Furthermore, our computational results suggest that solving more realistic routing problems may not be as difficult as previously thought.
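To illustrate the Monte Carlo roll-out idea mentioned above, the sketch below scores each candidate next action by simulating several stochastic completions of the tour under a given policy and averaging the rewards. All function and parameter names here (`mc_select_action`, `simulate_step`, `n_rollouts`, etc.) are hypothetical illustrations, not the authors' actual implementation or the competition code.

```python
import random

def rollout_value(state, policy, simulate_step, is_terminal, rng):
    """Simulate one complete trajectory from `state` under `policy`;
    return the accumulated reward."""
    total = 0.0
    while not is_terminal(state):
        action = policy(state, rng)
        state, reward = simulate_step(state, action, rng)
        total += reward
    return total

def mc_select_action(state, candidate_actions, policy, simulate_step,
                     is_terminal, n_rollouts=16, seed=0):
    """Pick the candidate action whose mean reward over `n_rollouts`
    simulated completions is highest (a basic Monte Carlo roll-out step)."""
    rng = random.Random(seed)
    best_action, best_value = None, float("-inf")
    for action in candidate_actions:
        value = 0.0
        for _ in range(n_rollouts):
            next_state, reward = simulate_step(state, action, rng)
            value += reward + rollout_value(next_state, policy,
                                            simulate_step, is_terminal, rng)
        value /= n_rollouts
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```

In the stochastic orienteering setting, `simulate_step` would sample travel times and check time-window feasibility, and `policy` would be the learned POMO model; this toy skeleton only shows the control flow of the roll-out evaluation.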

Citation (APA)

Schmitt-Ulms, F., Hottung, A., Sellmann, M., & Tierney, K. (2022). Learning to Solve a Stochastic Orienteering Problem with Time Windows. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13621 LNCS, pp. 108–122). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-24866-5_8
