Learning heuristics for the TSP by policy gradient

Michel Deudon; Pierre Cournut; Alexandre Lacoste; Yossiri Adulyasak; Louis-Martin Rousseau

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 10848 LNCS, pp. 170–181 (2018)

DOI: 10.1007/978-3-319-93031-2_12

Abstract

The aim of this study is to provide insights into how efficient machine learning algorithms can be adapted to solve combinatorial optimization problems in conjunction with existing heuristic procedures. More specifically, we extend the neural combinatorial optimization framework to solve the traveling salesman problem (TSP). In this framework, the city coordinates are used as inputs, and the neural network is trained using reinforcement learning to predict a distribution over city permutations. Our framework differs from the one in [1] in two ways: we do not use the Long Short-Term Memory (LSTM) architecture, and we designed our own critic to compute a baseline for the tour length, which results in more efficient learning. More importantly, we further enhance the solution approach with the well-known 2-opt heuristic. The results show that the proposed framework alone generally performs as well as high-performance heuristics (OR-Tools). When equipped with a simple 2-opt procedure, it could outperform such heuristics and achieve close-to-optimal results on 2D Euclidean graphs. This demonstrates that our approach, based on machine learning techniques, could learn good heuristics which, once enhanced with a simple local search, yield promising results.
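The abstract mentions policy-gradient training with a critic that supplies a tour-length baseline, but gives no formula. The standard REINFORCE estimator with a learned baseline, which is the usual form of policy gradient in neural combinatorial optimization, is sketched below. The notation is an assumption and is not taken from the abstract: s is an instance (the city coordinates), π a sampled permutation, L(π | s) its tour length, p_θ the policy network, b_φ(s) the critic's baseline, and B the batch size. The mean-squared-error critic loss is also an assumed, common choice.

```latex
\begin{align}
J(\theta \mid s) &= \mathbb{E}_{\pi \sim p_\theta(\cdot \mid s)}\big[L(\pi \mid s)\big] \\
\nabla_\theta J(\theta \mid s) &= \mathbb{E}_{\pi \sim p_\theta(\cdot \mid s)}\big[\big(L(\pi \mid s) - b_\phi(s)\big)\,\nabla_\theta \log p_\theta(\pi \mid s)\big] \\
&\approx \frac{1}{B}\sum_{i=1}^{B}\big(L(\pi_i \mid s_i) - b_\phi(s_i)\big)\,\nabla_\theta \log p_\theta(\pi_i \mid s_i) \\
\mathcal{L}(\phi) &= \frac{1}{B}\sum_{i=1}^{B}\big(b_\phi(s_i) - L(\pi_i \mid s_i)\big)^2
\end{align}
```

Subtracting b_φ(s) leaves the gradient's expectation unchanged, because the expected score function is zero. When the baseline tracks the expected tour length, it lowers the variance of the estimate. Since J is an expected tour length, θ is updated by gradient descent.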
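The abstract does not say which 2-opt variant is used to post-process the network's tours. The following is a minimal first-improvement sketch in Python, assuming 2D Euclidean coordinates. The function names `tour_length` and `two_opt` are hypothetical. Each move removes two edges and reconnects the tour by reversing the segment between them, and moves are applied only while they shorten the tour.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], coords: Sequence[Point]) -> float:
    """Total Euclidean length of a closed tour (the last city links back to the first)."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def two_opt(tour: Sequence[int], coords: Sequence[Point]) -> List[int]:
    """Hypothetical helper: apply improving 2-opt moves until none shortens the tour."""
    best = list(tour)
    n = len(best)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                # Edges (a, b) and (c, d) would share a city when j is the last index and i == 0.
                if i == 0 and j == n - 1:
                    continue
                a, b = coords[best[i]], coords[best[i + 1]]
                c, d = coords[best[j]], coords[best[(j + 1) % n]]
                # Change in length if (a, b), (c, d) are replaced by (a, c), (b, d).
                delta = (math.dist(a, c) + math.dist(b, d)) - (math.dist(a, b) + math.dist(c, d))
                if delta < -1e-12:
                    # Reversing the segment between the two edges performs the reconnection.
                    best[i + 1 : j + 1] = reversed(best[i + 1 : j + 1])
                    improved = True
    return best
```

For example, with `coords = [(0, 0), (1, 1), (1, 0), (0, 1)]`, the self-crossing tour `[0, 1, 2, 3]` has length 2 + 2√2. `two_opt` uncrosses it into `[0, 2, 1, 3]`, which has length 4. Each pass costs O(n²) distance evaluations. The small negative tolerance keeps floating-point ties from triggering endless zero-gain moves.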

Citation (APA)

Deudon, M., Cournut, P., Lacoste, A., Adulyasak, Y., & Rousseau, L. M. (2018). Learning heuristics for the TSP by policy gradient. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10848 LNCS, pp. 170–181). Springer Verlag. https://doi.org/10.1007/978-3-319-93031-2_12
