Adaptive playouts in monte-carlo tree search with policy-gradient reinforcement learning

Tobias Graf; Marco Platzner

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9525 1-11

DOI: 10.1007/978-3-319-27992-3_1



Abstract

Monte-Carlo Tree Search evaluates positions with the help of a playout policy. If the playout policy evaluates a position incorrectly, there are cases where the tree search has difficulty finding the correct move because of the large search space. This paper explores adaptive playout policies, which improve the playout policy during a tree search. Using policy-gradient reinforcement learning techniques, we optimize the playout policy to give better evaluations. We tested the algorithm in Computer Go and measured an increase in playing strength of more than 100 Elo. The resulting program was able to deal with difficult test cases that are known to pose a problem for Monte-Carlo Tree Search.
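
To make the idea of a policy-gradient update to a playout policy concrete, the following is a minimal sketch of a generic REINFORCE-style step for a softmax policy over candidate moves. It is not taken from the paper: the linear feature representation, the function names (`softmax_probs`, `sample_move`, `reinforce_update`), the learning rate, the baseline, and the single-player view of the reward are all assumptions made for illustration.

```python
import math
import random


def softmax_probs(theta, feats):
    """Move probabilities of a softmax policy with linear scores.

    theta: list of weights (hypothetical parameterization).
    feats: one feature vector per candidate move.
    """
    scores = [sum(t * f for t, f in zip(theta, phi)) for phi in feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_move(theta, feats, rng):
    """Sample a move index according to the current policy."""
    probs = softmax_probs(theta, feats)
    return rng.choices(range(len(feats)), weights=probs, k=1)[0]


def reinforce_update(theta, trajectory, reward, baseline=0.0, lr=0.1):
    """One REINFORCE step after a finished playout.

    trajectory: list of (feats, chosen_index) pairs, one per playout move.
    reward: playout outcome from the learner's point of view (assumed).
    For a softmax policy, the gradient of log pi(a|s) is the chosen move's
    features minus the policy's expected features.
    """
    new_theta = list(theta)
    for feats, chosen in trajectory:
        probs = softmax_probs(theta, feats)
        for i in range(len(theta)):
            expected = sum(p * phi[i] for p, phi in zip(probs, feats))
            grad_log_pi = feats[chosen][i] - expected
            new_theta[i] += lr * (reward - baseline) * grad_log_pi
    return new_theta


# Toy usage: two candidate moves with one-hot features.
feats = [[1.0, 0.0], [0.0, 1.0]]
rng = random.Random(0)
move = sample_move([0.0, 0.0], feats, rng)
theta_after_win = reinforce_update([0.0, 0.0], [(feats, 0)], reward=1.0)
```

In this toy setting, a playout that chose the first move and ended in a win raises that move's probability above one half, while a loss would lower it. In an adaptive-playout setting, such an update would be applied repeatedly during the search rather than only offline.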

Citation (APA)

Graf, T., & Platzner, M. (2015). Adaptive playouts in monte-carlo tree search with policy-gradient reinforcement learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9525, pp. 1–11). Springer Verlag. https://doi.org/10.1007/978-3-319-27992-3_1
