Adaptive playouts in monte-carlo tree search with policy-gradient reinforcement learning

Abstract

Monte-Carlo Tree Search evaluates positions with the help of a playout policy. If the playout policy evaluates a position incorrectly, there are cases where the tree search has difficulty finding the correct move because of the large search space. This paper explores adaptive playout policies, which improve the playout policy during a tree search. Using policy-gradient reinforcement learning techniques, we optimize the playout policy to give better evaluations. We tested the algorithm in Computer Go and measured an increase in playing strength of more than 100 Elo. The resulting program was able to deal with difficult test cases that are known to pose a problem for Monte-Carlo Tree Search.
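As a rough illustration of what adapting a playout policy with a policy gradient can look like, the sketch below applies a generic REINFORCE-style update to a softmax policy over move features. It is not the authors' algorithm: the abstract does not specify the features, the baseline, or the update rule, so the feature encoding (lists of feature indices per move), the function names `softmax_probs` and `reinforce_update`, and the parameters `baseline` and `lr` are all assumptions made for this example.

```python
import math


def softmax_probs(weights, moves):
    """Return softmax probabilities for candidate moves.

    weights: list of floats, one weight per feature (hypothetical encoding).
    moves:   list of candidate moves, each a list of active feature indices.
    """
    scores = [sum(weights[f] for f in feats) for feats in moves]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(weights, trajectory, reward, baseline, lr):
    """Update weights in place after one finished playout.

    trajectory: list of (moves, chosen_index) pairs recorded during the playout.
    reward:     playout outcome, e.g. 1.0 for a win and 0.0 for a loss.
    baseline:   expected outcome, e.g. the current value estimate (assumed).
    lr:         learning rate (assumed).
    """
    advantage = reward - baseline
    for moves, chosen in trajectory:
        probs = softmax_probs(weights, moves)
        # Gradient of log pi(chosen) with respect to each feature weight:
        # (feature active in chosen move) - (expected feature activation).
        grad = [0.0] * len(weights)
        for f in moves[chosen]:
            grad[f] += 1.0
        for p, feats in zip(probs, moves):
            for f in feats:
                grad[f] -= p
        for f, g in enumerate(grad):
            weights[f] += lr * advantage * g


# Usage: two candidate moves, each described by a single feature.
weights = [0.0, 0.0]
moves = [[0], [1]]
reinforce_update(weights, [(moves, 0)], reward=1.0, baseline=0.5, lr=0.1)
probs_after = softmax_probs(weights, moves)
```

In this toy run the playout that chose the first move was won, so the weight of its feature rises and the policy becomes slightly more likely to pick that move in later playouts. Subtracting a baseline from the reward is a common variance-reduction choice in policy-gradient methods and is assumed here rather than taken from the paper.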

Cite

Graf, T., & Platzner, M. (2015). Adaptive playouts in monte-carlo tree search with policy-gradient reinforcement learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9525, pp. 1–11). Springer Verlag. https://doi.org/10.1007/978-3-319-27992-3_1
