Bandit based Monte-Carlo planning

Abstract

For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains UCT is significantly more efficient than its alternatives.
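The abstract names the algorithm but not its mechanics, so a rough illustration may help: UCT treats action selection at each visited state as a multi-armed bandit and applies the UCB1 rule inside recursive Monte-Carlo simulations. The sketch below is a minimal, assumption-laden Python rendering of that idea; the names (`UCTPlanner`, `step`, the exploration constant `c`, the toy chain MDP) are illustrative and not from the paper, which uses a differently scaled, depth-dependent bias term and additional tie-breaking and rollout details simplified away here.

```python
import math
import random
from collections import defaultdict


class UCTPlanner:
    """Minimal UCT sketch: UCB1 action selection inside Monte-Carlo simulations."""

    def __init__(self, step, actions, horizon=50, c=1.4):
        self.step = step            # generative model: (state, action) -> (next_state, reward, done)
        self.actions = actions      # finite action set
        self.horizon = horizon      # depth limit for each simulated episode
        self.c = c                  # exploration constant in the UCB1 bias term (assumed value)
        self.N = defaultdict(int)   # visit count of (state, depth)
        self.Na = defaultdict(int)  # visit count of (state, depth, action)
        self.Q = defaultdict(float) # mean return of (state, depth, action)

    def search(self, root, n_simulations=1000):
        """Run repeated simulations from the root, then return the greedy action."""
        for _ in range(n_simulations):
            self._simulate(root, 0)
        return max(self.actions, key=lambda a: self.Q[(root, 0, a)])

    def _simulate(self, state, depth):
        if depth >= self.horizon:
            return 0.0
        key = (state, depth)
        untried = [a for a in self.actions if self.Na[key + (a,)] == 0]
        if untried:
            # Play every action once so the UCB1 counts below are positive.
            action = random.choice(untried)
        else:
            # UCB1: mean return plus an exploration bonus that shrinks
            # as an action is tried more often relative to its siblings.
            action = max(
                self.actions,
                key=lambda a: self.Q[key + (a,)]
                + self.c * math.sqrt(math.log(self.N[key]) / self.Na[key + (a,)]),
            )
        next_state, reward, done = self.step(state, action)
        ret = reward if done else reward + self._simulate(next_state, depth + 1)
        # Backup: incrementally update counts and the running mean return.
        self.N[key] += 1
        akey = key + (action,)
        self.Na[akey] += 1
        self.Q[akey] += (ret - self.Q[akey]) / self.Na[akey]
        return ret


if __name__ == "__main__":
    # Hypothetical toy MDP: walk left/right on a chain 0..5; reaching 5 pays
    # +1 and terminates, every other step costs -0.1, so "right" is optimal.
    def step(s, a):
        s2 = max(0, min(5, s + a))
        return s2, (1.0 if s2 == 5 else -0.1), s2 == 5

    planner = UCTPlanner(step, actions=(-1, +1), horizon=20)
    print(planner.search(root=0, n_simulations=2000))  # expected output: 1
```

The `__main__` demo is only a sanity check under the stated assumptions; the undiscounted, finite-horizon setup mirrors one of the regimes the abstract mentions, while the discounted case would add a discount factor to the recursive return.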

Citation (APA)

Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4212 LNAI, pp. 282–293). Springer Verlag. https://doi.org/10.1007/11871842_29
