Bandit based Monte-Carlo planning

Citations: 2.2k
Mendeley readers: 1.2k

This article is free to access.

Abstract

For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives. © Springer-Verlag Berlin Heidelberg 2006.
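The bandit idea the abstract refers to is UCB1-style action selection: at each tree node, pick the action that maximizes the average sampled reward plus an exploration bonus that shrinks as the action is tried more often. The sketch below is not the paper's pseudocode — it is a minimal illustration of the UCB1 selection rule on a toy two-armed bandit, with arm payoffs and the exploration constant chosen arbitrarily for the demo.

```python
import math
import random

def ucb1_select(sums, counts, total, c=math.sqrt(2)):
    """Pick the arm maximizing mean reward plus a UCB1 exploration bonus.

    sums[i]   -- total reward observed for arm i
    counts[i] -- number of times arm i has been pulled
    total     -- total pulls so far, across all arms
    """
    # Try every arm once before applying the formula (avoids log(0)/divide-by-zero).
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(sums)),
        key=lambda i: sums[i] / counts[i] + c * math.sqrt(math.log(total) / counts[i]),
    )

# Toy demo: two Bernoulli arms with success probabilities 0.4 and 0.6.
random.seed(0)
probs = [0.4, 0.6]
sums = [0.0, 0.0]
counts = [0, 0]
for t in range(1, 2001):
    arm = ucb1_select(sums, counts, t)
    reward = 1.0 if random.random() < probs[arm] else 0.0
    sums[arm] += reward
    counts[arm] += 1

# The exploration bonus decays, so pulls concentrate on the better arm.
print(counts)
```

In UCT this same rule is applied recursively: each internal node of the search tree treats its child actions as bandit arms, and Monte-Carlo rollouts supply the rewards.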



Citation (APA)

Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4212 LNAI, pp. 282–293). Springer Verlag. https://doi.org/10.1007/11871842_29


Readers' Seniority

PhD / Post grad / Masters / Doc: 629 (73%)
Researcher: 148 (17%)
Professor / Associate Prof.: 73 (8%)
Lecturer / Post doc: 13 (2%)

Readers' Discipline

Computer Science: 638 (73%)
Engineering: 176 (20%)
Mathematics: 36 (4%)
Agricultural and Biological Sciences: 23 (3%)

Article Metrics

News Mentions: 1
References: 4
