Thompson sampling: An asymptotically optimal finite-time analysis

Emilie Kaufmann; Nathaniel Korda; Rémi Munos

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7568 LNAI 199-213

DOI: 10.1007/978-3-642-34106-9_18

Abstract

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case. © 2012 Springer-Verlag.
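As a rough illustration of the setting the abstract describes, the sketch below runs Thompson Sampling on a Bernoulli bandit and computes the constant that multiplies log T in the Lai and Robbins lower bound. It is not taken from the paper: the uniform Beta(1, 1) prior, the function names (`thompson_sampling`, `bernoulli_kl`, `lai_robbins_constant`) and the example arm means are assumptions made here for illustration only.

```python
import math
import random


def thompson_sampling(means, horizon, seed=0):
    """Simulate Thompson Sampling on Bernoulli arms (assumed Beta(1, 1) priors).

    Returns the cumulative expected regret and the number of pulls per arm.
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k
    failures = [0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior and play the largest.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)
        ]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - means[arm]
    pulls = [successes[a] + failures[a] for a in range(k)]
    return regret, pulls


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def lai_robbins_constant(means):
    """Sum over suboptimal arms of (gap / KL), the factor in front of log T."""
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)


if __name__ == "__main__":
    arm_means = [0.2, 0.8]  # hypothetical example, not from the paper
    total_regret, pulls = thompson_sampling(arm_means, horizon=2000, seed=1)
    print("cumulative regret:", total_regret)
    print("pulls per arm:", pulls)
    print("Lai-Robbins constant:", lai_robbins_constant(arm_means))
```

The lower bound is asymptotic, so a single finite run like this one only gives a feel for the logarithmic growth of the regret; it does not verify the paper's result.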

Citation (APA)

Kaufmann, E., Korda, N., & Munos, R. (2012). Thompson sampling: An asymptotically optimal finite-time analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7568 LNAI, pp. 199–213). https://doi.org/10.1007/978-3-642-34106-9_18
