Lower bounds on the sample complexity of exploration in the multi-armed bandit problem


Abstract

We consider the multi-armed bandit problem under the PAC ("probably approximately correct") model. It was shown by Even-Dar et al. that given n arms, it suffices to play the arms a total of O((n/ε²)log(1/δ)) times to find an ε-optimal arm with probability at least 1−δ. Our contribution is a matching lower bound that holds for any sampling policy. We also generalize the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
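To make the sample-complexity quantities concrete, here is a minimal sketch of a naive PAC arm-selection strategy (not the Even-Dar et al. median-elimination algorithm from the abstract): pull every arm the same number of times and return the empirically best one. A standard Hoeffding plus union-bound argument shows that m = ⌈(2/ε²)ln(2n/δ)⌉ pulls per arm suffice, giving an O((n/ε²)log(n/δ)) total budget; the extra log n factor is exactly what median elimination removes to match the O((n/ε²)log(1/δ)) bound discussed above. All names below are illustrative.

```python
import math
import random

def naive_pac_best_arm(means, eps, delta, rng):
    """Return (index of empirically best arm, pulls per arm).

    Pulls each Bernoulli arm m = ceil((2/eps^2) * ln(2n/delta)) times.
    By Hoeffding's inequality, each empirical mean is within eps/2 of
    the true mean except with probability delta/n; a union bound over
    the n arms then makes the returned arm eps-optimal with
    probability at least 1 - delta.
    """
    n = len(means)
    m = math.ceil((2 / eps**2) * math.log(2 * n / delta))
    estimates = []
    for mu in means:
        successes = sum(rng.random() < mu for _ in range(m))  # simulate Bernoulli pulls
        estimates.append(successes / m)
    best = max(range(n), key=lambda i: estimates[i])
    return best, m
```

For example, with three arms of means 0.9, 0.5, 0.5 and (ε, δ) = (0.1, 0.05), each arm is pulled 958 times, and the clearly best arm is recovered; the lower bound in this paper says that, up to constants, no sampling policy can do with fewer than Ω((n/ε²)log(1/δ)) total pulls in the worst case.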

APA

Mannor, S., & Tsitsiklis, J. N. (2003). Lower bounds on the sample complexity of exploration in the multi-armed bandit problem. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2777, pp. 418–432). Springer Verlag. https://doi.org/10.1007/978-3-540-45167-9_31
