We consider the multi-armed bandit problem under the PAC ("probably approximately correct") model. It was shown by Even-Dar et al. that, given n arms, it suffices to play the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 − δ. Our contribution is a matching lower bound that holds for any sampling policy. We also generalize the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
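The upper bound cited above is achieved by the Median Elimination algorithm of Even-Dar et al., which repeatedly samples all surviving arms and discards the empirically worse half, tightening the accuracy and confidence parameters each round. The following is a minimal sketch of that idea, assuming stochastic rewards in [0, 1]; the sampling constants are illustrative rather than the exact ones from the paper.

```python
import math
import random


def median_elimination(pull, n, eps, delta):
    """Return an arm index that is eps-optimal with probability >= 1 - delta.

    pull(i) draws one stochastic reward in [0, 1] from arm i.
    Sketch of Median Elimination (Even-Dar et al.); constants illustrative.
    """
    arms = list(range(n))
    eps_l, delta_l = eps / 4.0, delta / 2.0
    while len(arms) > 1:
        # Sample each surviving arm often enough to estimate its mean
        # to accuracy eps_l / 2 with confidence 1 - delta_l.
        m = int(math.ceil((4.0 / eps_l ** 2) * math.log(3.0 / delta_l)))
        means = {a: sum(pull(a) for _ in range(m)) / m for a in arms}
        # Keep the empirically better half of the arms.
        arms.sort(key=lambda a: means[a], reverse=True)
        arms = arms[: (len(arms) + 1) // 2]
        # Tighten accuracy and confidence for the next round.
        eps_l, delta_l = 0.75 * eps_l, delta_l / 2.0
    return arms[0]
```

Because the number of surviving arms halves each round while the per-arm sample count grows only geometrically, the total number of pulls sums to O((n/ε²) log(1/δ)), matching the upper bound quoted in the abstract.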
Citation:
Mannor, S., & Tsitsiklis, J. N. (2003). Lower bounds on the sample complexity of exploration in the multi-armed bandit problem. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2777, pp. 418–432). Springer Verlag. https://doi.org/10.1007/978-3-540-45167-9_31