We consider the multi-armed bandit problem under the PAC ("probably approximately correct") model. It was shown by Even-Dar et al. that, given n arms, it suffices to play the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 − δ. Our contribution is a matching lower bound that holds for any sampling policy. We also generalize the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
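The upper bound cited above is achieved by the Median Elimination algorithm of Even-Dar et al., which repeatedly samples all surviving arms and discards the empirically worse half, tightening the accuracy and confidence parameters each round. The following is a minimal sketch of that idea, assuming stochastic rewards in [0, 1]; the sampling constants are illustrative rather than the exact ones from the paper.

```python
import math
import random


def median_elimination(pull, n, eps, delta):
    """Return an arm index that is eps-optimal with probability >= 1 - delta.

    pull(i) draws one stochastic reward in [0, 1] from arm i.
    Sketch of Median Elimination (Even-Dar et al.); constants illustrative.
    """
    arms = list(range(n))
    eps_l, delta_l = eps / 4.0, delta / 2.0
    while len(arms) > 1:
        # Sample each surviving arm often enough to estimate its mean
        # to accuracy eps_l / 2 with confidence 1 - delta_l.
        m = int(math.ceil((4.0 / eps_l ** 2) * math.log(3.0 / delta_l)))
        means = {a: sum(pull(a) for _ in range(m)) / m for a in arms}
        # Keep the empirically better half of the arms.
        arms.sort(key=lambda a: means[a], reverse=True)
        arms = arms[: (len(arms) + 1) // 2]
        # Tighten accuracy and confidence for the next round.
        eps_l, delta_l = 0.75 * eps_l, delta_l / 2.0
    return arms[0]
```

Because the number of surviving arms halves each round while the per-arm sample count grows only geometrically, the total number of pulls sums to O((n/ε²) log(1/δ)), matching the upper bound quoted in the abstract.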
Citation:
Mannor, S., & Tsitsiklis, J. N. (2003). Lower bounds on the sample complexity of exploration in the multi-armed bandit problem. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2777, pp. 418–432). Springer Verlag. https://doi.org/10.1007/978-3-540-45167-9_31