PAC identification of a bandit arm relative to a reward quantile

Abstract

We propose a PAC formulation for identifying an arm in an n-armed bandit whose mean is within a fixed tolerance of the mth highest mean. This setup generalises a previous formulation with m = 1, and differs from another formulation that requires m such arms to be identified. The key implication of our approach is that it yields upper bounds on the sample complexity that depend on n/m in place of n. Consequently, even when the number of arms is infinite, only a finite number of samples is needed to identify an arm that compares favourably with a fixed reward quantile. This property makes our approach attractive for applications such as drug discovery, wherein the number of arms (molecular configurations) may run into a few thousand. We present sampling algorithms for both the finite- and infinite-armed cases, and validate their efficiency through theoretical and experimental analysis. We also present a lower bound on the worst-case sample complexity of PAC algorithms for our problem, which matches our upper bound up to a logarithmic factor.
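
To illustrate why a dependence on n/m rather than n is plausible, the following is a minimal Python sketch. It is not one of the algorithms proposed in the paper. It assumes rewards lie in [0, 1], draws a random subset of arms large enough to contain one of the top m arms with probability at least 1 - delta/2, and then applies naive uniform sampling with a Hoeffding bound to that subset. The names `subset_size`, `pulls_per_arm`, `identify_arm`, and the `pull(arm, rng)` callback are hypothetical and introduced only for this example.

```python
import math
import random


def subset_size(n, m, delta):
    """Arms to draw (without replacement) so that, with probability at least
    1 - delta, at least one drawn arm is among the top m of n.
    Uses (1 - m/n)^k <= exp(-k * m / n); the size depends on n only via n/m."""
    return min(n, math.ceil((n / m) * math.log(1.0 / delta)))


def pulls_per_arm(k, epsilon, delta):
    """Pulls per arm so that, by Hoeffding's inequality and a union bound,
    all k empirical means are within epsilon/2 of the true means with
    probability at least 1 - delta (assumes rewards in [0, 1])."""
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * k / delta))


def identify_arm(pull, n, m, epsilon, delta, rng):
    """Return (arm, total_pulls). Under the assumptions above, the arm's mean
    is within epsilon of the m-th highest mean with probability >= 1 - delta.
    `pull(arm, rng)` is a hypothetical callback returning a reward in [0, 1]."""
    k = subset_size(n, m, delta / 2)      # half the failure budget: subset
    candidates = rng.sample(range(n), k)
    t = pulls_per_arm(k, epsilon, delta / 2)  # other half: estimation
    best_arm, best_mean = None, -1.0
    for arm in candidates:
        mean = sum(pull(arm, rng) for _ in range(t)) / t
        if mean > best_mean:
            best_arm, best_mean = arm, mean
    return best_arm, k * t


if __name__ == "__main__":
    # Illustrative Bernoulli bandit with means 0/n, 1/n, ..., (n-1)/n.
    n, m = 1000, 100
    means = [i / n for i in range(n)]

    def bernoulli_pull(arm, rng):
        return 1.0 if rng.random() < means[arm] else 0.0

    arm, total = identify_arm(bernoulli_pull, n, m, 0.1, 0.1, random.Random(0))
    print(arm, means[arm], total)
```

In this sketch the total number of pulls scales as (n/m) log(1/delta) times a per-arm cost that depends only on epsilon, delta, and the subset size, so the bound depends on the fraction m/n rather than on n itself. This suggests why an infinite arm set with a fixed target quantile can still be handled with finitely many samples. The paper's own algorithms and bounds are sharper than this naive version.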

Cite

Chaudhuri, A. R., & Kalyanakrishnan, S. (2017). PAC identification of a bandit arm relative to a reward quantile. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017 (pp. 1777–1783). AAAI Press. https://doi.org/10.1609/aaai.v31i1.10802
