We address the practical problem of maximizing the number of high-confidence results produced by multiple experiments that share an exhaustible pool of resources. We formalize this problem in the framework of bandit optimization as follows: given a set of multi-armed bandits and a budget on the total number of trials allocated among them, select the top-m arms (with high confidence) for as many of the bandits as possible. To solve this problem, which we call greedy confidence pursuit, we develop a method based on posterior sampling. We show empirically that our method outperforms existing methods for top-m selection in a single bandit, a problem that has been studied previously, and improves on baseline methods for the full greedy confidence pursuit problem, which has not been studied previously.
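To make the setting concrete, the sketch below shows one way a posterior-sampling heuristic for this problem could look: independent Bernoulli bandits with Beta(1,1) priors, a Monte Carlo estimate of the posterior probability that the empirical top-m set is correct, and a greedy rule that spends each pull on the not-yet-confirmed bandit closest to a confidence threshold. The bandit sizes, priors, the 0.95 threshold, and the allocation and arm-selection rules are all illustrative assumptions, not the algorithm evaluated in the paper.

# Minimal sketch only (assumed setup): Thompson-style top-m selection
# across several Bernoulli bandits sharing one pull budget.
import numpy as np

rng = np.random.default_rng(0)

def topm_confidence(alpha, beta, m, n_samples=300):
    # Posterior probability that the arms with the highest posterior means
    # really are the top-m arms, estimated by sampling the Beta posteriors.
    means = alpha / (alpha + beta)
    guess = np.sort(np.argsort(means)[-m:])                  # current top-m guess
    draws = rng.beta(alpha, beta, size=(n_samples, len(alpha)))
    top = np.sort(np.argsort(draws, axis=1)[:, -m:], axis=1) # sampled top-m sets
    return (top == guess).all(axis=1).mean()

def run(true_means, m=2, budget=2000, target=0.95):
    # Beta(1,1) priors for every arm of every bandit (assumed, for the demo).
    posts = [(np.ones(len(mu)), np.ones(len(mu))) for mu in true_means]
    confirmed = [False] * len(true_means)
    for _ in range(budget):
        open_bandits = [k for k in range(len(true_means)) if not confirmed[k]]
        if not open_bandits:
            break
        confs = {k: topm_confidence(*posts[k], m) for k in open_bandits}
        for k, c in confs.items():
            if c >= target:
                confirmed[k] = True                          # top-m set confirmed
        still_open = [k for k in open_bandits if not confirmed[k]]
        if not still_open:
            continue
        # Greedy choice: spend the pull on the unconfirmed bandit whose
        # top-m set currently looks closest to being confirmed.
        k = max(still_open, key=confs.get)
        a, b = posts[k]
        arm = int(np.argmax(rng.beta(a, b)))                 # posterior-sampling arm choice
        reward = rng.random() < true_means[k][arm]           # Bernoulli reward
        a[arm] += reward
        b[arm] += 1 - reward
    return confirmed

# Two toy bandits: one with a clear top-2, one with closely spaced arms.
print(run([np.array([0.9, 0.7, 0.3, 0.2]),
           np.array([0.55, 0.5, 0.45, 0.4])]))

Under this setup the easy bandit is typically confirmed well before the budget runs out, while the closely spaced one may not be, which is the trade-off the greedy allocation is meant to exploit.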
CITATION STYLE
Bachman, P., & Precup, D. (2013). Greedy confidence pursuit: A pragmatic approach to multi-bandit optimization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8188 LNAI, pp. 241–256). https://doi.org/10.1007/978-3-642-40988-2_16