We address the practical problem of maximizing the number of high-confidence results produced by multiple experiments that share an exhaustible pool of resources. We formalize this problem in the framework of bandit optimization as follows: given a set of multi-armed bandits and a budget on the total number of trials allocated among them, select the top-m arms (with high confidence) for as many of the bandits as possible. To solve this problem, which we call greedy confidence pursuit, we develop a method based on posterior sampling. We show empirically that our method outperforms existing methods for top-m selection in a single bandit, a problem that has been studied previously, and improves on baseline methods for the full greedy confidence pursuit problem, which has not been studied previously.
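To make the setting concrete, the sketch below shows one way a posterior-sampling heuristic for this problem could look: independent Bernoulli bandits with Beta(1,1) priors, a Monte Carlo estimate of the posterior probability that the empirical top-m set is correct, and a greedy rule that spends each pull on the not-yet-confirmed bandit closest to a confidence threshold. The bandit sizes, priors, the 0.95 threshold, and the allocation and arm-selection rules are all illustrative assumptions, not the algorithm evaluated in the paper.

# Minimal sketch only (assumed setup): Thompson-style top-m selection
# across several Bernoulli bandits sharing one pull budget.
import numpy as np

rng = np.random.default_rng(0)

def topm_confidence(alpha, beta, m, n_samples=300):
    # Posterior probability that the arms with the highest posterior means
    # really are the top-m arms, estimated by sampling the Beta posteriors.
    means = alpha / (alpha + beta)
    guess = np.sort(np.argsort(means)[-m:])                  # current top-m guess
    draws = rng.beta(alpha, beta, size=(n_samples, len(alpha)))
    top = np.sort(np.argsort(draws, axis=1)[:, -m:], axis=1) # sampled top-m sets
    return (top == guess).all(axis=1).mean()

def run(true_means, m=2, budget=2000, target=0.95):
    # Beta(1,1) priors for every arm of every bandit (assumed, for the demo).
    posts = [(np.ones(len(mu)), np.ones(len(mu))) for mu in true_means]
    confirmed = [False] * len(true_means)
    for _ in range(budget):
        open_bandits = [k for k in range(len(true_means)) if not confirmed[k]]
        if not open_bandits:
            break
        confs = {k: topm_confidence(*posts[k], m) for k in open_bandits}
        for k, c in confs.items():
            if c >= target:
                confirmed[k] = True                          # top-m set confirmed
        still_open = [k for k in open_bandits if not confirmed[k]]
        if not still_open:
            continue
        # Greedy choice: spend the pull on the unconfirmed bandit whose
        # top-m set currently looks closest to being confirmed.
        k = max(still_open, key=confs.get)
        a, b = posts[k]
        arm = int(np.argmax(rng.beta(a, b)))                 # posterior-sampling arm choice
        reward = rng.random() < true_means[k][arm]           # Bernoulli reward
        a[arm] += reward
        b[arm] += 1 - reward
    return confirmed

# Two toy bandits: one with a clear top-2, one with closely spaced arms.
print(run([np.array([0.9, 0.7, 0.3, 0.2]),
           np.array([0.55, 0.5, 0.45, 0.4])]))

Under this setup the easy bandit is typically confirmed well before the budget runs out, while the closely spaced one may not be, which is the trade-off the greedy allocation is meant to exploit.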
CITATION STYLE
Bachman, P., & Precup, D. (2013). Greedy confidence pursuit: A pragmatic approach to multi-bandit optimization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8188 LNAI, pp. 241–256). https://doi.org/10.1007/978-3-642-40988-2_16