Greedy confidence pursuit: A pragmatic approach to multi-bandit optimization


Abstract

We address the practical problem of maximizing the number of high-confidence results produced among multiple experiments sharing an exhaustible pool of resources. We formalize this problem in the framework of bandit optimization as follows: given a set of multiple multi-armed bandits and a budget on the total number of trials allocated among them, select the top-m arms (with high confidence) for as many of the bandits as possible. To solve this problem, which we call greedy confidence pursuit, we develop a method based on posterior sampling. We show empirically that our method outperforms existing methods for top-m selection in single bandits, which has been studied previously, and improves on baseline methods for the full greedy confidence pursuit problem, which has not been studied previously. © 2013 Springer-Verlag.
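The abstract describes the problem setting: several multi-armed bandits share one trial budget, and the goal is to certify the top-m arms of as many bandits as possible with high confidence. The sketch below is not the authors' algorithm; it is a minimal, hypothetical illustration of that setting using Beta-Bernoulli posterior (Thompson) sampling, where confidence is estimated by Monte Carlo posterior draws and trials are greedily allocated to the unsolved bandit closest to the confidence threshold. All function names, the allocation rule, and the confidence estimator are assumptions made for illustration only.

```python
import random

def top_m_confidence(arm_stats, m, n_draws, rng):
    """Estimate P(empirical top-m set is the true top-m) by drawing arm
    means from their Beta posteriors and checking set agreement."""
    means = [s / (s + f) for s, f in arm_stats]
    top = set(sorted(range(len(means)), key=means.__getitem__)[-m:])
    agree = 0
    for _ in range(n_draws):
        draws = [rng.betavariate(s, f) for s, f in arm_stats]
        if set(sorted(range(len(draws)), key=draws.__getitem__)[-m:]) == top:
            agree += 1
    return agree / n_draws

def greedy_confidence_pursuit(true_means, m, budget, threshold=0.9,
                              n_draws=100, seed=0):
    """Toy simulation: `true_means[b][a]` is the (hidden) Bernoulli mean of
    arm a in bandit b. Returns {bandit: sorted top-m arms} for every bandit
    certified at the confidence threshold within the trial budget."""
    rng = random.Random(seed)
    # Beta(1, 1) priors: [successes+1, failures+1] per arm per bandit
    stats = [[[1, 1] for _ in b] for b in true_means]
    selections = {}
    for _ in range(budget):
        confs = {bi: top_m_confidence(stats[bi], m, n_draws, rng)
                 for bi in range(len(true_means)) if bi not in selections}
        # certify any bandit whose top-m set is now confident enough
        for bi, c in list(confs.items()):
            if c >= threshold:
                means = [s / (s + f) for s, f in stats[bi]]
                selections[bi] = sorted(
                    sorted(range(len(means)), key=means.__getitem__)[-m:])
                del confs[bi]
        if not confs:
            break  # every bandit certified
        # greedily spend the trial on the bandit closest to the threshold
        bi = max(confs, key=confs.get)
        # within that bandit, pull the arm favored by one posterior draw
        draws = [rng.betavariate(s, f) for s, f in stats[bi]]
        ai = max(range(len(draws)), key=draws.__getitem__)
        if rng.random() < true_means[bi][ai]:
            stats[bi][ai][0] += 1  # success
        else:
            stats[bi][ai][1] += 1  # failure
    return selections
```

For example, with two easy three-armed bandits, `greedy_confidence_pursuit([[0.9, 0.1, 0.8], [0.1, 0.9, 0.8]], m=2, budget=5000)` should certify the top-2 sets {0, 2} and {1, 2} well within budget, since the gaps are large.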

Citation (APA)

Bachman, P., & Precup, D. (2013). Greedy confidence pursuit: A pragmatic approach to multi-bandit optimization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8188 LNAI, pp. 241–256). https://doi.org/10.1007/978-3-642-40988-2_16
