Abstract
The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation trade-off in sequential decision problems. We introduce a novel algorithm that is based on sub-sampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performance against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible: it does not need to know a set of reward distributions in advance, nor the range of the rewards. It is not restricted to Bernoulli distributions and is also invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, the main intuition that explains the striking results, and conclude with a finite-time regret analysis for this algorithm in the simplified two-arm bandit setting. © 2014 Springer-Verlag.
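To make the sub-sampling idea concrete, here is a minimal sketch of a two-arm sub-sampling duel in the spirit the abstract describes. All names (`besa_two_arms`, `subsample_mean`, the tie-breaking rule) are illustrative assumptions, not the authors' exact specification: at each round, the arm with more observations is compared to the other arm via a sub-sample of equal size drawn without replacement, and the winner of the duel is pulled.

```python
import random

def subsample_mean(history, m, rng):
    # Average m rewards drawn without replacement from the arm's history.
    return sum(rng.sample(history, m)) / m

def besa_two_arms(pull, horizon, seed=0):
    """Sub-sampling duel for two arms (an illustrative sketch).

    `pull(i)` returns a stochastic reward for arm i in {0, 1}.
    Returns the reward histories of both arms after `horizon` pulls.
    """
    rng = random.Random(seed)
    hist = [[pull(0)], [pull(1)]]  # pull each arm once to initialize
    for _ in range(horizon - 2):
        n0, n1 = len(hist[0]), len(hist[1])
        m = min(n0, n1)
        # Duel: compare means over equal-size sub-samples of size m.
        mu0 = subsample_mean(hist[0], m, rng)
        mu1 = subsample_mean(hist[1], m, rng)
        # Assumed tie-break: favor the less-pulled arm.
        if mu0 > mu1 or (mu0 == mu1 and n0 <= n1):
            i = 0
        else:
            i = 1
        hist[i].append(pull(i))
    return hist
```

Note that the comparison uses only raw reward averages, which is consistent with the abstract's claims: no prior family of distributions, no reward range, and invariance under rescaling of the rewards.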
Baransi, A., Maillard, O. A., & Mannor, S. (2014). Sub-sampling for multi-armed bandits. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8724 LNAI, pp. 115–131). Springer Verlag. https://doi.org/10.1007/978-3-662-44848-9_8