Abstract
The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation trade-off in sequential decision problems. We introduce a novel algorithm that is based on sub-sampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performance against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible: it does not need to know a set of reward distributions in advance, nor the range of the rewards. It is not restricted to Bernoulli distributions and is also invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, the main intuition that explains the striking results, and conclude with a finite-time regret analysis for this algorithm in the simplified two-arm bandit setting. © 2014 Springer-Verlag.
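To make the sub-sampling idea concrete, here is a minimal sketch of a two-arm sub-sampling duel in the spirit the abstract describes. All names (`besa_two_arms`, `subsample_mean`, the tie-breaking rule) are illustrative assumptions, not the authors' exact specification: at each round, the arm with more observations is compared to the other arm via a sub-sample of equal size drawn without replacement, and the winner of the duel is pulled.

```python
import random

def subsample_mean(history, m, rng):
    # Average m rewards drawn without replacement from the arm's history.
    return sum(rng.sample(history, m)) / m

def besa_two_arms(pull, horizon, seed=0):
    """Sub-sampling duel for two arms (an illustrative sketch).

    `pull(i)` returns a stochastic reward for arm i in {0, 1}.
    Returns the reward histories of both arms after `horizon` pulls.
    """
    rng = random.Random(seed)
    hist = [[pull(0)], [pull(1)]]  # pull each arm once to initialize
    for _ in range(horizon - 2):
        n0, n1 = len(hist[0]), len(hist[1])
        m = min(n0, n1)
        # Duel: compare means over equal-size sub-samples of size m.
        mu0 = subsample_mean(hist[0], m, rng)
        mu1 = subsample_mean(hist[1], m, rng)
        # Assumed tie-break: favor the less-pulled arm.
        if mu0 > mu1 or (mu0 == mu1 and n0 <= n1):
            i = 0
        else:
            i = 1
        hist[i].append(pull(i))
    return hist
```

Note that the comparison uses only raw reward averages, which is consistent with the abstract's claims: no prior family of distributions, no reward range, and invariance under rescaling of the rewards.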
Baransi, A., Maillard, O. A., & Mannor, S. (2014). Sub-sampling for multi-armed bandits. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8724 LNAI, pp. 115–131). Springer Verlag. https://doi.org/10.1007/978-3-662-44848-9_8