Infinitely many-armed bandits with unknown value distribution

6 citations · 5 Mendeley readers · Free to access
Abstract

We consider a version of the classical stochastic multi-armed bandit problem in which the number of arms is large compared to the time horizon, with the goal of minimizing the cumulative regret. The mean reward (or value) of each newly chosen arm is assumed to be drawn i.i.d. from a common distribution. We further make the simplifying assumption that an arm's value is revealed once that arm is chosen. We present a general lower bound on the regret, along with learning algorithms that achieve this bound up to a logarithmic factor. In contrast to previous work, we do not assume that the functional form of the tail of the value distribution is known. We also consider a variant of the model in which sampled arms are non-retainable, namely, an arm is lost unless it is sampled continuously, and obtain similar near-optimality results. © 2014 Springer-Verlag.
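Because the paper assumes an arm's mean reward is revealed as soon as the arm is chosen, learning essentially reduces to deciding how many fresh arms to sample before committing to the best one found. The sketch below illustrates this explore-then-commit flavor under that assumption; the function name, the fixed n_explore budget, and the uniform-value example are illustrative choices, not the paper's actual algorithm, which adapts its exploration to the unknown tail of the value distribution.

```python
import random

def infinite_bandit_explore_commit(draw_arm_value, horizon, n_explore):
    """Explore-then-commit sketch for an infinitely many-armed bandit.

    draw_arm_value: samples the mean reward of a fresh arm (values are
    i.i.d. across arms); per the paper's simplifying assumption, this
    value is revealed as soon as the arm is chosen.
    n_explore: illustrative, fixed exploration budget (hypothetical; the
    paper's algorithms choose this in a data-driven way).
    """
    # Sample fresh arms; each pull reveals that arm's mean reward.
    explored = [draw_arm_value() for _ in range(min(n_explore, horizon))]
    best = max(explored)
    # Commit the remaining budget to the best arm seen so far.
    total_reward = sum(explored) + best * (horizon - len(explored))
    return total_reward, best

# Example: arm values uniform on [0, 1]. Cumulative regret is measured
# against the distribution's essential supremum (1.0 here) and shrinks
# as the exploration budget is scaled suitably with the horizon.
total, best = infinite_bandit_explore_commit(random.random,
                                             horizon=10_000, n_explore=100)
```

The trade-off this sketch exposes is exactly the one the paper's lower bound quantifies: sampling too few arms leaves a large gap between the committed arm and the best achievable value, while sampling too many wastes pulls on mediocre arms.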

Cite (APA)

David, Y., & Shimkin, N. (2014). Infinitely many-armed bandits with unknown value distribution. In Lecture Notes in Computer Science (Vol. 8724 LNAI, pp. 307–322). Springer-Verlag. https://doi.org/10.1007/978-3-662-44848-9_20
