UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem


Abstract

In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const · (K log T)/Δ, where Δ measures the distance between a suboptimal arm and the optimal arm, for the modified UCB algorithm we show an upper bound on the regret of const · (K log(TΔ²))/Δ. © 2010 Akadémiai Kiadó, Budapest, Hungary.
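For context, the baseline being improved upon is the UCB1 index policy of Auer et al.: pull the arm maximizing the empirical mean plus a confidence radius sqrt(2 ln t / n_i). Below is a minimal simulation sketch of that baseline (not the modified algorithm from this paper); the Bernoulli reward model, arm probabilities, and function name are illustrative assumptions.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms; return per-arm pull counts.

    arm_means are the true success probabilities, used only to
    sample rewards for the simulated environment (a toy assumption;
    the learner never sees them directly).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # n_i: number of times arm i was pulled
    sums = [0.0] * k    # cumulative reward of arm i

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_i)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

On a two-armed instance with a large gap Δ, the suboptimal arm's pull count grows only logarithmically in the horizon, which is what drives the log T (respectively log(TΔ²)) terms in the regret bounds above.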


Citation (APA):

Auer, P., & Ortner, R. (2010). UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica, 61(1), 55–65. https://doi.org/10.1007/s10998-010-3055-6
