UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem

Peter Auer; Ronald Ortner

Journal Article

Periodica Mathematica Hungarica (2010) 61(1) 55-65

DOI: 10.1007/s10998-010-3055-6

170 Citations

111 Readers

Abstract

In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by $\text{const} \cdot \frac{K \log(T)}{\Delta}$, where Δ measures the distance between a suboptimal arm and the optimal arm, for the modified UCB algorithm we show an upper bound on the regret of $\text{const} \cdot \frac{K \log(T\Delta^2)}{\Delta}$. © 2010 Akadémiai Kiadó, Budapest, Hungary.
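To get a feel for the size of the improvement, the two bounds can be compared numerically. The sketch below assumes the bounds have the forms const · K log(T)/Δ for the original UCB algorithm and const · K log(TΔ²)/Δ for the modified one, and drops the unspecified constants, so only the ratio of the two values is meaningful. The function names `ucb_bound` and `improved_ucb_bound` are hypothetical and chosen for illustration only; they do not come from the paper.

```python
import math


def ucb_bound(K: int, T: int, delta: float) -> float:
    """Shape of the original UCB regret bound, K * log(T) / delta.

    The leading constant is omitted (an assumption of this sketch).
    """
    return K * math.log(T) / delta


def improved_ucb_bound(K: int, T: int, delta: float) -> float:
    """Shape of the modified UCB regret bound, K * log(T * delta^2) / delta.

    The leading constant is omitted (an assumption of this sketch).
    The expression is only informative when T * delta^2 > 1.
    """
    if T * delta ** 2 <= 1:
        raise ValueError("bound is only informative when T * delta**2 > 1")
    return K * math.log(T * delta ** 2) / delta


if __name__ == "__main__":
    K, T, delta = 10, 10 ** 6, 0.01
    old = ucb_bound(K, T, delta)
    new = improved_ucb_bound(K, T, delta)
    # The ratio equals log(T * delta^2) / log(T), independent of K and delta's 1/delta factor.
    print(f"original: {old:.1f}  modified: {new:.1f}  ratio: {new / old:.3f}")
```

Because both expressions share the factor K/Δ, the ratio reduces to log(TΔ²)/log(T): the smaller the gap Δ relative to 1/√T, the larger the saving from the modified bound.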

Author supplied keywords

multi-armed bandit problem
regret

Cite

Auer, P., & Ortner, R. (2010). UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica, 61(1), 55–65. https://doi.org/10.1007/s10998-010-3055-6
