Regret bounds for sleeping experts and bandits

126 citations · 61 Mendeley readers · Free to access

Abstract

We study on-line decision problems in which the set of actions available to the decision algorithm varies over time. With a few notable exceptions, such problems have remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this "Sleeping Experts" problem, we compare algorithms against the payoff obtained by the best ordering of the actions, a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings, and we consider both stochastic and adversarial reward models. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds, up to a constant or a sub-logarithmic factor, with respect to the best-ordering benchmark.

© 2010 The Author(s).
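
To make the best-ordering benchmark concrete, the sketch below simulates the stochastic partial-information (bandit) setting: at each step only a random subset of arms is awake, the learner applies a UCB-style rule over the awake arms, and regret is measured against the best fixed ordering of the arms by true mean. This is a minimal illustration of the setting; the reward means, awake probabilities, and index formula are assumptions, not the paper's exact construction.

    import math
    import random

    # Minimal sketch of a stochastic sleeping bandit (assumed setup, not the
    # paper's exact algorithm). Each arm pays Bernoulli rewards with the means
    # below, and each arm is independently awake with probability 1/2.
    MEANS = [0.9, 0.6, 0.3]  # hypothetical reward means
    HORIZON = 10_000
    random.seed(0)

    counts = [0] * len(MEANS)   # plays per arm
    sums = [0.0] * len(MEANS)   # cumulative reward per arm
    learner_reward = 0.0
    benchmark_reward = 0.0      # payoff of the best ordering of the arms

    for t in range(1, HORIZON + 1):
        awake = [i for i in range(len(MEANS)) if random.random() < 0.5]
        if not awake:
            continue  # no action is available this round

        # UCB-style index over the awake arms: try unplayed arms first,
        # then prefer high empirical mean plus a confidence radius.
        def index(i):
            if counts[i] == 0:
                return float("inf")
            return sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])

        choice = max(awake, key=index)
        reward = 1.0 if random.random() < MEANS[choice] else 0.0
        counts[choice] += 1
        sums[choice] += reward
        learner_reward += reward

        # The best-ordering benchmark plays the awake arm ranked highest by
        # true mean; we credit it with that arm's expected reward.
        benchmark_reward += max(MEANS[i] for i in awake)

    print(f"regret vs. best ordering: {benchmark_reward - learner_reward:.1f}")

Because the benchmark re-ranks only within the awake set, a single fixed ordering induces a well-defined comparator even though the action set changes every round; that is what distinguishes this benchmark from the classic best-single-expert comparator.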

Citation (APA)

Kleinberg, R., Niculescu-Mizil, A., & Sharma, Y. (2010). Regret bounds for sleeping experts and bandits. Machine Learning, 80(2–3), 245–272. https://doi.org/10.1007/s10994-010-5178-7
