In this paper we consider the stochastic multi-armed bandit problem with metric switching costs. Given a set of locations (arms) in a metric space, prior information about the reward available at each location, a cost for obtaining a sample/play at each location, and rules to update the priors based on samples/plays, the task is to maximize a given objective function subject to a distance-cost budget L and a play-cost budget C. This fundamental and well-studied problem models several optimization problems in robot navigation, sensor networks, labor economics, etc. We develop a general duality-based framework that yields the first O(1) approximation for metric switching costs; the actual constants are quite small. Since these problems are Max-SNP hard, this result is the best possible. The overall technique and the ensuing structural results are of independent interest in the context of bandit problems with complicated side constraints. Our techniques also improve the approximation ratio of the budgeted learning problem from 4 to 3 + ε. © 2009 Springer Berlin Heidelberg.
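To make the problem setup concrete, the following is a minimal toy sketch (not the paper's algorithm) of the two budgets at work: a policy visits arms in a metric space, paying metric distance against a switching budget L and unit cost per play against a play budget C, and collects stochastic rewards. All names (`run_policy`, `arms`, `dist`, `pulls_order`) and the Bernoulli reward model are hypothetical illustrations standing in for the paper's priors and update rules.

```python
import random

def run_policy(arms, dist, L, C, pulls_order, seed=0):
    """Toy simulation of a fixed visiting order under the two budgets.

    arms: dict name -> success probability (Bernoulli rewards; a toy
          stand-in for the paper's prior/posterior reward model).
    dist: dict (a, b) -> metric distance between arms a and b.
    L: total distance (switching-cost) budget.
    C: total play-cost budget (unit cost per play in this sketch).
    pulls_order: sequence of arm names to visit.
    Returns the total reward collected before a budget is exhausted.
    """
    rng = random.Random(seed)
    reward, cur = 0, None
    for arm in pulls_order:
        if cur is not None and cur != arm:
            d = dist[(cur, arm)]
            if d > L:          # cannot afford the move: stop
                break
            L -= d             # charge the switch against L
        cur = arm
        if C < 1:              # cannot afford another play: stop
            break
        C -= 1                 # charge the play against C
        reward += int(rng.random() < arms[arm])
    return reward
```

A policy in the actual problem is adaptive (it chooses the next arm based on observed rewards and updated priors); this sketch only illustrates how the distance budget L and play budget C jointly constrain any such policy.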
CITATION STYLE
Guha, S., & Munagala, K. (2009). Multi-armed bandits with metric switching costs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5556 LNCS, pp. 496–507). https://doi.org/10.1007/978-3-642-02930-1_41