MULTI-ARMED BANDITS WITH COVARIATES: THEORY AND APPLICATIONS

Abstract

“Multi-armed bandits” were introduced as a new direction in the then-nascent field of sequential analysis, developed during World War II in response to the need for more efficient testing of anti-aircraft gunnery, and later as a concrete application of dynamic programming and optimal control of Markov decision processes. A comprehensive theory that unified both directions emerged in the 1980s, providing important insights and algorithms for diverse applications in many science, technology, engineering, and mathematics fields. The turn of the millennium marked the onset of a “personalization revolution,” from personalized medicine to online personalized advertising and recommender systems (e.g., Netflix’s recommendations for movies and TV shows, Amazon’s recommendations for products to purchase, and Microsoft’s Matchbox recommender). This has required an extension of classical bandit theory to nonparametric contextual bandits, where “contextual” refers to the incorporation of personal information as covariates. Such theory is developed herein, together with illustrative applications, statistical models, and computational tools for its implementation.

Cite

Kim, D. W., Lai, T. L., & Xu, H. (2021). MULTI-ARMED BANDITS WITH COVARIATES: THEORY AND APPLICATIONS. Statistica Sinica, 31, 2275–2287. https://doi.org/10.5705/ss.202020.0454
