Robust Contextual Bandits via Bootstrapping

Citations: 2 · Readers: 7 (Mendeley)

Abstract

Upper confidence bound (UCB) based contextual bandit algorithms require knowledge of the tail properties of the reward distribution. Unfortunately, such tail properties are usually unknown or difficult to specify in real-world applications. Assuming a tail heavier than the ground truth slows the learning speed of the contextual bandit algorithm, while assuming a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator (evaluated from historical rewards) for the contextual bandit UCB based on the multiplier bootstrapping technique. We first establish sufficient conditions under which our estimator converges asymptotically to the ground-truth contextual bandit UCB. We further derive a second-order correction for our estimator so as to obtain its confidence level with a finite number of rounds. To demonstrate the versatility of the estimator, we apply it to design a BootLinUCB algorithm for the contextual bandit. We prove that BootLinUCB has a sub-linear regret upper bound, and we conduct extensive experiments to validate its superior performance.
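As a rough illustration of the idea the abstract describes (not the paper's actual BootLinUCB algorithm), the multiplier bootstrap replaces an assumed tail bound with a data-driven confidence width: each historical reward is reweighted by an independent random multiplier, and the quantile of the resulting bootstrap deviations widens the empirical mean into a UCB. The sketch below does this for a single arm's scalar rewards; the function name and all parameters are hypothetical.

```python
import numpy as np

def multiplier_bootstrap_ucb(rewards, alpha=0.05, n_boot=1000, seed=None):
    """Sketch of a multiplier-bootstrap UCB for a mean reward.

    Instead of plugging in an assumed tail parameter, the confidence
    width is the (1 - alpha)-quantile of bootstrap deviations obtained
    by reweighting centered rewards with i.i.d. N(0, 1) multipliers.
    """
    rng = np.random.default_rng(seed)
    r = np.asarray(rewards, dtype=float)
    n = r.size
    mean = r.mean()
    centered = r - mean

    # One multiplier per (replicate, reward) pair.
    w = rng.standard_normal((n_boot, n))

    # Bootstrap deviations of the sample mean under random reweighting.
    dev = (w * centered).mean(axis=1)

    # The upper quantile of the deviations plays the role of the
    # exploration bonus that a tail assumption would normally supply.
    width = np.quantile(dev, 1.0 - alpha)
    return mean + width
```

In a bandit loop, one would compute such a UCB per arm from that arm's reward history and pull the arm with the largest value; the paper's contextual version additionally regresses rewards on contexts, which this scalar sketch omits.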

Citation (APA)

Tang, Q., Xie, H., Xia, Y., Lee, J., & Zhu, Q. (2021). Robust Contextual Bandits via Bootstrapping. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 13B, pp. 12182–12189). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i13.17446
