Linear upper confidence bound algorithm for contextual bandit problem with piled rewards

Abstract

We study the contextual bandit problem with linear payoff functions. In the traditional contextual bandit problem, the algorithm iteratively chooses an action based on the observed context and immediately receives a reward for the chosen action. Motivated by a practical need in many applications, we study the design of algorithms under the piled-reward setting, where rewards arrive in piles rather than immediately. We show how the Linear Upper Confidence Bound (LinUCB) algorithm for the traditional problem can be naïvely applied under the piled-reward setting, and prove its regret bound. We then extend LinUCB to a novel algorithm, called Linear Upper Confidence Bound with Pseudo Reward (LinUCBPR), which digests the observed contexts to choose actions more strategically before the piled rewards are received. We prove that LinUCBPR matches LinUCB in the regret bound under the piled-reward setting. Experiments on artificial and real-world datasets demonstrate the strong performance of LinUCBPR in practice.
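To make the pseudo-reward idea concrete, here is a minimal Python sketch of how such an algorithm could operate. It is not the paper's exact pseudocode: the ridge regularization, the single shared linear model over (context, action) features, and in particular the choice of the current mean estimate θᵀx as the pseudo reward are illustrative assumptions. The naïve application of LinUCB would instead defer all model updates until the pile arrives.

```python
import numpy as np

class LinUCBPR:
    """Sketch of LinUCB with pseudo rewards under the piled-reward setting.

    Assumptions (not taken from the paper's pseudocode): one shared ridge
    regression model over per-arm feature vectors, and the pseudo reward of
    a chosen arm is the current mean estimate theta^T x, which is replaced
    by the true reward once the pile arrives.
    """

    def __init__(self, d, alpha=1.0, lam=1.0):
        self.alpha = alpha
        self.A = lam * np.eye(d)   # regularized Gram matrix
        self.b = np.zeros(d)       # reward-weighted feature sum
        self.pending = []          # (context, pseudo reward) awaiting true rewards

    def choose(self, contexts):
        """contexts: (K, d) array, one feature vector per arm."""
        theta = np.linalg.solve(self.A, self.b)
        A_inv = np.linalg.inv(self.A)
        # Upper confidence bound: mean estimate plus exploration width.
        ucb = contexts @ theta + self.alpha * np.sqrt(
            np.einsum('ki,ij,kj->k', contexts, A_inv, contexts))
        k = int(np.argmax(ucb))
        x = contexts[k]
        # Pseudo-reward update: digest the context immediately, using the
        # current mean estimate as a stand-in for the unseen reward.
        r_pseudo = float(theta @ x)
        self.A += np.outer(x, x)
        self.b += r_pseudo * x
        self.pending.append((x, r_pseudo))
        return k

    def receive_pile(self, rewards):
        """Swap each pseudo reward for the true one when the pile arrives."""
        for (x, r_pseudo), r in zip(self.pending, rewards):
            self.b += (r - r_pseudo) * x   # A already holds x x^T
        self.pending.clear()

# Example run on a synthetic linear-reward problem: 10 piles of 20 rounds,
# 4 arms with 5-dimensional features (all names here are hypothetical).
rng = np.random.default_rng(0)
agent = LinUCBPR(d=5, alpha=1.0)
theta_star = rng.normal(size=5)            # hidden linear reward model
for pile in range(10):
    chosen = []
    for _ in range(20):
        X = rng.normal(size=(4, 5))
        k = agent.choose(X)
        chosen.append(X[k])
    # True rewards for the whole pile arrive at once.
    agent.receive_pile([float(x @ theta_star) + rng.normal(0, 0.1)
                        for x in chosen])
```

The design point this sketch tries to capture is that the Gram matrix A grows with every chosen context immediately, so the confidence widths of already-explored directions shrink before any true reward is observed; only the reward vector b needs a correction once the pile arrives.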

Citation (APA)

Huang, K. H., & Lin, H. T. (2016). Linear upper confidence bound algorithm for contextual bandit problem with piled rewards. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9652 LNAI, pp. 143–155). Springer Verlag. https://doi.org/10.1007/978-3-319-31750-2_12
