Time-Decaying Bandits for Non-stationary Systems

Junpei Komiyama; Tao Qin

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8877 460-466

DOI: 10.1007/978-3-319-13129-0_40

Abstract

Content displayed on web portals (e.g., news articles at Yahoo.com) is usually adaptively selected from a dynamic set of candidate items, and the attractiveness of each item decays over time. The goal of those websites is to maximize the engagement of users (usually measured by their clicks) on the selected items. We formulate this kind of application as a new variant of the bandit problem in which new arms are dynamically added to the candidate set and the expected reward of each arm decays as the rounds proceed. For this new problem, a direct application of algorithms designed for the stochastic multi-armed bandit (MAB) problem (e.g., UCB) will lead to over-estimation of the rewards of old arms, and thus cause misidentification of the optimal arm. To tackle this challenge, we propose a new algorithm that adaptively estimates the temporal dynamics in the rewards of the arms and, on this basis, effectively identifies the best arm at a given time point. When the temporal dynamics are represented by a set of features, the proposed algorithm achieves sub-linear regret. Our experiments verify the effectiveness of the proposed algorithm.
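The over-estimation effect mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the exponential decay model, the function names, and all numeric values below are assumptions chosen only for illustration. The sketch compares the lifetime average of an arm's expected reward (what a stationary estimator such as the empirical mean inside UCB would track) with the arm's expected reward at its current age.

```python
import math


def expected_reward(initial: float, decay_rate: float, age: int) -> float:
    # Assumed decay model (not from the paper): initial * exp(-decay_rate * age).
    return initial * math.exp(-decay_rate * age)


def lifetime_average(initial: float, decay_rate: float, age: int) -> float:
    # Mean expected reward over ages 0..age-1: the value a stationary
    # empirical mean would approach if the arm were pulled every round.
    return sum(expected_reward(initial, decay_rate, a) for a in range(age)) / age


# Hypothetical arms: an old, once-popular item and a freshly added one.
arms = {
    "old": {"initial": 0.9, "decay_rate": 0.05, "age": 40},
    "new": {"initial": 0.3, "decay_rate": 0.05, "age": 1},
}

stationary_estimate = {k: lifetime_average(**v) for k, v in arms.items()}
current_reward = {k: expected_reward(**v) for k, v in arms.items()}

stationary_choice = max(stationary_estimate, key=stationary_estimate.get)
true_best = max(current_reward, key=current_reward.get)

print("stationary estimate picks:", stationary_choice)
print("best arm right now:", true_best)
```

Under these assumed values, the old arm's lifetime average stays well above its current expected reward, so an estimator that ignores decay keeps preferring it even though the new arm is better at the present round. An algorithm that models the decay, as the abstract proposes, would compare arms on their estimated current reward instead.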

Cite

Komiyama, J., & Qin, T. (2014). Time-Decaying Bandits for Non-stationary Systems. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8877, 460–466. https://doi.org/10.1007/978-3-319-13129-0_40
