Recommender systems must handle a highly non-stationary environment, driven by users' fast-changing interests over time. Traditional solutions periodically rebuild their models despite the high computational cost; but even this does not let them adjust automatically to abrupt changes in trends caused by timely information. It is important to note that the changes in reward distributions caused by a non-stationary environment can also be context-dependent. When a change is orthogonal to the given context, previously maintained models should be reused for better recommendation prediction. In this work, we focus on contextual bandit algorithms for making adaptive recommendations. We capitalize on this unique context-dependent property of reward changes to handle model updates in the challenging non-stationary environment. In particular, we maintain a dynamic ensemble of contextual bandit models, where each bandit model's reward estimation quality is monitored with respect to the given context and possible environment changes. Only the models admissible to the current environment are used for recommendation. We provide a rigorous upper regret bound analysis of the proposed algorithm. Extensive empirical evaluations on a synthetic dataset and three real-world datasets confirm the algorithm's advantage over existing non-stationary solutions that simply create new models whenever an environment change is detected.
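To make the ensemble mechanism concrete, below is a minimal sketch, not the paper's actual implementation: it assumes LinUCB-style base bandits and uses a sliding-window mean of prediction errors as the admissibility test, a simplification of the paper's quality-monitoring criterion. All names (LinUCBModel, DynamicEnsemble, error_tol, window) are illustrative, not the authors' interface.

```python
import numpy as np

class LinUCBModel:
    """One base contextual bandit: ridge regression plus UCB exploration.
    (Illustrative stand-in for the paper's base bandit models.)"""
    def __init__(self, d, alpha=0.5, lam=1.0):
        self.A = lam * np.eye(d)   # regularized covariance of observed contexts
        self.b = np.zeros(d)       # accumulated reward-weighted contexts
        self.alpha = alpha         # exploration weight
        self.errors = []           # recent |reward - prediction| history

    def theta(self):
        return np.linalg.solve(self.A, self.b)

    def predict(self, x):
        return x @ self.theta()

    def ucb(self, x):
        # upper confidence bound on the reward for context x
        return self.predict(x) + self.alpha * np.sqrt(x @ np.linalg.solve(self.A, x))

    def update(self, x, reward):
        # record the prediction error before absorbing the new observation
        self.errors.append(abs(reward - self.predict(x)))
        self.A += np.outer(x, x)
        self.b += reward * x

class DynamicEnsemble:
    """Pool of bandit models; only models whose recent prediction error
    stays under a tolerance are treated as admissible (error_tol and
    window are hypothetical knobs for this sketch)."""
    def __init__(self, d, error_tol=0.3, window=50):
        self.d, self.error_tol, self.window = d, error_tol, window
        self.models = [LinUCBModel(d)]

    def admissible(self):
        ok = [m for m in self.models
              if np.mean(m.errors[-self.window:] or [0.0]) < self.error_tol]
        if not ok:  # no model fits the current environment: spawn a fresh one
            ok = [LinUCBModel(self.d)]
            self.models.append(ok[0])
        return ok

    def recommend(self, arms):
        # score each candidate arm by the best UCB among admissible models
        models = self.admissible()
        return int(np.argmax([max(m.ucb(x) for m in models) for x in arms]))

    def update(self, x, reward):
        for m in self.models:  # every model observes the feedback
            m.update(x, reward)
```

Note the design choice this sketch illustrates: a new model is spawned only when no existing model is admissible, so models whose reward estimates still fit the current context are reused rather than discarded, in contrast to change-detection baselines that create a new model on every detected change.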
CITATION
Wu, Q., Wang, H., Li, Y., & Wang, H. (2019). Dynamic ensemble of contextual bandits to satisfy users’ changing interests. In The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019 (pp. 2080–2090). Association for Computing Machinery, Inc. https://doi.org/10.1145/3308558.3313727