Off-Policy Recommendation System Without Exploration

Abstract

A recommendation system (RS) can be treated as an intelligent agent that aims to learn a policy maximizing customers' long-term satisfaction. Off-policy reinforcement learning methods based on Q-learning and actor-critic algorithms are commonly used to train an RS. Although these methods can leverage previously collected datasets for sample-efficient training, they are sensitive to the distribution of the off-policy data and make limited progress unless more on-policy data are collected. However, allowing a badly trained RS to interact with customers can result in unpredictable loss. It is therefore highly desirable for an off-policy method to train an RS stably when the off-policy data is fixed and there is no further interaction with the environment. To meet these requirements, we devise a novel method named Generator Constrained Q-learning (GCQ). GCQ additionally trains an action generator via supervised learning; the generator is used to mimic the data distribution and stabilize the performance of the recommendation policy. Empirical studies show that the proposed method outperforms state-of-the-art techniques in both offline and simulated online environments.
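The core idea described in the abstract — constraining the Q-learning backup to actions that a generator fitted on the logged data would actually produce — can be illustrated with a minimal tabular sketch. This is a hypothetical simplification, not the paper's implementation: here the "generator" is just the empirical set of actions observed per state in the fixed off-policy dataset, and the Bellman maximization is restricted to that support.

```python
from collections import defaultdict

def train_gcq(logged_data, n_actions, gamma=0.9, alpha=0.5, epochs=200):
    """Tabular sketch of generator-constrained Q-learning.

    logged_data: list of (state, action, reward, next_state) tuples
    from a fixed off-policy dataset; no environment interaction occurs.
    """
    # Step 1: fit the action "generator" by supervised counting --
    # record which actions the logging policy took in each state.
    seen = defaultdict(set)
    for s, a, _, _ in logged_data:
        seen[s].add(a)

    # Step 2: Q-learning over the fixed dataset, but the max in the
    # Bellman target only ranges over generator-supported actions,
    # avoiding value estimates for actions the data never covered.
    Q = defaultdict(float)
    for _ in range(epochs):
        for s, a, r, s2 in logged_data:
            allowed = seen[s2] or set(range(n_actions))  # fall back if s2 unseen
            target = r + gamma * max(Q[(s2, b)] for b in allowed)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q, seen
```

On a two-state toy dataset where only action 0 was ever logged, the learned values never reference the unsupported action 1, which is the stabilizing effect the abstract attributes to the generator.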

Citation (APA)

Wang, C., Zhou, T., Chen, C., Hu, T., & Chen, G. (2020). Off-Policy Recommendation System Without Exploration. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12084 LNAI, pp. 16–27). Springer. https://doi.org/10.1007/978-3-030-47426-3_2
