Rethinking supervised learning and reinforcement learning in task-oriented dialogue systems


Abstract

Dialogue policy learning for Task-oriented Dialogue Systems (TDSs) has recently made great progress, mostly by employing Reinforcement Learning (RL) methods. However, these approaches have become very sophisticated, and it is time to re-evaluate them. Are we really making progress by developing dialogue agents based only on RL? We demonstrate how (1) traditional supervised learning together with (2) a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art (SOTA) RL-based methods. First, we introduce a simple dialogue action decoder to predict the appropriate actions. Then, the traditional multi-label classification solution for dialogue policy learning is extended by adding dense layers to improve the dialogue agent's performance. Finally, we employ the Gumbel-Softmax estimator to alternately train the dialogue agent and the dialogue reward model without using RL. Based on our extensive experimentation, we conclude that the proposed methods can achieve more stable and higher performance with less effort, avoiding, for example, the domain knowledge required to design a user simulator and the intractable parameter tuning in reinforcement learning. Our main goal is not to beat RL with supervised learning, but to demonstrate the value of rethinking the role of RL and supervised learning in optimizing TDSs.
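
For readers unfamiliar with the Gumbel-Softmax estimator mentioned in the abstract, the following is a minimal sketch of the generic sampling step only. It is not the authors' implementation: the function name `gumbel_softmax`, its parameters, and the example logits are hypothetical, and the sketch uses plain Python rather than a deep learning framework. The idea is to add Gumbel noise to the action logits and apply a temperature-scaled softmax, which gives a differentiable, approximately one-hot sample over discrete dialogue actions.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed one-hot sample over len(logits) categories.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Example: three candidate actions with made-up logits.
sample = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
```

Lower temperatures push the sample closer to a one-hot vector, while higher temperatures make it smoother. In a real training setup the same computation would be done with differentiable tensors so that gradients can flow from the reward model back to the dialogue agent.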

Cite

Li, Z., Kiseleva, J., & de Rijke, M. (2020). Rethinking supervised learning and reinforcement learning in task-oriented dialogue systems. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 3537–3546). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.findings-emnlp.316
