Learning dialog policies from weak demonstrations


Abstract

Deep reinforcement learning is a promising approach to training a dialog manager, but current methods struggle with the large state and action spaces of multi-domain dialog systems. Building upon Deep Q-learning from Demonstrations (DQfD), an algorithm that scores highly in difficult Atari games, we leverage dialog data to guide the agent to successfully respond to a user's requests. We make progressively fewer assumptions about the data needed, using labeled, reduced-labeled, and even unlabeled data to train expert demonstrators. We introduce Reinforced Fine-tune Learning, an extension to DQfD, enabling us to overcome the domain gap between the datasets and the environment. Experiments in a challenging multi-domain dialog system framework validate our approaches, achieving high success rates even when trained on out-of-domain data.

Citation (APA)

Gordon-Hall, G., Gorinski, P. J., & Cohen, S. B. (2020). Learning dialog policies from weak demonstrations. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 1394–1405). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.129
