Deep reinforcement learning is a promising approach to training a dialog manager, but current methods struggle with the large state and action spaces of multi-domain dialog systems. Building upon Deep Q-learning from Demonstrations (DQfD), an algorithm that scores highly in difficult Atari games, we leverage dialog data to guide the agent to successfully respond to a user's requests. We make progressively fewer assumptions about the data needed, using labeled, reduced-labeled, and even unlabeled data to train expert demonstrators. We introduce Reinforced Fine-tune Learning, an extension to DQfD that enables us to overcome the domain gap between the datasets and the environment. Experiments in a challenging multi-domain dialog system framework validate our approaches, which achieve high success rates even when trained on out-of-domain data.
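As background, DQfD augments the usual temporal-difference objective with a large-margin supervised loss that pushes the Q-value of the demonstrator's action above all other actions by at least a margin. The following is a minimal illustrative sketch of that supervised term only (the function name and the margin value are assumptions for illustration, not from the paper):

```python
def margin_loss(q_values, expert_action, margin=0.8):
    """Large-margin supervised loss from DQfD:
    J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E),
    where l(a_E, a) is `margin` when a != a_E and 0 otherwise.
    Returns 0 when the expert action already beats every other
    action by at least the margin.
    """
    augmented = [q + (0.0 if a == expert_action else margin)
                 for a, q in enumerate(q_values)]
    return max(augmented) - q_values[expert_action]
```

In full DQfD this term is combined with the n-step TD loss and an L2 regularizer; demonstration transitions are kept permanently in the replay buffer so the supervised signal persists during interaction with the environment.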
CITATION STYLE
Gordon-Hall, G., Gorinski, P. J., & Cohen, S. B. (2020). Learning dialog policies from weak demonstrations. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 1394–1405). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.129