Budgeted policy learning for task-oriented dialogue systems

Abstract

This paper presents a new approach that extends Deep Dyna-Q (DDQ) by incorporating Budget-Conscious Scheduling (BCS) to make the best use of a small, fixed number of user interactions (the budget) when learning task-oriented dialogue agents. BCS consists of (1) a Poisson-based global scheduler that allocates the budget over different stages of training; (2) a controller that decides at each training step whether the agent is trained on real or simulated experiences; and (3) a user goal sampling module that generates the experiences most effective for policy learning. Experiments on a movie-ticket booking task with simulated and real users show that our approach leads to significant improvements in success rate over state-of-the-art baselines under the same fixed budget.
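The abstract does not spell out how the Poisson-based global scheduler divides the budget, so the following is only a minimal sketch of one plausible reading: the total budget is split across training stages in proportion to a Poisson probability mass function. The function names `poisson_pmf` and `allocate_budget`, the rate parameter `lam`, and the proportional-split rule are all assumptions made for illustration and are not taken from the paper.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing k events under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def allocate_budget(total_budget: int, num_stages: int, lam: float) -> list[int]:
    """Split an integer budget over stages in proportion to Poisson weights.

    Hypothetical rule (an assumption, not the paper's exact scheduler):
    stage k receives a share proportional to poisson_pmf(k, lam).
    """
    weights = [poisson_pmf(k, lam) for k in range(num_stages)]
    total_weight = sum(weights)
    raw = [total_budget * w / total_weight for w in weights]

    # Round down first, then hand out the remaining interactions to the
    # stages with the largest fractional parts so the total is preserved.
    alloc = [math.floor(r) for r in raw]
    leftover = total_budget - sum(alloc)
    by_fraction = sorted(
        range(num_stages), key=lambda i: raw[i] - alloc[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Example: 100 real user interactions spread over 5 training stages.
    print(allocate_budget(total_budget=100, num_stages=5, lam=2.0))
```

Under this reading, `lam` controls where in training most of the real-user budget is spent: a small value front-loads real interactions, while a larger value shifts them toward later stages. The controller and the user goal sampling module are not sketched here because the abstract gives no detail on their decision rules.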

Cite

Zhang, Z., Li, X., Gao, J., & Chen, E. (2020). Budgeted policy learning for task-oriented dialogue systems. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (pp. 3742–3751). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p19-1364
