Abstract
To improve the sample efficiency of deep reinforcement learning (DRL), we applied the imagination-augmented agent (I2A) to spoken dialogue systems (SDS). Although I2A achieves a higher success rate than the baselines by augmenting the policy network with predicted futures, its complicated architecture introduces unwanted instability. In this work, we propose the actor-double-critic (ADC) to improve the stability and overall performance of I2A. ADC simplifies the architecture of I2A to reduce excessive parameters and hyper-parameters. More importantly, a separate model-based critic shares parameters across actions and makes back-propagation explicit. In our experiments on the Cambridge Restaurant Booking task, ADC enhances success rates considerably and shows robustness to imperfect environment models. Moreover, ADC exhibits stability and sample efficiency, significantly reducing the baseline's standard deviation of success rates and reaching an 80% success rate with half the training data.
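The abstract describes the architecture only at a high level. Below is a minimal PyTorch sketch of what an actor paired with two critics, one model-free and one model-based, could look like. It is an illustrative assumption, not the authors' released implementation: every name and size here (EnvModel, BELIEF_DIM, N_ACTIONS, the one-step imagined rollout) is hypothetical.

```python
# Hypothetical sketch of an actor-double-critic, NOT the paper's code.
# All module names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

BELIEF_DIM, HIDDEN, N_ACTIONS = 268, 128, 16  # assumed dialogue-state sizes

class EnvModel(nn.Module):
    """Learned one-step environment model: predicts the next belief state
    and reward for a given state-action pair (assumed form)."""
    def __init__(self):
        super().__init__()
        self.next_state = nn.Linear(BELIEF_DIM + N_ACTIONS, BELIEF_DIM)
        self.reward = nn.Linear(BELIEF_DIM + N_ACTIONS, 1)

    def forward(self, state, action_onehot):
        x = torch.cat([state, action_onehot], dim=-1)
        return self.next_state(x), self.reward(x)

class ActorDoubleCritic(nn.Module):
    def __init__(self, env_model):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(BELIEF_DIM, HIDDEN), nn.ReLU(),
                                   nn.Linear(HIDDEN, N_ACTIONS))
        self.mf_critic = nn.Sequential(nn.Linear(BELIEF_DIM, HIDDEN), nn.ReLU(),
                                       nn.Linear(HIDDEN, 1))  # model-free V(s)
        # Model-based critic: one network scores the imagined outcome of
        # every action, so its parameters are shared across actions.
        self.mb_critic = nn.Sequential(nn.Linear(BELIEF_DIM + 1, HIDDEN), nn.ReLU(),
                                       nn.Linear(HIDDEN, 1))
        self.env_model = env_model

    def forward(self, state):
        logits = self.actor(state)                 # policy over dialogue actions
        v_mf = self.mf_critic(state)               # model-free value estimate
        # Imagine each action's one-step outcome with the environment model,
        # then score it with the shared model-based critic; gradients flow
        # back through this path explicitly.
        actions = torch.eye(N_ACTIONS).expand(state.size(0), -1, -1)
        s = state.unsqueeze(1).expand(-1, N_ACTIONS, -1)
        next_s, r = self.env_model(s, actions)
        q_mb = self.mb_critic(torch.cat([next_s, r], dim=-1)).squeeze(-1)
        return logits, v_mf, q_mb

if __name__ == "__main__":
    model = ActorDoubleCritic(EnvModel())
    logits, v_mf, q_mb = model(torch.randn(4, BELIEF_DIM))
    print(logits.shape, v_mf.shape, q_mb.shape)  # (4, 16) (4, 1) (4, 16)
```

Under this reading, the same mb_critic weights evaluate every action's imagined outcome, so each action's value gradient flows through one shared network, which is plausibly what the abstract means by sharing parameters across actions and making back-propagation explicit.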
Cite
Wu, Y. C., Tseng, B. H., & Gašić, M. (2020). Actor-double-critic: Incorporating model-based critic for task-oriented dialogue systems. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 854–863). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.75