Corrective Guidance and Learning for Dialogue Management

Abstract

Establishing a robust dialogue policy at low computational cost is challenging, especially for multi-domain task-oriented dialogue management, because the state and action spaces are highly complex. Most previous works rely on deterministic policy optimization and attain only moderate performance. Meanwhile, the state-of-the-art result, which uses an end-to-end approach, is computationally demanding because it relies on a large-scale language model based on the Generative Pre-trained Transformer 2 (GPT-2). In this study, a new learning procedure consisting of three stages is presented to improve multi-domain dialogue management with corrective guidance. First, behavior cloning with an auxiliary task is developed to build a robust pre-trained model by mitigating the causal confusion problem in imitation learning. Next, the pre-trained model is refined by reinforcement learning via proximal policy optimization (PPO). Lastly, a human-in-the-loop learning strategy enhances the agent's performance by directly providing corrective feedback from a rule-based agent, which prevents the agent from becoming trapped in confounded states. End-to-end evaluation experiments show that the proposed learning method achieves a state-of-the-art result, performing nearly identically to the rule-based agent. The method outperforms the second-place system in Track 2 of the 9th Dialog System Technology Challenge (DSTC9), which uses GPT-2 as the core model for dialogue management.
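The three stages can be illustrated, loosely, by a per-sample term for each one. The Python sketch below is a hypothetical illustration, not the authors' implementation. The auxiliary-task loss is passed in as a precomputed number because the abstract does not say what the auxiliary task is. `aux_weight` and `eps` are assumed defaults. The corrective step assumes a DAgger-style rule in which the rule-based agent's action overrides the learner's whenever the two disagree. Only the PPO clipped surrogate follows the standard published formulation.

```python
import math
from typing import Callable, Hashable, List, Sequence, Tuple


def bc_loss(action_probs: Sequence[float], expert_action: int,
            aux_loss: float, aux_weight: float = 0.5) -> float:
    """Stage 1 (sketch): behavior-cloning cross-entropy on the expert's action
    plus a weighted auxiliary-task loss. The auxiliary task and aux_weight
    are hypothetical placeholders."""
    return -math.log(action_probs[expert_action]) + aux_weight * aux_loss


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Stage 2: standard PPO clipped surrogate for a single sample,
    min(r * A, clip(r, 1 - eps, 1 + eps) * A), which is maximized.
    ratio = pi_new(a | s) / pi_old(a | s)."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


def corrective_step(state: Hashable,
                    agent_policy: Callable[[Hashable], int],
                    rule_policy: Callable[[Hashable], int],
                    buffer: List[Tuple[Hashable, int]]) -> int:
    """Stage 3 (hypothetical DAgger-style rule): when the learner disagrees
    with the rule-based agent, execute the rule-based action and store
    (state, corrected action) for later supervised updates."""
    agent_action = agent_policy(state)
    rule_action = rule_policy(state)
    if agent_action != rule_action:
        buffer.append((state, rule_action))
        return rule_action
    return agent_action
```

For example, `ppo_clip_objective(1.5, 2.0)` clips the ratio to 1.2 and returns 2.4, so a large policy update gets no extra credit beyond the clip range. `corrective_step` returns the rule-based action whenever the two agents disagree and records that correction in `buffer`.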

Rohmatillah, M., & Chien, J. T. (2021). Corrective Guidance and Learning for Dialogue Management. In International Conference on Information and Knowledge Management, Proceedings (pp. 1548–1557). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482333
