Hierarchical Reinforcement Learning with Guidance for Multi-Domain Dialogue Policy

Mahdin Rohmatillah; Jen Tzung Chien

Journal Article, Open Access

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023), vol. 31, pp. 748–761

DOI: 10.1109/TASLP.2023.3235202

Abstract

Achieving high performance in a multi-domain dialogue system at low computational cost is challenging. Previous works applying an end-to-end approach have been very successful. However, computational cost remains a major issue because a large language model based on GPT-2 is required. Meanwhile, optimizing the individual components of the dialogue system has not shown promising results, especially for dialogue management, owing to the complexity of multi-domain state and action representations. To cope with these issues, this article presents an efficient guidance learning method in which imitation learning and hierarchical reinforcement learning (HRL) with human-in-the-loop are performed to achieve high performance with an inexpensive dialogue agent. Behavior cloning with auxiliary tasks is exploited to identify the important features in the latent representation. In particular, the proposed HRL handles each goal of a dialogue with a corresponding sub-policy, providing efficient dialogue policy learning by utilizing human guidance through action pruning and action evaluation, as well as the reward obtained from interaction with the simulated user in the environment. Experimental results on the ConvLab-2 framework show that the proposed method achieves state-of-the-art performance in dialogue policy optimization and outperforms GPT-2-based solutions in end-to-end system evaluation.
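
The abstract describes two ideas that a small example can make concrete: routing each dialogue goal to its own sub-policy, and pruning actions before one is sampled. The following is a minimal sketch under stated assumptions, not the authors' implementation. The class names `SubPolicy` and `HierarchicalPolicy`, the helper `masked_softmax`, the goal labels, the action names, and the fixed logits are all hypothetical; in the paper the sub-policies are learned and the pruning comes from human guidance, neither of which is modeled here.

```python
import math
import random
from typing import Dict, List, Sequence, Set


def masked_softmax(logits: Sequence[float], allowed: Sequence[bool]) -> List[float]:
    """Softmax over logits in which pruned (disallowed) actions get probability 0."""
    if not any(allowed):
        raise ValueError("at least one action must remain after pruning")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(l for l, ok in zip(logits, allowed) if ok)
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


class SubPolicy:
    """Toy sub-policy for one dialogue goal: fixed logits over its own actions.

    Assumption: a real sub-policy would compute logits from the dialogue state.
    """

    def __init__(self, actions: Sequence[str], logits: Sequence[float]):
        self.actions = list(actions)
        self.logits = list(logits)

    def act(self, allowed: Sequence[bool], rng: random.Random) -> str:
        probs = masked_softmax(self.logits, allowed)
        return rng.choices(self.actions, weights=probs, k=1)[0]


class HierarchicalPolicy:
    """Top level: hand the active goal to the sub-policy responsible for it."""

    def __init__(self, sub_policies: Dict[str, SubPolicy]):
        self.sub_policies = sub_policies

    def act(self, goal: str, pruned: Set[str], rng: random.Random) -> str:
        sub = self.sub_policies[goal]
        # Action pruning: actions named in `pruned` can never be sampled.
        allowed = [a not in pruned for a in sub.actions]
        return sub.act(allowed, rng)


# Hypothetical goals and actions, for illustration only.
policy = HierarchicalPolicy({
    "hotel": SubPolicy(["request_area", "inform_price", "book"], [0.2, 0.1, 1.5]),
    "train": SubPolicy(["request_day", "inform_time"], [0.3, 0.3]),
})
action = policy.act("hotel", pruned={"book"}, rng=random.Random(0))
```

Masking before normalization, rather than rejecting sampled actions afterwards, keeps the remaining probabilities summing to one, so the same masked distribution could also be used inside a policy-gradient loss.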

Cite

Rohmatillah, M., & Chien, J. T. (2023). Hierarchical Reinforcement Learning with Guidance for Multi-Domain Dialogue Policy. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31, 748–761. https://doi.org/10.1109/TASLP.2023.3235202
