Policy Networks with Two-Stage Training for Dialogue Systems

46 Citations · 172 Mendeley Readers

Abstract

In this paper, we propose to use deep policy networks which are trained with an advantage actor-critic method for statistically optimised dialogue systems. First, we show that, on summary state and action spaces, deep Reinforcement Learning (RL) outperforms Gaussian Processes methods. Summary state and action spaces lead to good performance but require pre-engineering effort, RL knowledge, and domain expertise. In order to remove the need to define such summary spaces, we show that deep RL can also be trained efficiently on the original state and action spaces. Dialogue systems based on partially observable Markov decision processes are known to require many dialogues to train, which makes them unappealing for practical deployment. We show that a deep RL method based on an actor-critic architecture can exploit a small amount of data very efficiently. Indeed, with only a few hundred dialogues collected with a handcrafted policy, the actor-critic deep learner is considerably bootstrapped from a combination of supervised and batch RL. In addition, convergence to an optimal policy is significantly sped up compared to other deep RL methods initialized on the data with batch RL. All experiments are performed on a restaurant domain derived from the Dialogue State Tracking Challenge 2 (DSTC2) dataset.
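To make the two-stage idea concrete, below is a minimal pure-Python sketch of the training scheme the abstract describes: a stage-1 supervised (behaviour-cloning) step on dialogues collected with a handcrafted policy, followed by stage-2 advantage actor-critic updates using the TD error as the advantage estimate. The `TinyActorCritic` class, its linear actor/critic, and all hyperparameters are illustrative assumptions for exposition only; the paper itself uses deep policy networks, not this toy linear model.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

class TinyActorCritic:
    """Illustrative linear actor (policy) and critic (state value).

    The paper's system uses deep networks over dialogue states; this
    toy class only demonstrates the shape of the two training stages.
    """

    def __init__(self, n_features, n_actions, lr=0.1, gamma=0.99):
        self.actor = [[0.0] * n_features for _ in range(n_actions)]
        self.critic = [0.0] * n_features
        self.lr, self.gamma = lr, gamma

    def value(self, s):
        return sum(w * x for w, x in zip(self.critic, s))

    def policy(self, s):
        logits = [sum(w * x for w, x in zip(row, s)) for row in self.actor]
        return softmax(logits)

    def supervised_step(self, s, a):
        # Stage 1: behaviour-clone the handcrafted policy's action `a`
        # (gradient of the cross-entropy loss for a linear softmax model).
        probs = self.policy(s)
        for j, row in enumerate(self.actor):
            coeff = (1.0 if j == a else 0.0) - probs[j]
            for i, x in enumerate(s):
                row[i] += self.lr * coeff * x

    def actor_critic_step(self, s, a, r, s_next, done):
        # Stage 2: advantage estimated by the TD error
        #   A(s, a) ≈ r + gamma * V(s') - V(s).
        target = r + (0.0 if done else self.gamma * self.value(s_next))
        adv = target - self.value(s)
        # Critic update: move V(s) toward the TD target.
        for i, x in enumerate(s):
            self.critic[i] += self.lr * adv * x
        # Actor update: policy-gradient step, grad log pi(a|s) * advantage.
        probs = self.policy(s)
        for j, row in enumerate(self.actor):
            coeff = ((1.0 if j == a else 0.0) - probs[j]) * adv
            for i, x in enumerate(s):
                row[i] += self.lr * coeff * x
        return adv
```

In use, stage 1 would iterate `supervised_step` over the few hundred handcrafted-policy dialogues to bootstrap the actor, after which stage 2 continues with `actor_critic_step` on fresh (or batched) transitions; the critic trained alongside reduces the variance of the policy-gradient updates.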




Citation (APA)

Fatemi, M., El Asri, L., Schulz, H., He, J., & Suleman, K. (2016). Policy Networks with Two-Stage Training for Dialogue Systems. In SIGDIAL 2016 - 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Proceedings of the Conference (pp. 101–110). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-3613

Readers' Seniority

PhD / Postgrad / Masters / Doc: 69 (66%)
Researcher: 26 (25%)
Professor / Associate Prof.: 6 (6%)
Lecturer / Postdoc: 3 (3%)

Readers' Discipline

Computer Science: 99 (84%)
Engineering: 9 (8%)
Linguistics: 6 (5%)
Mathematics: 4 (3%)

Article Metrics

News Mentions: 1
