Customized nonlinear bandits for online response selection in neural conversation models

Abstract

Dialog response selection is an important step towards natural response generation in conversational agents. Existing work on neural conversational models mainly focuses on offline supervised learning using a large set of context-response pairs. In this paper, we focus on online learning of response selection in retrieval-based dialog systems. We propose a contextual multi-armed bandit model with a nonlinear reward function that uses distributed representations of text for online response selection. A bidirectional LSTM produces the distributed representations of the dialog context and the candidate responses, which serve as the input to a contextual bandit. To learn the bandit, we propose a customized Thompson sampling method that approximates the reward in a polynomial feature space. Experimental results on the Ubuntu Dialogue Corpus demonstrate significant performance gains of the proposed method over conventional linear contextual bandits. Moreover, we report encouraging response selection performance of the proposed neural bandit model, measured by the Recall@k metric, for a small set of online training samples.
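The abstract's core idea, Thompson sampling over a polynomial feature space, can be illustrated with a minimal sketch. This is not the authors' customized method, whose details the abstract does not give: it is standard Gaussian linear Thompson sampling applied to degree-2 polynomial features. The names `poly_features` and `ThompsonSampler`, the prior and noise variances, and the random vectors standing in for BiLSTM encodings are all assumptions made for illustration.

```python
import numpy as np


def poly_features(context, response):
    """Degree-2 polynomial features of the concatenated (context, response) vector.

    Hypothetical helper: the paper's actual feature map is not specified here.
    """
    x = np.concatenate([context, response])
    iu = np.triu_indices(x.size)      # index pairs (i, j) with i <= j
    quad = np.outer(x, x)[iu]         # squares and pairwise products
    return np.concatenate([[1.0], x, quad])


class ThompsonSampler:
    """Gaussian linear Thompson sampling on a fixed feature map (assumed setup)."""

    def __init__(self, dim, noise_var=0.25, prior_var=1.0, seed=0):
        self.precision = np.eye(dim) / prior_var  # posterior precision matrix
        self.b = np.zeros(dim)                    # precision-weighted mean
        self.noise_var = noise_var
        self.rng = np.random.default_rng(seed)

    def select(self, features):
        """Sample weights from the posterior and pick the best-scoring candidate."""
        cov = np.linalg.inv(self.precision)
        cov = (cov + cov.T) / 2.0                 # guard against numerical asymmetry
        mean = cov @ self.b
        w = self.rng.multivariate_normal(mean, cov)
        return int(np.argmax(features @ w))

    def update(self, phi, reward):
        """Bayesian linear regression update for one observed (features, reward) pair."""
        self.precision += np.outer(phi, phi) / self.noise_var
        self.b += phi * reward / self.noise_var


# Toy online loop: random vectors stand in for BiLSTM encodings (an assumption).
rng = np.random.default_rng(1)
context_dim, response_dim, num_candidates = 3, 3, 5
dim = poly_features(np.zeros(context_dim), np.zeros(response_dim)).size
bandit = ThompsonSampler(dim)

for _ in range(20):
    context = rng.normal(size=context_dim)
    candidates = rng.normal(size=(num_candidates, response_dim))
    feats = np.stack([poly_features(context, r) for r in candidates])
    choice = bandit.select(feats)
    # Simulated feedback: candidate 0 plays the role of the correct response.
    reward = 1.0 if choice == 0 else 0.0
    bandit.update(feats[choice], reward)
```

Sampling the weights from the posterior, rather than using the posterior mean, is what drives exploration here: candidates with uncertain reward estimates are occasionally chosen and then refined through `update`.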

Citation (APA)

Liu, B., Yu, T., Lane, I., & Mengshoel, O. J. (2018). Customized nonlinear bandits for online response selection in neural conversation models. In 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 (pp. 5245–5252). AAAI Press. https://doi.org/10.1609/aaai.v32i1.12028
