LSTM-based turn-taking estimation model using lexical/prosodic contents and dialog history

Abstract

Natural conversation involves rapid exchanges of turns. To act as a conversation partner, a dialog system must take turns at appropriate times and intervals. We propose a Recurrent Neural Network (RNN) based model that takes the current utterance and the dialog history as input, classifies utterances into turn-taking related classes, and estimates the turn-taking timing. The dialog history is represented as a sequence of speaker-specified joint embeddings of lexical and prosodic content. To obtain these, we trained a neural network to embed the lexical and prosodic content into a joint embedding space: the prosodic feature sequence of each utterance is mapped into a fixed-dimensional space by an RNN and combined with the utterance's lexical embedding. The joint embeddings are then shifted to different regions of the embedding space according to the speaker. Finally, the speaker-specified joint embeddings are fed to the proposed model. We tested the model on a spontaneous conversation dataset and confirmed that it outperformed conventional models that use lexical/prosodic features and dialog history without speaker information.
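
The embedding pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: a plain tanh recurrent cell stands in for the LSTM, the weights are random and untrained, the "combination" is a simple concatenation, the speaker shift is an additive offset per speaker, and every dimension and name (`PROSODY_DIM`, `SPEAKER_OFFSET`, `joint_embedding`, and so on) is hypothetical. The downstream turn-taking classifier is omitted.

```python
import math
import random

random.seed(0)

# All sizes below are hypothetical and chosen only for illustration.
PROSODY_DIM = 3   # per-frame prosodic features (e.g. pitch, energy, duration)
HIDDEN_DIM = 4    # size of the fixed-dimensional prosodic summary
LEXICAL_DIM = 5   # size of the utterance lexical embedding
JOINT_DIM = HIDDEN_DIM + LEXICAL_DIM


def rand_matrix(rows, cols):
    """Random, untrained weights; a real model would learn these."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


W_IN = rand_matrix(HIDDEN_DIM, PROSODY_DIM)
W_REC = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)

# Assumption: the speaker-dependent shift is one additive offset per speaker.
SPEAKER_OFFSET = {"A": [1.0] * JOINT_DIM, "B": [-1.0] * JOINT_DIM}


def encode_prosody(frames):
    """Map a variable-length prosodic sequence to a fixed-size vector.

    Uses a plain tanh recurrent cell (a stand-in for an LSTM) and
    returns the final hidden state.
    """
    hidden = [0.0] * HIDDEN_DIM
    for frame in frames:
        from_input = matvec(W_IN, frame)
        from_hidden = matvec(W_REC, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_hidden)]
    return hidden


def joint_embedding(frames, lexical, speaker):
    """Concatenate prosodic and lexical vectors, then shift by speaker."""
    joint = encode_prosody(frames) + list(lexical)
    return [x + o for x, o in zip(joint, SPEAKER_OFFSET[speaker])]


# A two-utterance dialog history with utterances of different lengths.
short_utt = [[0.1, 0.2, 0.3]] * 2
long_utt = [[0.3, 0.1, 0.0]] * 7
lex = [0.0] * LEXICAL_DIM
history = [
    joint_embedding(short_utt, lex, "A"),
    joint_embedding(long_utt, lex, "B"),
]
```

Each entry in `history` has the same length regardless of how many prosodic frames the utterance contained, which is what lets a sequence model consume the dialog history one utterance at a time.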

Cite

Liu, C., Ishi, C., & Ishiguro, H. (2019). LSTM-based turn-taking estimation model using lexical/prosodic contents and dialog history. Transactions of the Japanese Society for Artificial Intelligence, 34(2). https://doi.org/10.1527/tjsai.C-I65
