Towards a general, continuous model of turn-taking in spoken dialogue using LSTM recurrent neural networks


Abstract

Previous models of turn-taking have mostly been trained for specific turn-taking decisions, such as discriminating between turn shifts and turn retention in pauses. In this paper, we present a predictive, continuous model of turn-taking using Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs). The model is trained on human-human dialogue data to predict upcoming speech activity in a future time window. We show how this general model can be applied to two tasks it was not specifically trained for. First, predicting whether a turn shift will occur in a pause, where the model achieves better performance than human observers and than more traditional models. Second, predicting at speech onset whether the utterance will be a short backchannel or a longer utterance. Finally, we show how the hidden layer of the network can be used as a feature vector for turn-taking decisions in a human-robot interaction scenario.
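To make the training setup concrete, the sketch below shows how a per-frame speech-activity track might be framed into (past context, future window) training pairs, matching the abstract's description of predicting upcoming speech activity in a future time window. This is an illustrative assumption, not the paper's implementation; the function name and window sizes are hypothetical, and the actual model consumes richer acoustic and linguistic features.

```python
# Minimal sketch (not the paper's code): frame a binary per-frame
# voice-activity sequence into (context, future-window) pairs, the
# kind of supervision an LSTM could use to learn to predict upcoming
# speech activity. Window lengths here are illustrative only.

def make_training_pairs(activity, context_len=50, future_len=30):
    """Slide over a 0/1 speech-activity sequence and collect
    (context, future) pairs: the model observes `context_len` past
    frames and is trained to predict the next `future_len` frames."""
    pairs = []
    for t in range(context_len, len(activity) - future_len + 1):
        context = activity[t - context_len:t]   # past observations
        future = activity[t:t + future_len]     # prediction target
        pairs.append((context, future))
    return pairs

# Toy example: 1 = the speaker is talking in that frame.
track = [1] * 40 + [0] * 30 + [1] * 20
pairs = make_training_pairs(track, context_len=20, future_len=10)
```

Each `future` window is the regression/classification target for one training step; at inference time, aggregating the predicted window (e.g., averaging predicted activity) yields the continuous turn-taking score the abstract describes.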

Citation (APA)

Skantze, G. (2017). Towards a general, continuous model of turn-taking in spoken dialogue using LSTM recurrent neural networks. In SIGDIAL 2017 - 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Proceedings of the Conference (pp. 220–230). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-5527
