Session-level language modeling for conversational speech

25 citations · 105 Mendeley readers

Abstract

We propose to generalize language models for conversational speech recognition so that they operate across utterance boundaries and speaker changes, thereby capturing conversation-level phenomena such as adjacency pairs, lexical entrainment, and topical coherence. The model consists of a long short-term memory (LSTM) recurrent network that reads the entire word-level history of a conversation, along with information about turn taking and speaker overlap, in order to predict each next word. The model is applied in a rescoring framework, where the word history prior to the current utterance is approximated by preliminary recognition results. In experiments on conversational telephone speech (Switchboard), we find that such a model yields substantial perplexity reductions over a standard LSTM-LM with utterance scope, as well as improvements in word error rate.
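As a rough illustration of the idea, the sketch below (in PyTorch, with hypothetical names and layer sizes not taken from the paper) shows a word-level LSTM whose input stream can interleave words with special turn-taking tokens, and whose recurrent state is carried across utterance boundaries so that each prediction conditions on the whole session rather than a single utterance.

```python
import torch
import torch.nn as nn

class SessionLSTMLM(nn.Module):
    """Sketch of a session-level LSTM LM: the token stream interleaves
    words with special tokens marking speaker changes and overlaps, and
    the recurrent state persists across utterance boundaries."""

    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) ids over words + turn-taking tokens
        emb = self.embed(tokens)
        hidden, state = self.lstm(emb, state)
        return self.out(hidden), state  # next-word logits and carried state


# Usage: feed one utterance at a time, reusing `state` so the word
# history of the entire conversation informs each prediction.
model = SessionLSTMLM(vocab_size=30000)
session = [torch.randint(0, 30000, (1, 12)) for _ in range(3)]  # dummy utterances
state = None
for utterance in session:
    logits, state = model(utterance, state)
```

In the rescoring setting the abstract describes, the carried-over state for the current utterance would be computed from preliminary recognition output for the preceding utterances, and the resulting logits used to rescore the recognizer's hypotheses.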

Citation (APA)

Xiong, W., Wu, L., Zhang, J., & Stolcke, A. (2018). Session-level language modeling for conversational speech. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 (pp. 2764–2768). Association for Computational Linguistics. https://doi.org/10.18653/v1/d18-1296
