Gated embeddings in end-to-end speech recognition for conversational-context fusion

13 citations · 134 Mendeley readers

Abstract

We present a novel conversational-context-aware end-to-end speech recognizer based on a gated neural network that incorporates conversational-context, word, and speech embeddings. Unlike conventional speech recognition models, our model learns longer conversational-context information that spans across sentences and is consequently better at recognizing long conversations. Specifically, we propose to use text-based external word and/or sentence embeddings (i.e., fastText, BERT) within an end-to-end framework, yielding significant improvement in word error rate with better conversational-context representation. We evaluate the models on the Switchboard conversational speech corpus and show that our model outperforms standard end-to-end speech recognition models.
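To make the gated fusion described in the abstract concrete, the sketch below shows one plausible way to combine a decoder state with an external conversational-context embedding (e.g., a BERT or fastText sentence vector) through a learned sigmoid gate. This is a minimal illustration only; the module name, dimensions, and the exact gating form are assumptions and not the authors' released implementation.

import torch
import torch.nn as nn


class GatedContextFusion(nn.Module):
    """Sigmoid-gated fusion of a decoder state with a conversational-context
    embedding. Illustrative sketch; layer names and the precise gating
    equation are assumptions, not the paper's exact formulation."""

    def __init__(self, decoder_dim: int, context_dim: int):
        super().__init__()
        # Project the external text embedding into the decoder's space.
        self.context_proj = nn.Linear(context_dim, decoder_dim)
        # The gate is computed from both representations jointly.
        self.gate = nn.Linear(decoder_dim * 2, decoder_dim)

    def forward(self, decoder_state: torch.Tensor,
                context_emb: torch.Tensor) -> torch.Tensor:
        # decoder_state: (batch, decoder_dim); context_emb: (batch, context_dim)
        ctx = torch.tanh(self.context_proj(context_emb))
        g = torch.sigmoid(self.gate(torch.cat([decoder_state, ctx], dim=-1)))
        # Element-wise gate controls how much context flows into the state.
        return g * decoder_state + (1.0 - g) * ctx


# Example: fuse a 512-d decoder state with a 768-d BERT sentence embedding.
fusion = GatedContextFusion(decoder_dim=512, context_dim=768)
h = torch.randn(4, 512)   # decoder hidden states for a batch of 4 utterances
c = torch.randn(4, 768)   # conversational-context embeddings from prior turns
fused = fusion(h, c)      # (4, 512), passed on to the output layer

The design intent, as the abstract describes it, is that the gate lets the recognizer modulate how strongly cross-sentence conversational context influences each decoding step rather than concatenating it unconditionally.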

Citation (APA)

Kim, S., Dalmia, S., & Metze, F. (2019). Gated embeddings in end-to-end speech recognition for conversational-context fusion. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019) (pp. 1131–1141). Association for Computational Linguistics. https://doi.org/10.18653/v1/p19-1107
