Online LDA-based language model adaptation

Abstract

In this paper, we present improvements to online topic-based language model adaptation. Our aim is to improve automatic speech recognition of multi-topic speech that must be recognized in real time (online). Latent Dirichlet Allocation (LDA) is an unsupervised topic model designed to uncover hidden semantic relationships between words and documents in a text corpus and thus reveal latent topics automatically. We use LDA to cluster the text corpus and to predict topics online from partial hypotheses during real-time speech recognition. Based on detected topic changes in the speech, we adapt the language model on the fly. We demonstrate the improvement on the task of online subtitling of TV news, where we achieved an 18% relative reduction in perplexity and a 3.52% relative reduction in word error rate (WER) over a non-adapted system.
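The pipeline described in the abstract (predict a topic from partial hypotheses, detect a topic change, then adapt the language model) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The topic-word table `TOPIC_WORD`, the floor probability, the window size, the confidence threshold, and the interpolation weight `lam` are all hypothetical values chosen for illustration, and the single-topic-per-window scoring is a simplified stand-in for real LDA inference.

```python
import math

# Hypothetical topic-word distributions, standing in for what a trained LDA
# model would provide (toy numbers, not taken from the paper).
TOPIC_WORD = {
    "sports": {"goal": 0.4, "match": 0.3, "team": 0.2, "vote": 0.05, "party": 0.05},
    "politics": {"goal": 0.05, "match": 0.05, "team": 0.1, "vote": 0.4, "party": 0.4},
}
FLOOR = 1e-4  # assumed probability for words missing from a distribution


def topic_posterior(words):
    """Posterior over topics for a window of words, assuming a uniform prior
    and a single topic per window (a simplification of LDA inference)."""
    log_scores = {
        topic: sum(math.log(dist.get(w, FLOOR)) for w in words)
        for topic, dist in TOPIC_WORD.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    unnorm = {t: math.exp(s - top) for t, s in log_scores.items()}
    total = sum(unnorm.values())
    return {t: v / total for t, v in unnorm.items()}


def detect_topics(partial_hypotheses, window=4, threshold=0.8):
    """Yield (hypothesis index, topic) whenever the dominant topic changes."""
    current = None
    history = []
    for i, hyp in enumerate(partial_hypotheses):
        history.extend(hyp.split())
        posterior = topic_posterior(history[-window:])
        best = max(posterior, key=posterior.get)
        # Switch only when the new topic is confidently detected.
        if posterior[best] >= threshold and best != current:
            current = best
            yield i, best


def adapted_prob(word, background, posterior, lam=0.5):
    """Linearly interpolate a background unigram LM with the topic mixture."""
    topic_p = sum(p * TOPIC_WORD[t].get(word, FLOOR) for t, p in posterior.items())
    return lam * background.get(word, FLOOR) + (1 - lam) * topic_p
```

In a real recognizer, `detect_topics` would consume partial hypotheses as they arrive from the decoder, and each detected change would trigger a switch to (or re-interpolation with) the matching topic-specific language model. Linear interpolation is used here only because it is a common, simple adaptation scheme; the abstract does not state which scheme the authors use.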

Lehečka, J., & Pražák, A. (2018). Online LDA-based language model adaptation. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 334–341). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_36
