Online LDA-based language model adaptation

Jan Lehečka; Aleš Pražák

Conference Proceedings

Lecture Notes in Computer Science (2018) 11107 LNAI 334-341

DOI: 10.1007/978-3-030-00794-2_36

Abstract

In this paper, we present our improvements in online topic-based language model adaptation. Our aim is to enhance automatic speech recognition of multi-topic speech that must be recognized in real time (online). Latent Dirichlet Allocation (LDA) is an unsupervised topic model designed to uncover hidden semantic relationships between words and documents in a text corpus and thus reveal latent topics automatically. We use LDA to cluster the text corpus and to predict topics online from partial hypotheses during real-time speech recognition. Based on detected topic changes in the speech, we adapt the language model on the fly. We demonstrate the improvement of our system on the task of online subtitling of TV news, where we achieved an 18% relative reduction in perplexity and a 3.52% relative reduction in word error rate (WER) over a non-adapted system.
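The idea of predicting topics from partial hypotheses and reacting to topic changes can be illustrated with a small sketch. This is not the authors' implementation: the topic-word table `TOPIC_WORD`, the `FLOOR` value, the window size, and the confidence threshold are all hypothetical, and the posterior is computed under a simplified one-topic-per-window assumption with a uniform prior rather than full LDA inference.

```python
import math
from collections import deque

# Hypothetical toy topic-word probabilities standing in for a trained LDA model.
TOPIC_WORD = {
    "sports": {"match": 0.4, "goal": 0.4, "vote": 0.1, "party": 0.1},
    "politics": {"match": 0.1, "goal": 0.1, "vote": 0.4, "party": 0.4},
}
FLOOR = 1e-6  # assumed probability for words missing from a topic


def topic_posterior(words):
    """Posterior over topics for a word window, assuming a single topic per
    window and a uniform topic prior (a simplification of LDA inference)."""
    log_scores = {
        topic: sum(math.log(probs.get(w, FLOOR)) for w in words)
        for topic, probs in TOPIC_WORD.items()
    }
    peak = max(log_scores.values())  # subtract the peak for numerical stability
    unnorm = {t: math.exp(s - peak) for t, s in log_scores.items()}
    total = sum(unnorm.values())
    return {t: v / total for t, v in unnorm.items()}


def detect_topic_changes(word_stream, window=4, threshold=0.8):
    """Yield (position, topic) whenever the confident dominant topic changes."""
    recent = deque(maxlen=window)  # sliding window over the partial hypothesis
    current = None
    for i, word in enumerate(word_stream):
        recent.append(word)
        if len(recent) < window:
            continue
        posterior = topic_posterior(recent)
        best = max(posterior, key=posterior.get)
        if posterior[best] >= threshold and best != current:
            current = best
            # A real system would switch to a topic-adapted language model here.
            yield i, best


stream = "match goal goal match vote party vote party".split()
changes = list(detect_topic_changes(stream))
print(changes)
```

The threshold keeps the detector from switching while the window straddles two topics, which is one plausible way to avoid adapting the language model on ambiguous evidence.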

Cite

Lehečka, J., & Pražák, A. (2018). Online LDA-based language model adaptation. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 334–341). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_36
