Web augmentation of language models for continuous speech recognition of SMS text messages

Mathias Creutz; Sami Virpioja; Anna Kovaleva

Conference Proceedings (Open Access)

EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (2009) 157-165

DOI: 10.3115/1609067.1609084

14 Citations

85 Readers

Abstract

In this paper, we present an efficient query selection algorithm for the retrieval of web text data to augment a statistical language model (LM). The number of retrieved relevant documents is optimized with respect to the number of queries submitted. The querying scheme is applied in the domain of SMS text messages. Continuous speech recognition experiments are conducted on three languages: English, Spanish, and French. The web data is utilized for augmenting in-domain LMs in general and for adapting the LMs to a user-specific vocabulary. Word error rate reductions of up to 6.6% (in LM augmentation) and 26.0% (in LM adaptation) are obtained in setups where the size of the web mixture LM is limited to the size of the baseline in-domain LM. © 2009 Association for Computational Linguistics.

Cite

Creutz, M., Virpioja, S., & Kovaleva, A. (2009). Web augmentation of language models for continuous speech recognition of SMS text messages. In EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 157–165). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1609067.1609084
