Generalized algorithms for constructing statistical language models

102 citations · 113 Mendeley readers

Abstract

Recent text and speech processing applications such as speech mining raise new and more general problems related to the construction of language models. We present and describe in detail several new and efficient algorithms to address these more general problems and report experimental results demonstrating their usefulness. We give an algorithm for efficiently computing the expected counts of any sequence in a word lattice output by a speech recognizer or in any arbitrary weighted automaton; describe a new technique for creating exact representations of n-gram language models by weighted automata whose size is practical for offline use, even for a vocabulary size of about 500,000 words and an n-gram order n = 6; and present a simple and more general technique for constructing class-based language models that allows each class to represent an arbitrary weighted automaton. An efficient implementation of our algorithms and techniques has been incorporated in a general software library for language modeling, the GRM Library, which includes many other text and grammar processing functionalities.
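To make the first of these algorithms concrete, the sketch below is a minimal from-scratch illustration (not the GRM Library implementation and not the paper's algorithm) of the underlying idea for expected counts, assuming an acyclic, epsilon-free word lattice with weights in the probability semiring and a hypothetical `Lattice` class: the expected count of an n-gram is the sum, over all sub-paths labeled with that n-gram, of forward(start) × weight(sub-path) × backward(end), divided by the total path weight.

```python
from collections import defaultdict, deque

# Hypothetical minimal lattice representation (illustration only):
# states are integers, arcs[q] lists (label, weight, next_state) triples,
# final[q] is the final weight of state q.
class Lattice:
    def __init__(self, num_states, initial, arcs, final):
        self.num_states = num_states
        self.initial = initial
        self.arcs = arcs      # dict: state -> list of (label, weight, next_state)
        self.final = final    # dict: state -> final weight

def topological_order(lat):
    """Topological order of states; word lattices are assumed acyclic."""
    indeg = defaultdict(int)
    for q in range(lat.num_states):
        for _, _, nxt in lat.arcs.get(q, []):
            indeg[nxt] += 1
    queue = deque(q for q in range(lat.num_states) if indeg[q] == 0)
    order = []
    while queue:
        q = queue.popleft()
        order.append(q)
        for _, _, nxt in lat.arcs.get(q, []):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    return order

def expected_count(lat, ngram):
    """Expected number of occurrences of `ngram` (a tuple of labels)
    under the path distribution defined by the lattice weights."""
    order = topological_order(lat)

    # Forward weights: total weight of paths from the initial state to q.
    alpha = defaultdict(float)
    alpha[lat.initial] = 1.0
    for q in order:
        for _, w, nxt in lat.arcs.get(q, []):
            alpha[nxt] += alpha[q] * w

    # Backward weights: total weight of paths from q to a final state.
    beta = defaultdict(float)
    for q, fw in lat.final.items():
        beta[q] = fw
    for q in reversed(order):
        for _, w, nxt in lat.arcs.get(q, []):
            beta[q] += w * beta[nxt]

    total = beta[lat.initial]  # total weight of all accepting paths

    # For every sub-path labeled exactly `ngram`, accumulate
    # alpha(start) * weight(sub-path) * beta(end).
    count = 0.0
    for q in range(lat.num_states):
        frontier = {q: alpha[q]}  # partial weight after matching a prefix of the n-gram
        for label in ngram:
            nxt_frontier = defaultdict(float)
            for state, w in frontier.items():
                for lab, aw, nxt in lat.arcs.get(state, []):
                    if lab == label:
                        nxt_frontier[nxt] += w * aw
            frontier = nxt_frontier
        count += sum(w * beta[state] for state, w in frontier.items())
    return count / total if total > 0 else 0.0
```

This sketch enumerates n-gram matches state by state for clarity and only handles acyclic, epsilon-free lattices; the algorithm described in the paper computes these expected counts efficiently for arbitrary weighted automata.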

Citation (APA)

Allauzen, C., Mohri, M., & Roark, B. (2003). Generalized algorithms for constructing statistical language models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2003-July). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1075096.1075102
