Perplexity of n-gram and dependency language models

Abstract

Language models (LMs) are essential components of many applications such as speech recognition or machine translation. LMs factorize the probability of a string of words into a product of conditional probabilities P(w_i | h_i), where h_i is the context (history) of the word w_i. Most LMs use the previous words as the context. This paper presents two alternative approaches: post-ngram LMs, which use the following words as the context, and dependency LMs, which exploit the dependency structure of a sentence and can use, for example, the governing word as the context. Dependency LMs could be useful whenever the topology of a dependency tree is available but its lexical labels are unknown, e.g. in tree-to-tree machine translation. Compared with a baseline interpolated trigram LM, both approaches achieve significantly lower perplexity on all seven tested languages (Arabic, Catalan, Czech, English, Hungarian, Italian, Turkish). © 2010 Springer-Verlag Berlin Heidelberg.
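To make the factorization concrete, the short Python sketch below shows how perplexity is computed from per-word conditional probabilities and how the context h_i differs between a standard trigram LM, a post-ngram LM, and a dependency LM. The function names and the heads array (one governing-word index per token) are illustrative assumptions for this sketch, not the paper's implementation.

    import math

    def perplexity(probabilities):
        # Perplexity = exp(-(1/N) * sum_i log P(w_i | h_i)); lower is better.
        n = len(probabilities)
        return math.exp(-sum(math.log(p) for p in probabilities) / n)

    def trigram_context(words, i):
        # Standard n-gram LM: the two preceding words form the history h_i.
        return tuple(words[max(0, i - 2):i])

    def post_bigram_context(words, i):
        # Post-ngram LM: the following words serve as the context instead.
        return tuple(words[i + 1:i + 3])

    def dependency_context(words, heads, i):
        # Dependency LM: the governing (head) word serves as the context;
        # heads[i] is the index of the parent of word i, or None for the root.
        return (words[heads[i]],) if heads[i] is not None else ()

    # Hypothetical usage: given some estimated model P(w | h),
    # perplexity([P(w, h) for w, h in zip(words, contexts)]) gives the score.
    words = ["the", "dog", "barks"]
    heads = [1, 2, None]  # assumed tree: "the" -> "dog" -> "barks" (root)
    contexts = [dependency_context(words, heads, i) for i in range(len(words))]

The only difference between the three model families in this view is how the context is defined; the perplexity computation itself is identical.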

Citation (APA)

Popel, M., & Mareček, D. (2010). Perplexity of n-gram and dependency language models. In Lecture Notes in Computer Science (Vol. 6231 LNAI, pp. 173–180). Springer. https://doi.org/10.1007/978-3-642-15760-8_23
