This paper presents novel improvements to the induction of translation lexicons from monolingual corpora using multilingual dependency parses. We introduce a dependency-based context model that incorporates long-range dependencies, variable context sizes, and reordering. It provides a 16% relative improvement over the baseline approach, which uses a fixed context window of adjacent words. Its Top 10 accuracy for noun translation is higher than that of a statistical translation model trained on a Spanish-English parallel corpus of 100,000 sentence pairs. We generalize the evaluation to other word types and show that performance can be increased to an 18% relative improvement by preserving part-of-speech equivalencies during translation. © 2009 Association for Computational Linguistics.
CITATION STYLE
Garera, N., Callison-Burch, C., & Yarowsky, D. (2009). Improving translation lexicon induction from monolingual corpora via dependency contexts and part-of-speech equivalences. In CoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning (pp. 129–137). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1596374.1596397