Treebank translation for cross-lingual parser induction

Jörg Tiedemann; Željko Agić; Joakim Nivre

Conference Proceedings

CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings (2014) 130-140

DOI: 10.3115/v1/w14-1614

Abstract

Cross-lingual learning has become a popular approach to facilitate the development of resources and tools for low-density languages. Its underlying idea is to make use of existing tools and annotations in resource-rich languages to create similar tools and resources for resource-poor languages. Typically, this is achieved either by projecting annotations across parallel corpora or by transferring models from one or more source languages to a target language. In this paper, we explore a third strategy by using machine translation to create synthetic training data from the original source-side annotations. Specifically, we apply this technique to dependency parsing, using a cross-lingually unified treebank for adequate evaluation. Our approach draws on annotation projection but avoids the noisy source-side annotation of an unrelated parallel corpus. Instead, it relies on manual treebank annotation in combination with statistical machine translation, which makes it possible to train fully lexicalized parsers. We show that this approach significantly outperforms delexicalized transfer parsing.
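The abstract describes translating treebank sentences and carrying their manual annotation over to the translations. As a rough illustration of that projection step only, here is a minimal Python sketch that maps a source dependency tree onto a translated sentence through a word alignment. It is not the authors' procedure: the one-to-one alignment, the rule of climbing to the nearest aligned ancestor when a head is unaligned, and the fallback that attaches unaligned target words to the root with the label "dep" are all simplifying assumptions, and `project_tree` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Each token is (head, label); heads are 1-based positions, 0 is the artificial root.
Tree = List[Tuple[int, str]]


def project_tree(src_tree: Tree, alignment: Dict[int, int], tgt_len: int) -> Tree:
    """Project a source dependency tree onto its translation (illustrative sketch).

    Assumptions (hypothetical simplifications, not the paper's method):
    - `alignment` maps 1-based source positions to 1-based target positions
      and is one-to-one;
    - the source tree is well-formed (acyclic), so climbing heads reaches the root;
    - if a source head is unaligned, we climb to its nearest aligned ancestor;
    - unaligned target tokens are attached to the root with label "dep".
    """
    # Default every target token to an attachment to the root.
    tgt_tree: Tree = [(0, "dep")] * tgt_len
    for src_pos, tgt_pos in alignment.items():
        head, label = src_tree[src_pos - 1]
        # Walk up the source tree until reaching the root or an aligned head.
        while head != 0 and head not in alignment:
            head = src_tree[head - 1][0]
        tgt_head = alignment[head] if head != 0 else 0
        # Copy the source dependency label onto the aligned target token.
        tgt_tree[tgt_pos - 1] = (tgt_head, label)
    return tgt_tree


# "She reads books" projected onto a reordered translation "Books reads she".
src = [(2, "nsubj"), (0, "root"), (2, "obj")]
print(project_tree(src, {1: 3, 2: 2, 3: 1}, 3))
```

In the usage line, the subject and object swap positions in the translation, and their heads and labels follow them through the alignment. A real treebank-translation pipeline must also handle many-to-many alignments and phrase reordering produced by the translation system, which this sketch deliberately ignores.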

Citation (APA)

Tiedemann, J., Agić, Ž., & Nivre, J. (2014). Treebank translation for cross-lingual parser induction. In CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings (pp. 130–140). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-1614