Capturing divergence in dependency trees to improve syntactic projection

Abstract

Obtaining syntactic parses is an important step in many NLP pipelines. However, most of the world’s languages do not have a large amount of syntactically annotated data available for building parsers. Syntactic projection techniques attempt to address this issue by using parallel corpora consisting of resource-poor and resource-rich language pairs, taking advantage of a parser for the resource-rich language and word alignment between the languages to project the parses onto the data for the resource-poor language. These projection methods can suffer, however, when syntactic structures for some sentence pairs in the two languages look quite different. In this paper, we investigate the use of small, parallel, annotated corpora to automatically detect divergent structural patterns between two languages. We then use these detected patterns to improve projection algorithms and dependency parsers, allowing for better performing NLP tools for resource-poor languages, particularly those that may not have large amounts of annotated data necessary for traditional, fully-supervised methods. While this detection process is not exhaustive, we demonstrate that common patterns of divergence can be identified automatically without prior knowledge of a given language pair, and the patterns can be used to improve performance of syntactic projection and parsing.
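The core projection step the abstract describes — parsing the resource-rich side and carrying dependency arcs across word alignments — can be sketched as follows. This is a minimal illustration under simplifying assumptions (1-to-1 alignments, direct projection in the style the paper builds on), not the authors' actual algorithm; the function name and data representation are hypothetical.

```python
# Hypothetical sketch of direct dependency projection across word alignments.
# Assumes 1-to-1 alignments; unaligned target words are left unattached,
# which is exactly the situation where divergence detection helps.

def project_dependencies(src_heads, alignment):
    """src_heads[i] is the index of source word i's head (-1 for root).
    alignment maps source index -> target index (assumed 1-to-1).
    Returns a dict: target index -> projected head index (-1 for root)."""
    tgt_heads = {}
    for s, t in alignment.items():
        h = src_heads[s]
        if h == -1:
            tgt_heads[t] = -1              # source root projects to target root
        elif h in alignment:
            tgt_heads[t] = alignment[h]    # follow the alignment link of the head
        # else: head is unaligned, so this target word stays unattached
    return tgt_heads

# Toy example: source "the dog barks" with a same-order target sentence.
src_heads = [1, 2, -1]        # "the"->"dog", "dog"->"barks", "barks" is root
alignment = {0: 0, 1: 1, 2: 2}
print(project_dependencies(src_heads, alignment))  # {0: 1, 1: 2, 2: -1}
```

When the two trees diverge (e.g. a head-swap or a many-to-one alignment), this naive projection produces wrong or missing arcs, which motivates the paper's automatic detection of divergent patterns.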

Citation
Georgi, R., Xia, F., & Lewis, W. D. (2014). Capturing divergence in dependency trees to improve syntactic projection. Language Resources and Evaluation, 48(4), 709–739. https://doi.org/10.1007/s10579-014-9273-4
