Data-driven dependency parsing of new languages using incomplete and noisy training data


Abstract

We present a simple but very effective approach to identifying high-quality data in noisy data sets for structured problems like parsing, by greedily exploiting partial structures. We analyze our approach in an annotation projection framework for dependency trees, and show how dependency parsers from two different paradigms (graph-based and transition-based) can be trained on the resulting tree fragments. We train parsers for Dutch to evaluate our method and to investigate to what degree graph-based and transition-based parsers can benefit from incomplete training data. We find that partial correspondence projection gives rise to parsers that outperform parsers trained on aggressively filtered data sets, and achieve unlabeled attachment scores that are only 5% behind the average UAS for Dutch in the CoNLL-X Shared Task on supervised parsing (Buchholz and Marsi, 2006). © 2009 Association for Computational Linguistics.
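
To make the abstract's two central notions concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a one-to-one word alignment given as a dictionary, and the function names `project_heads` and `uas` are hypothetical. It copies a dependency edge to the target sentence only when both the dependent and its head are aligned, leaving the remaining tokens unattached (a partial tree), and it computes the unlabeled attachment score (UAS) as the fraction of tokens whose predicted head matches the gold head.

```python
from typing import Dict, List, Optional


def project_heads(
    src_heads: List[int],
    alignment: Dict[int, int],
    tgt_len: int,
) -> List[Optional[int]]:
    """Project source dependency heads onto a target sentence.

    src_heads[i] is the head index of source token i (-1 marks the root).
    alignment maps a source token index to a target token index
    (assumed one-to-one for this illustration).
    Target tokens whose edge cannot be projected keep head None,
    so the result may be a set of tree fragments rather than a full tree.
    """
    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for dep, head in enumerate(src_heads):
        if dep not in alignment:
            continue  # unaligned dependent: nothing to project
        tgt_dep = alignment[dep]
        if head == -1:
            tgt_heads[tgt_dep] = -1  # root attachment carries over
        elif head in alignment:
            tgt_heads[tgt_dep] = alignment[head]
        # otherwise the head is unaligned and the edge is dropped
    return tgt_heads


def uas(gold_heads: List[int], pred_heads: List[Optional[int]]) -> float:
    """Unlabeled attachment score: share of tokens with the correct head."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted sequences differ in length")
    correct = sum(1 for g, p in zip(gold_heads, pred_heads) if g == p)
    return correct / len(gold_heads)


# Three source tokens, of which only the first two are aligned.
partial = project_heads(src_heads=[1, -1, 1], alignment={0: 0, 1: 1}, tgt_len=3)
score = uas(gold_heads=[1, -1, 1], pred_heads=[1, -1, 0])
```

In this toy case the third target token stays unattached because its source counterpart has no alignment, which is the kind of incomplete structure the abstract refers to as a tree fragment.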

Cite

Spreyer, K., & Kuhn, J. (2009). Data-driven dependency parsing of new languages using incomplete and noisy training data. In CoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning (pp. 12–20). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1596374.1596380
