Data-driven dependency parsing of new languages using incomplete and noisy training data

Kathrin Spreyer; Jonas Kuhn

Conference Proceedings

CoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning (2009) 12-20

DOI: 10.3115/1596374.1596380

Abstract

We present a simple but very effective approach to identifying high-quality data in noisy data sets for structured problems like parsing, by greedily exploiting partial structures. We analyze our approach in an annotation projection framework for dependency trees, and show how dependency parsers from two different paradigms (graph-based and transition-based) can be trained on the resulting tree fragments. We train parsers for Dutch to evaluate our method and to investigate to what degree graph-based and transition-based parsers can benefit from incomplete training data. We find that partial correspondence projection gives rise to parsers that outperform parsers trained on aggressively filtered data sets, and achieve unlabeled attachment scores (UAS) that are only 5% behind the average UAS for Dutch in the CoNLL-X Shared Task on supervised parsing (Buchholz and Marsi, 2006). © 2009 Association for Computational Linguistics.
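
To make the idea of projecting partial structures concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a one-to-one word alignment given as a dictionary, 0-based head indices with -1 marking the root, and hypothetical function names (`project_heads`, `uas`). A target token receives a head only when both ends of the source edge are aligned; everything else is left as `None`, so the result may be a set of tree fragments rather than a complete tree.

```python
from typing import Dict, List, Optional


def project_heads(
    src_heads: List[int],
    alignment: Dict[int, int],
    tgt_len: int,
) -> List[Optional[int]]:
    """Project source dependency edges onto target tokens.

    src_heads[i] is the head index of source token i (-1 for the root).
    alignment maps a source token index to a target token index and is
    assumed to be one-to-one (an assumption of this sketch).
    Returns one entry per target token: its projected head index,
    -1 for the root, or None if no edge could be projected.
    """
    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for dep, head in enumerate(src_heads):
        if dep not in alignment:
            continue  # dependent has no target counterpart
        tgt_dep = alignment[dep]
        if head == -1:
            tgt_heads[tgt_dep] = -1  # the root attachment carries over
        elif head in alignment:
            tgt_heads[tgt_dep] = alignment[head]
        # otherwise the head is unaligned: leave the token unattached
    return tgt_heads


def uas(gold: List[int], pred: List[Optional[int]]) -> float:
    """Unlabeled attachment score: fraction of tokens with the correct head."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


# Source tree: token 1 is the root; tokens 0 and 2 attach to it.
# Source token 2 is unaligned, so target token 1 stays unattached.
fragments = project_heads([1, -1, 1], {0: 0, 1: 2}, tgt_len=3)
```

In this toy case `fragments` contains attachments for two of the three target tokens. A parser trained on such data would need to treat the `None` entries as unknown rather than as errors, which is the kind of incomplete supervision the abstract refers to.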

Cite

Spreyer, K., & Kuhn, J. (2009). Data-driven dependency parsing of new languages using incomplete and noisy training data. In CoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning (pp. 12–20). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1596374.1596380
