A Universal Dependencies Corpora Maintenance Methodology Using Downstream Application

Abstract

This paper investigates updates to Universal Dependencies (UD) treebanks in 23 languages and their impact on a downstream application. Many contributors are involved in updating UD's annotation guidelines and treebanks in various languages, yet it is not easy to verify whether the updated resources maintain universality with resources for other languages. The validity and consistency of multilingual corpora should therefore be tested through application tasks that involve syntactic structures with PoS tags, dependency labels, and universal features. We apply syntactic parsers trained on multiple versions of the UD treebanks (2.0 to 2.7) to a clause-level sentiment extractor, and then analyze the relationship between the parsers' attachment scores and performance on the application task. To inform future UD development, we show examples of outputs that differ depending on the version.

Cite

Iwamoto, R., Kanayama, H., Rademaker, A., & Ohko, T. (2021). A Universal Dependencies Corpora Maintenance Methodology Using Downstream Application. In SIGTYP 2021 - 3rd Workshop on Research in Computational Typology and Multilingual NLP, Proceedings of the Workshop (pp. 23–31). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.sigtyp-1.3
