Towards semi-automatic generation of proposition banks for low-resource languages

Abstract

Annotation projection based on parallel corpora has shown great promise in inexpensively creating Proposition Banks for languages for which high-quality parallel corpora and syntactic parsers are available. In this paper, we present an experimental study where we apply this approach to three languages that lack such resources: Tamil, Bengali and Malayalam. We find an average quality difference of 6 to 20 absolute F-measure points vis-a-vis high-resource languages, which indicates that annotation projection alone is insufficient in low-resource scenarios. Based on these results, we explore the possibility of using annotation projection as a starting point for inexpensive data curation involving both experts and non-experts. We give an outline of what such a process may look like and present an initial study to discuss its potential and challenges.

Cite

Akbik, A., Kumar, V., & Li, Y. (2016). Towards semi-automatic generation of proposition banks for low-resource languages. In EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 993–998). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d16-1102
