Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages


Abstract

The lack of annotated data is a major obstacle to building reliable NLP systems for most of the world’s languages, but this problem can be alleviated by generating data automatically. In this paper, we present a new data augmentation method that artificially creates new dependency-annotated sentences. The main idea is to swap subtrees between annotated sentences while enforcing strong constraints on those subtrees to maximize the grammaticality of the resulting sentences. We also propose a way to run low-resource experiments on resource-rich languages: we mimic low-resource languages by sampling sentences according to a low-resource distribution. In a series of experiments, we show that our proposed data augmentation method outperforms previous proposals that use the same basic inputs.
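The abstract describes subtree swapping only at a high level. As a rough illustration of the general idea, the Python sketch below replaces a subtree in one dependency-annotated sentence with a subtree taken from another and re-indexes all head pointers. The constraints used here (the two subtree roots share the same UPOS tag and dependency relation, both subtrees cover contiguous token spans, and sentence roots are never swapped) are simplified assumptions chosen for illustration, not the stronger constraints enforced in the paper. The `Token` type and the function names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Token:
    form: str
    upos: str
    head: int  # 1-based id of the head token; 0 marks the sentence root
    deprel: str


Sentence = List[Token]  # token at list index k has id k + 1


def subtree_ids(sent: Sentence, root: int) -> List[int]:
    """Return the sorted 1-based ids of `root` and all of its descendants."""
    children: Dict[int, List[int]] = {}
    for k, tok in enumerate(sent, start=1):
        children.setdefault(tok.head, []).append(k)
    ids, stack = [], [root]
    while stack:
        node = stack.pop()
        ids.append(node)
        stack.extend(children.get(node, []))
    return sorted(ids)


def contiguous(ids: List[int]) -> bool:
    """True if the (sorted) ids form one unbroken span of tokens."""
    return ids[-1] - ids[0] + 1 == len(ids)


def candidate_swaps(a: Sentence, b: Sentence) -> Iterator[Tuple[int, int]]:
    """Yield (i, j) pairs that satisfy the simplified, assumed constraints."""
    for i, ta in enumerate(a, start=1):
        if ta.head == 0 or not contiguous(subtree_ids(a, i)):
            continue
        for j, tb in enumerate(b, start=1):
            if (tb.head != 0 and tb.upos == ta.upos and tb.deprel == ta.deprel
                    and contiguous(subtree_ids(b, j))):
                yield i, j


def swap_subtree(a: Sentence, i: int, b: Sentence, j: int) -> Sentence:
    """Return a copy of `a` whose subtree at i is replaced by b's subtree at j."""
    span_a, span_b = subtree_ids(a, i), subtree_ids(b, j)
    sa, ea, sb = span_a[0], span_a[-1], span_b[0]
    shift = len(span_b) - len(span_a)

    def map_a(h: int) -> int:
        # Heads of tokens outside a contiguous subtree lie outside its span:
        # ids before the span (and 0 = root) are unchanged, later ids shift.
        return h if h < sa else h + shift

    def map_b(h: int) -> int:
        # Internal heads of the inserted subtree move to the new position.
        return sa + (h - sb)

    root_a = a[i - 1]
    prefix = [replace(t, head=map_a(t.head)) for t in a[: sa - 1]]
    middle = []
    for k in span_b:
        t = b[k - 1]
        if k == j:
            # The inserted root attaches where the removed root used to.
            middle.append(replace(t, head=map_a(root_a.head), deprel=root_a.deprel))
        else:
            middle.append(replace(t, head=map_b(t.head)))
    suffix = [replace(t, head=map_a(t.head)) for t in a[ea:]]
    return prefix + middle + suffix


if __name__ == "__main__":
    a = [Token("the", "DET", 2, "det"), Token("cat", "NOUN", 3, "nsubj"),
         Token("sleeps", "VERB", 0, "root")]
    b = [Token("a", "DET", 3, "det"), Token("small", "ADJ", 3, "amod"),
         Token("dog", "NOUN", 4, "nsubj"), Token("barks", "VERB", 0, "root")]
    for i, j in candidate_swaps(a, b):
        new = swap_subtree(a, i, b, j)
        print(" ".join(t.form for t in new), [t.head for t in new])
```

Requiring contiguous subtrees keeps the re-indexing simple: every token outside a swapped span has its head outside that span too, so only a constant offset is needed. Here, swapping "cat" for "a small dog" yields "a small dog sleeps" with heads `[3, 3, 4, 0]`.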

Citation (APA)

Dehouck, M., & Gómez-Rodríguez, C. (2020). Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 3818–3830). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.339