Training data augmentation for low-resource morphological inflection

46 citations · 97 Mendeley readers

Abstract

This work describes the UoE-LMU submission for the CoNLL-SIGMORPHON 2017 Shared Task on Universal Morphological Reinflection, Subtask 1: given a lemma and target morphological tags, generate the target inflected form. We evaluate several ways to improve performance in the 1000-example setting: three methods to augment the training data with identical input-output pairs (i.e., autoencoding), a heuristic approach to identify likely pairs of inflectional variants from an unlabeled corpus, and a method for cross-lingual knowledge transfer. We find that autoencoding random strings works surprisingly well, outperformed only slightly by autoencoding words from an unlabeled corpus. The random string method also works well in the 10,000-example setting despite not being tuned for it. Among 18 submissions our system takes 1st and 6th place in the 10k and 1k settings, respectively.
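The random-string autoencoding augmentation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the alphabet, length range, and the placeholder "COPY" tag standing in for the morphological tag slot are all assumptions.

```python
import random
import string

def make_autoencoding_pairs(n_pairs, min_len=3, max_len=10, seed=0):
    """Generate identical input-output (autoencoding) training pairs.

    Each example asks the model to reproduce its input unchanged,
    which encourages the copying behaviour central to inflection.
    Alphabet and length range here are illustrative assumptions.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        length = rng.randint(min_len, max_len)
        s = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        # Input and output are identical; "COPY" is a hypothetical tag
        # occupying the slot that real examples fill with morphological tags.
        pairs.append((s, "COPY", s))
    return pairs

augmented = make_autoencoding_pairs(5)
for inp, tag, out in augmented:
    print(inp, tag, out)
```

The same function could instead draw words from an unlabeled corpus rather than generating random strings, which corresponds to the slightly stronger variant reported in the abstract.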

Citation (APA)

Bergmanis, T., Kann, K., Schütze, H., & Goldwater, S. (2017). Training data augmentation for low-resource morphological inflection. In CoNLL 2017 - Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection (pp. 31–39). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/k17-2002
