Constrained Labeled Data Generation for Low-Resource Named Entity Recognition

Abstract

Named Entity Recognition (NER) in low-resource languages has been a long-standing challenge in NLP. Recent work has shown great progress in two directions: developing cross-lingual features/models to transfer knowledge to low-resource languages, and translating source-language training data into low-resource target-language training data by projecting annotations with cheap resources. We focus on the second direction in this study. Existing methods suffer from the low quality of the resulting annotated data in the target language; for example, they cannot handle word order and lexical ambiguity well. To address these limitations, we propose a novel approach that uses the projected annotation to generate pseudo-supervised data with a transformer language model and a constrained beam search. This allows us to generate more diverse, higher-quality, and larger quantities of annotated data in the target language. Experiments demonstrate that, when combined with available cross-lingual features, our method achieves state-of-the-art or competitive performance on NER in a low-resource setting, especially for languages that are distant from our source language, English.

Cite

Guo, R., & Roth, D. (2021). Constrained Labeled Data Generation for Low-Resource Named Entity Recognition. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 4519–4533). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.396
