This paper investigates the problem of Named Entity Recognition (NER) for extreme low-resource languages with only a few hundred tagged data samples. A critical enabler of most of the progress in NER is the readily available, large-scale training data for languages such as English and French. However, NER for low-resource languages remains relatively underexplored, leaving much room for improvement. We propose Mask Augmented Named Entity Recognition (MANER), a simple yet effective method that leverages the distributional hypothesis of pre-trained masked language models (MLMs) to significantly improve NER performance for low-resource languages. MANER repurposes the [mask] token in MLMs, which encodes valuable semantic contextual information, for NER prediction. Specifically, we prepend a [mask] token to every word in a sentence and predict the named entity for each word from its preceding [mask] token. We demonstrate that MANER is well-suited for NER in low-resource languages; our experiments show that for 100 languages with as few as 100 training examples, it improves on the state-of-the-art by up to 48% and by 12% on average on F1 score. We also perform detailed analyses and ablation studies to understand the scenarios that are best suited to MANER.
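The input-augmentation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mask_augment` and the literal "[MASK]" token string are illustrative choices (the actual mask token depends on the MLM's tokenizer), and the downstream NER classifier that reads the [mask]-token embeddings is omitted.

```python
def mask_augment(words):
    """Prepend a mask token before every word in a sentence.

    Per MANER, the NER tag for each word is predicted from the
    contextual embedding of the mask token that precedes it.
    Returns the augmented token sequence and the indices of the
    mask tokens used for prediction.
    """
    augmented = []
    mask_positions = []  # one mask index per original word
    for word in words:
        mask_positions.append(len(augmented))
        augmented.append("[MASK]")  # assumption: BERT-style mask string
        augmented.append(word)
    return augmented, mask_positions


tokens, positions = mask_augment(["Barack", "Obama", "visited", "Paris"])
# tokens    -> ["[MASK]", "Barack", "[MASK]", "Obama",
#               "[MASK]", "visited", "[MASK]", "Paris"]
# positions -> [0, 2, 4, 6]
```

In a full pipeline, the augmented sequence would be fed to the pre-trained MLM and a classification head applied to the hidden states at `positions` to produce one entity label per word.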
CITATION STYLE
Sonkar, S., Wang, Z., & Baraniuk, R. G. (2023). MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 219–226). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.sustainlp-1.16