This paper investigates the problem of Named Entity Recognition (NER) for extreme low-resource languages with only a few hundred tagged data samples. A critical enabler of most of the progress in NER is the readily available, large-scale training data for languages such as English and French. However, NER for low-resource languages remains relatively underexplored, leaving much room for improvement. We propose Mask Augmented Named Entity Recognition (MANER), a simple yet effective method that leverages the distributional hypothesis of pre-trained masked language models (MLMs) to significantly improve NER performance for low-resource languages. MANER repurposes the [mask] token in MLMs, which encodes valuable semantic contextual information, for NER prediction. Specifically, we prepend a [mask] token to every word in a sentence and predict the named entity for each word from its preceding [mask] token. We demonstrate that MANER is well-suited for NER in low-resource languages; our experiments show that for 100 languages with as few as 100 training examples, it improves on the state-of-the-art by up to 48% and by 12% on average on F1 score. We also perform detailed analyses and ablation studies to understand the scenarios that are best suited to MANER.
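The input-augmentation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mask_augment` and the literal "[MASK]" token string are illustrative choices (the actual mask token depends on the MLM's tokenizer), and the downstream NER classifier that reads the [mask]-token embeddings is omitted.

```python
def mask_augment(words):
    """Prepend a mask token before every word in a sentence.

    Per MANER, the NER tag for each word is predicted from the
    contextual embedding of the mask token that precedes it.
    Returns the augmented token sequence and the indices of the
    mask tokens used for prediction.
    """
    augmented = []
    mask_positions = []  # one mask index per original word
    for word in words:
        mask_positions.append(len(augmented))
        augmented.append("[MASK]")  # assumption: BERT-style mask string
        augmented.append(word)
    return augmented, mask_positions


tokens, positions = mask_augment(["Barack", "Obama", "visited", "Paris"])
# tokens    -> ["[MASK]", "Barack", "[MASK]", "Obama",
#               "[MASK]", "visited", "[MASK]", "Paris"]
# positions -> [0, 2, 4, 6]
```

In a full pipeline, the augmented sequence would be fed to the pre-trained MLM and a classification head applied to the hidden states at `positions` to produce one entity label per word.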
CITATION STYLE
Sonkar, S., Wang, Z., & Baraniuk, R. G. (2023). MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 219–226). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.sustainlp-1.16