Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects

Abstract

We present state-of-the-art results on morphosyntactic tagging across different varieties of Arabic using fine-tuned pre-trained transformer language models. Our models consistently outperform existing systems in Modern Standard Arabic and all the Arabic dialects we study, achieving a 2.6% absolute improvement over the previous state-of-the-art in Modern Standard Arabic, 2.8% in Gulf, 1.6% in Egyptian, and 8.3% in Levantine. We explore different training setups for fine-tuning pre-trained transformer language models, including training data size, the use of external linguistic resources, and the use of annotated data from other dialects in a low-resource scenario. Our results show that strategic fine-tuning using datasets from other high-resource dialects is beneficial for a low-resource dialect. Additionally, we show that high-quality morphological analyzers, used as external linguistic resources, are especially beneficial in low-resource settings.
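
The fine-tuning approach described above treats morphosyntactic tagging as token classification on top of a pre-trained transformer. As a rough illustration, the sketch below fine-tunes a BERT-style Arabic checkpoint with the Hugging Face transformers library; the checkpoint name, toy tag set, and single training example are assumptions for illustration, not the paper's exact configuration or data.

```python
# Minimal sketch: morphosyntactic tagging as token classification.
# Checkpoint, tag set, and example are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "CAMeL-Lab/bert-base-arabic-camelbert-mix"  # assumed checkpoint
TAGS = ["noun", "verb", "adj", "prep", "punc"]  # toy tag set, not the paper's

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(TAGS)
)

# One toy example: pre-tokenized words with gold tags
# ("The boy wrote the letter.").
words = ["كتب", "الولد", "الرسالة", "."]
labels = [TAGS.index(t) for t in ["verb", "noun", "noun", "punc"]]

# Tokenize with word alignment so each subword inherits its word's tag;
# special tokens get -100 so the loss function ignores them.
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
aligned = [
    -100 if wid is None else labels[wid]
    for wid in enc.word_ids(batch_index=0)
]

# One illustrative gradient step; a real setup would loop over a
# dialect-specific training set (and, per the paper's findings, could
# warm-start from a higher-resource dialect's annotated data).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
out = model(**enc, labels=torch.tensor([aligned]))
out.loss.backward()
optimizer.step()
```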

Citation (APA)

Inoue, G., Khalifa, S., & Habash, N. (2022). Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects. In Findings of the Association for Computational Linguistics: ACL 2022 (pp. 1708–1719). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-acl.135
