In this paper, we present two AraBERT-based deep learning approaches submitted to the Nuanced Arabic Dialect Identification (NADI) shared task of the Seventh Workshop for Arabic Natural Language Processing (WANLP 2022). NADI consists of two sub-tasks, namely country-level dialect identification and sentiment identification for dialectal Arabic. We present one system per sub-task. The first system is a multi-task learning model that consists of a shared AraBERT encoder with three task-specific classification layers. This model is trained to jointly learn the country-level, region-level, and area-level dialects of a tweet. The second system is a model distilled from an ensemble of models trained using K-fold cross-validation. Each model in the ensemble consists of an AraBERT model and a classifier, fine-tuned on (K-1) folds of the training set. Our team, Pythoneers, ranked 6th on the first test set of the first sub-task, 9th on the second test set of the first sub-task, and 4th on the test set of the second sub-task.
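The data flow of the second system (K-fold ensemble followed by distillation) can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the abstract does not say how folds are formed or how the ensemble is distilled, so the interleaved fold assignment, the averaging of member probabilities into soft targets, and the cross-entropy distillation loss are all assumptions, and every function name below is hypothetical. Plain lists of logits stand in for the outputs of the fine-tuned AraBERT classifiers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kfold_train_indices(n_samples, k):
    """For each of the k folds, return the indices of the other (k-1) folds.

    Assumption: folds are formed by interleaving indices (i, i+k, i+2k, ...).
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    return [
        sorted(idx for j, fold in enumerate(folds) if j != held_out for idx in fold)
        for held_out in range(k)
    ]


def ensemble_soft_targets(per_model_logits):
    """Average the class probabilities of all ensemble members for one example.

    Assumption: the teacher signal is the mean of the members' softmax outputs.
    """
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_models for c in range(n_classes)]


def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy between the teacher's soft targets and the student's prediction."""
    student_probs = softmax(student_logits)
    return -sum(
        t * math.log(s) for s, t in zip(student_probs, teacher_probs) if t > 0
    )


if __name__ == "__main__":
    # Each of the K = 3 ensemble members would be fine-tuned on one of these index sets.
    for train_idx in kfold_train_indices(n_samples=6, k=3):
        print(train_idx)

    # Hypothetical logits from three members for one tweet over three sentiment classes.
    teacher = ensemble_soft_targets([[2.0, 0.5, -1.0], [1.5, 1.0, -0.5], [2.5, 0.0, -1.5]])
    print(teacher, distillation_loss([1.0, 0.2, -0.3], teacher))
```

In this sketch, each ensemble member sees a different (K-1)/K share of the training data, and the student is trained to match the averaged distribution rather than the hard labels. A real system would compute the logits with the fine-tuned encoders and minimize the loss with a gradient-based optimizer.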
CITATION STYLE
Attieh, J., & Hassan, F. (2022). Arabic Dialect Identification and Sentiment Classification using Transformer-based Models. In WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop (pp. 485–490). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.wanlp-1.54