Domain-Adapted BERT-based Models for Nuanced Arabic Dialect Identification and Tweet Sentiment Analysis

Gıyaseddin Bayrak; Abdul Majeed Issifu

Conference Proceedings

WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop (2022) 425-430

DOI: 10.18653/v1/2022.wanlp-1.43

Abstract

This paper summarizes our solution to the Nuanced Arabic Dialect Identification (NADI) 2022 shared task. The task consists of two subtasks: country-level Arabic Dialect Identification (ADID) and Arabic Sentiment Analysis (ASA). Our work shows the importance of domain-adapted models and language-specific pre-processing in NLP task solutions. We implement a simple but strong baseline technique that stabilizes fine-tuning and improves the generalization of the models. Our best model for the Dialect Identification subtask achieves a Macro F-1 score of 25.54% as an average of the Test-A (33.89%) and Test-B (19.19%) F-1 scores. In the Sentiment Analysis subtask, we obtain a Macro F-1 score of 74.29%, computed over the positive and negative sentiment classes only.

Citation (APA)

Bayrak, G., & Issifu, A. M. (2022). Domain-Adapted BERT-based Models for Nuanced Arabic Dialect Identification and Tweet Sentiment Analysis. In WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop (pp. 425–430). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.wanlp-1.43
