Abstract
In this paper, I present an approach that uses a one-vs-one classification scheme with TF-IDF term weighting on character n-grams to identify Arabic dialects used in social media. The scheme was evaluated in the context of the third Nuanced Arabic Dialect Identification (NADI 2022) shared task for identifying Arabic dialects used in Twitter messages. The approach was implemented with logistic regression loss and trained using the stochastic gradient descent (SGD) algorithm. This simple method achieved macro F1 scores of 22.89% and 10.83% on TEST A and TEST B, respectively, compared with an approach based on the pretrained AraBERT transformer model, which achieved macro F1 scores of 30.01% and 14.84%, respectively. My submission based on AraBERT scored a macro F1 average of 22.42% and was ranked 10th out of the 19 teams that participated in the task.
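The pipeline described in the abstract (character n-gram TF-IDF features, one binary logistic-regression classifier per pair of dialects, SGD training, majority vote) can be illustrated with a minimal, dependency-free Python sketch. This is not the paper's implementation: the n-gram range (1 to 3), learning rate, epoch count, smoothed IDF formula, function names, and the three toy labels and sentences are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter
from itertools import combinations


def char_ngrams(text, n_min=1, n_max=3):
    """All character n-grams of length n_min..n_max (range is an assumption)."""
    return [text[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram seen in docs."""
    df = Counter(g for d in docs for g in set(char_ngrams(d)))
    return {g: math.log((1 + len(docs)) / (1 + c)) + 1 for g, c in df.items()}


def tfidf(text, idf):
    """L2-normalised sparse TF-IDF vector; n-grams unseen in training are dropped."""
    tf = Counter(g for g in char_ngrams(text) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def score(x, w, bias):
    """Linear score of sparse vector x under weights w."""
    return bias + sum(w.get(g, 0.0) * v for g, v in x.items())


def train_binary(samples, epochs=50, lr=0.5, seed=0):
    """Logistic-loss SGD on (vector, 0/1) pairs; hyperparameters are assumptions."""
    rng = random.Random(seed)
    samples = list(samples)
    w, bias = {}, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            z = max(-30.0, min(30.0, score(x, w, bias)))  # clamp to avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - y  # gradient of log loss w.r.t. z
            for g, v in x.items():
                w[g] = w.get(g, 0.0) - lr * err * v
            bias -= lr * err
    return w, bias


def train_ovo(texts, labels):
    """Fit one binary classifier for each unordered pair of labels."""
    idf = fit_idf(texts)
    vecs = [tfidf(t, idf) for t in texts]
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        pair = [(x, 1 if y == a else 0)
                for x, y in zip(vecs, labels) if y in (a, b)]
        models[(a, b)] = train_binary(pair)
    return idf, models


def predict(text, idf, models):
    """Each pairwise classifier casts one vote; the most-voted label wins."""
    x = tfidf(text, idf)
    votes = Counter()
    for (a, b), (w, bias) in models.items():
        votes[a if score(x, w, bias) > 0 else b] += 1
    return votes.most_common(1)[0][0]


# Toy data for illustration only; not drawn from the NADI 2022 dataset.
texts = ["ازيك عامل ايه", "ايه الاخبار", "كيفك شو اخبارك",
         "شو عم تعمل", "شلونك شخبارك", "شنو تسوي"]
labels = ["EG", "EG", "LEV", "LEV", "GULF", "GULF"]

idf, models = train_ovo(texts, labels)
prediction = predict("شو الاخبار", idf, models)
```

With k labels the scheme trains k(k-1)/2 binary classifiers, so the three toy labels here yield three pairwise models. Each classifier only sees the examples of its two classes, which keeps the individual problems small at the cost of more models to train and query.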
Cite
Al Abdulsalam, A. A. (2022). SQU-CS @ NADI 2022: Dialectal Arabic Identification using One-vs-One Classification with TF-IDF Weights Computed on Character n-grams. In WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop (pp. 436–441). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.wanlp-1.45