SQU-CS @ NADI 2022: Dialectal Arabic Identification using One-vs-One Classification with TF-IDF Weights Computed on Character n-grams

Abdulrahman A. Al Abdulsalam

Conference Proceedings

WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop (2022) 436-441

DOI: 10.18653/v1/2022.wanlp-1.45

Abstract

In this paper, I present an approach that uses a one-vs-one classification scheme with TF-IDF term weighting on character n-grams to identify Arabic dialects used in social media. The scheme was evaluated in the third Nuanced Arabic Dialect Identification (NADI 2022) shared task, which targets Arabic dialects used in Twitter messages. The approach was implemented with a logistic regression loss and trained using the stochastic gradient descent (SGD) algorithm. This simple method achieved macro F1 scores of 22.89% and 10.83% on TEST A and TEST B, respectively, compared with 30.01% and 14.84% for an approach based on the AraBERT pretrained transformer model. My submission based on AraBERT scored an average macro F1 of 22.42% and was ranked 10th out of the 19 teams that participated in the task.
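The pipeline named in the abstract (character n-grams, TF-IDF weights, one binary logistic regression per pair of dialects, SGD training, majority vote) can be sketched in plain Python as follows. This is only an illustrative sketch, not the paper's implementation: the n-gram range (1 to 3), the smoothed IDF formula, the learning rate, the epoch count, the labels EG, GULF and LEV, and the six toy sentences are all assumptions made here, since the abstract does not specify them.

```python
import math
import random
from collections import Counter
from itertools import combinations


def char_ngrams(text, n_min=1, n_max=3):
    """All character n-grams of length n_min..n_max (range is an assumption)."""
    return [text[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram seen in docs."""
    df = Counter(g for d in docs for g in set(char_ngrams(d)))
    return {g: math.log((1 + len(docs)) / (1 + c)) + 1 for g, c in df.items()}


def tfidf(text, idf):
    """L2-normalised sparse TF-IDF vector (a dict) over known n-grams."""
    tf = Counter(g for g in char_ngrams(text) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def score(x, w, bias):
    """Linear score of sparse vector x under weights w."""
    return bias + sum(w[g] * v for g, v in x.items())


def train_binary(vectors, targets, epochs=100, lr=0.5, seed=0):
    """Binary logistic regression fitted with plain SGD (targets are 0/1)."""
    rng = random.Random(seed)
    w, bias = Counter(), 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = max(-30.0, min(30.0, score(vectors[i], w, bias)))
            grad = 1.0 / (1.0 + math.exp(-z)) - targets[i]  # d(log loss)/dz
            for g, v in vectors[i].items():
                w[g] -= lr * grad * v
            bias -= lr * grad
    return w, bias


def train_ovo(texts, labels):
    """Train one binary model per unordered pair of classes (one-vs-one)."""
    idf = fit_idf(texts)
    vectors = [tfidf(t, idf) for t in texts]
    models = {}
    for neg, pos in combinations(sorted(set(labels)), 2):
        idx = [i for i, y in enumerate(labels) if y in (neg, pos)]
        models[(neg, pos)] = train_binary(
            [vectors[i] for i in idx],
            [1 if labels[i] == pos else 0 for i in idx])
    return idf, models


def predict(text, idf, models):
    """Each pairwise model casts one vote; the most-voted class wins."""
    x = tfidf(text, idf)
    votes = Counter()
    for (neg, pos), (w, bias) in models.items():
        votes[pos if score(x, w, bias) > 0 else neg] += 1
    return votes.most_common(1)[0][0]


# Toy data invented for this sketch; NOT taken from the NADI 2022 dataset.
texts = ["ازيك عامل ايه", "انت فين دلوقتي",
         "شلونك شخبارك", "وينك الحين",
         "كيفك شو اخبارك", "وينك هلق"]
labels = ["EG", "EG", "GULF", "GULF", "LEV", "LEV"]

idf, models = train_ovo(texts, labels)
print(sorted(models))                      # the class pairs that were trained
print(predict("شلونك الحين", idf, models))  # predicted label for a new string
```

With k classes, one-vs-one trains k(k-1)/2 binary models, each seeing only the examples of its two classes, so the three toy labels above yield three models. In practice these components would more likely come from an existing machine learning library than be written by hand.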

Cite

Al Abdulsalam, A. A. (2022). SQU-CS @ NADI 2022: Dialectal Arabic Identification using One-vs-One Classification with TF-IDF Weights Computed on Character n-grams. In WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop (pp. 436–441). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.wanlp-1.45
