Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification


Abstract

Arabic dialect identification is an inherently complex problem, as Arabic dialect taxonomy is convoluted and aims to dissect a continuous space rather than a discrete one. In this work, we present machine learning and deep learning approaches to predict 21 fine-grained dialects from a set of given tweets per user. We adopted numerous feature extraction methods, most of which improved the final model, such as word embeddings, TF-IDF, and other tweet features. Our results show that a simple LinearSVC can outperform any complex deep learning model given a set of curated features. With a relatively complex user voting mechanism, we were able to achieve a Macro-Averaged F1-score of 71.84% on MADAR shared subtask-2. Our best submitted model ranked second out of all participating teams.
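
The abstract describes predicting one dialect per user from that user's tweets through a voting step, without giving its details. As a rough illustration only, the sketch below assumes the simplest possible variant: a plain majority vote over tweet-level labels. It is not the authors' "relatively complex" mechanism. The function name `vote_user_dialects`, the user IDs, and the labels are all hypothetical.

```python
from collections import Counter, defaultdict


def vote_user_dialects(tweet_predictions):
    """Collapse tweet-level dialect labels into one label per user.

    tweet_predictions: iterable of (user_id, predicted_label) pairs,
    e.g. the output of any tweet-level classifier.
    Assumption: plain majority vote; ties go to the label seen first.
    """
    labels_by_user = defaultdict(list)
    for user_id, label in tweet_predictions:
        labels_by_user[user_id].append(label)

    # Counter.most_common orders equal counts by first occurrence,
    # so the tie-breaking rule above is deterministic.
    return {
        user_id: Counter(labels).most_common(1)[0][0]
        for user_id, labels in labels_by_user.items()
    }


# Hypothetical tweet-level predictions for two users.
example_predictions = [
    ("user_a", "Egypt"), ("user_a", "Egypt"), ("user_a", "Sudan"),
    ("user_b", "Jordan"), ("user_b", "Palestine"), ("user_b", "Jordan"),
]
user_labels = vote_user_dialects(example_predictions)
```

In this sketch each tweet counts equally. A more elaborate scheme could, for example, weight votes by classifier confidence, but the abstract does not say which approach was used.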

Cite

Talafha, B., Farhan, W., Altakrouri, A., & Al-Natsheh, H. T. (2019). Mawdoo3 AI at MADAR shared task: Arabic tweet dialect identification. In ACL 2019 - 4th Arabic Natural Language Processing Workshop, WANLP 2019 - Proceedings of the Workshop (pp. 239–243). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w19-4629
