Arabic dialect identification is an inherently complex problem, as Arabic dialect taxonomy is convoluted and aims to dissect a continuous space rather than a discrete one. In this work, we present machine learning and deep learning approaches to predict 21 fine-grained dialects from a given set of tweets per user. We adopted numerous feature extraction methods, most of which improved the final model, such as word embeddings, TF-IDF, and other tweet features. Our results show that a simple LinearSVC can outperform any complex deep learning model given a set of curated features. With a relatively complex user voting mechanism, we were able to achieve a Macro-Averaged F1-score of 71.84% on MADAR shared subtask-2. Our best submitted model ranked second out of all participating teams.
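To make the user-level setup concrete, the sketch below aggregates per-tweet predictions into one label per user and scores the result with macro-averaged F1. It is a minimal illustration only: it uses plain majority voting, which is an assumption and not the "relatively complex" voting mechanism the abstract refers to, and the function names, user IDs, and labels are hypothetical.

```python
from collections import Counter, defaultdict


def vote_per_user(tweet_predictions):
    """Collapse (user_id, predicted_label) pairs into one label per user.

    Assumption: simple majority voting. Ties go to the label seen first
    for that user, which is how Counter.most_common orders equal counts.
    """
    by_user = defaultdict(list)
    for user_id, label in tweet_predictions:
        by_user[user_id].append(label)
    return {
        user_id: Counter(labels).most_common(1)[0][0]
        for user_id, labels in by_user.items()
    }


def macro_f1(gold, predicted):
    """Unweighted mean of per-label F1 over all labels in gold or predicted."""
    labels = set(gold) | set(predicted)
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical per-tweet predictions from some tweet-level classifier.
tweet_predictions = [
    ("u1", "Egypt"), ("u1", "Egypt"), ("u1", "Jordan"),
    ("u2", "Jordan"), ("u2", "Jordan"),
    ("u3", "Egypt"), ("u3", "Iraq"), ("u3", "Iraq"),
]
gold_by_user = {"u1": "Egypt", "u2": "Jordan", "u3": "Egypt"}

user_predictions = vote_per_user(tweet_predictions)
users = sorted(gold_by_user)
score = macro_f1(
    [gold_by_user[u] for u in users],
    [user_predictions[u] for u in users],
)
print(user_predictions, round(score, 4))
```

Because macro averaging weights every label equally, a single misclassified user from a rare dialect lowers the score as much as one from a common dialect, which is why the metric suits a 21-way task with uneven class sizes.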
Talafha, B., Farhan, W., Altakrouri, A., & Al-Natsheh, H. T. (2019). Mawdoo3 AI at MADAR shared task: Arabic tweet dialect identification. In ACL 2019 - 4th Arabic Natural Language Processing Workshop, WANLP 2019 - Proceedings of the Workshop (pp. 239–243). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w19-4629