Automatic Identification of Moroccan Colloquial Arabic

Ridouane Tachicart; Karim Bouzoubaa; Si Lhoussaine Aouragh; Hamid Jaafa

Conference Proceedings

Communications in Computer and Information Science (2018) 782 201-214

DOI: 10.1007/978-3-319-73500-9_15

10 Citations

19 Readers

Abstract

Language identification is an NLP task that aims to predict the language of a given text. Many attempts have been made to address this task for Arabic dialects. In this paper, we present our approach to building a language identification system that distinguishes Moroccan Colloquial Arabic from Arabic using two different methods. The first is rule-based and relies on stop word frequency, while the second is statistical and uses several machine learning classifiers. The results show that the statistical approach outperforms the rule-based approach and that the Support Vector Machines classifier is more accurate than the other statistical classifiers. Our goal in this paper is to pave the way toward building advanced Moroccan dialect NLP tools such as a morphological analyzer and a machine translation system.
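To make the rule-based idea concrete, the following is a minimal sketch of stop word frequency counting. It is not the authors' implementation. The two stop word lists, the function name `identify`, the labels `"MCA"`, `"Arabic"` and `"unknown"`, and the tie-breaking rule are all illustrative assumptions; the abstract does not specify the lexicons or the decision rule the paper uses.

```python
from collections import Counter

# Illustrative stop word lists (assumptions, not the paper's lexicons).
ARABIC_STOP_WORDS = {"في", "من", "على", "إلى", "هذا", "هذه", "التي", "الذي"}
MCA_STOP_WORDS = {"ديال", "بزاف", "واش", "شنو", "هاد", "ماشي", "غادي", "كاين"}


def identify(text: str) -> str:
    """Label a text by comparing stop word hit counts (hypothetical rule)."""
    # Whitespace tokenization only; a real system would also normalize
    # punctuation, diacritics and spelling variants.
    counts = Counter(text.split())
    arabic_hits = sum(counts[w] for w in ARABIC_STOP_WORDS)
    mca_hits = sum(counts[w] for w in MCA_STOP_WORDS)
    if mca_hits > arabic_hits:
        return "MCA"
    if arabic_hits > mca_hits:
        return "Arabic"
    # Tie, or no stop word from either list was found.
    return "unknown"
```

A rule of this kind is cheap and transparent, but it depends entirely on list coverage, which may be one reason a trained classifier can do better on short or mixed texts.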

Cite

Tachicart, R., Bouzoubaa, K., Aouragh, S. L., & Jaafa, H. (2018). Automatic identification of Moroccan colloquial Arabic. In Communications in Computer and Information Science (Vol. 782, pp. 201–214). Springer Verlag. https://doi.org/10.1007/978-3-319-73500-9_15
