Language identification is an NLP task that aims to predict the language of a given text. For Arabic dialects, many attempts have been made to address this problem. In this paper, we present our approach to building a language identification system that distinguishes between Moroccan Colloquial Arabic and other Arabic languages using two different methods. The first is rule-based and relies on stop word frequency, while the second is statistically based and uses several machine learning classifiers. The results show that the statistical approach outperforms the rule-based approach, and that the Support Vector Machines classifier is more accurate than the other statistical classifiers. Our goal in this paper is to pave the way toward advanced NLP tools for the Moroccan dialect, such as a morphological analyzer and a machine translation system.
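The rule-based method is only described as relying on stop word frequency. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a two-way split between Moroccan Colloquial Arabic (MCA) and Modern Standard Arabic (MSA), two small hand-picked stop-word lists (the words are illustrative and hypothetical, not the lists used in the paper), and a decision rule that picks the variety whose stop words make up the larger share of the tokens.

```python
import re

# Hypothetical, illustrative stop-word lists; NOT the lists used in the paper.
MCA_STOP_WORDS = {"ديال", "واش", "بزاف", "ماشي", "هادشي"}
MSA_STOP_WORDS = {"الذي", "التي", "ليس", "سوف", "لقد"}


def tokenize(text: str) -> list[str]:
    # \w is Unicode-aware for str patterns, so Arabic letters form tokens
    # and punctuation such as the Arabic question mark is dropped.
    return re.findall(r"\w+", text)


def stop_word_frequency(tokens: list[str], stop_words: set[str]) -> float:
    # Share of tokens that belong to the given stop-word list.
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in stop_words) / len(tokens)


def identify(text: str) -> str:
    # Hypothetical decision rule: the higher stop-word frequency wins;
    # ties (including texts with no stop words) are labeled "unknown".
    tokens = tokenize(text)
    mca = stop_word_frequency(tokens, MCA_STOP_WORDS)
    msa = stop_word_frequency(tokens, MSA_STOP_WORDS)
    if mca > msa:
        return "MCA"
    if msa > mca:
        return "MSA"
    return "unknown"


print(identify("واش كاين شي حاجة ديال الماكلة؟"))  # MCA
print(identify("هذا هو الكتاب الذي قرأته"))  # MSA
```

Normalizing by token count keeps scores comparable across texts of different lengths. A practical system would need far larger lists and some policy for words shared by both varieties, which is one reason a statistical classifier can do better.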
Tachicart, R., Bouzoubaa, K., Aouragh, S. L., & Jaafa, H. (2018). Automatic identification of Moroccan Colloquial Arabic. In Communications in Computer and Information Science (Vol. 782, pp. 201–214). Springer Verlag. https://doi.org/10.1007/978-3-319-73500-9_15