In this paper we present the first application of Native Language Identification (NLI) to Arabic learner data. NLI, the task of predicting a writer’s first language from their writing in other languages, has mostly been investigated with English data but is now expanding to other languages. Using L2 texts from the newly released Arabic Learner Corpus, we demonstrate that a combination of three syntactic features (CFG production rules, Arabic function words and Part-of-Speech n-grams) is useful for this task. Our system achieves an accuracy of 41% against a baseline of 23%, providing the first evidence for classifier-based detection of language transfer effects in L2 Arabic. Such methods can be useful for studying language transfer, developing teaching materials tailored to students’ native language, and forensic linguistics. Future directions are discussed.
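As a rough illustration of two of the feature types named above, the following minimal sketch counts Part-of-Speech n-grams and computes relative function-word frequencies for a single text. It is not the authors' implementation: the helper names `pos_ngrams` and `function_word_freqs`, the tag labels, and the transliterated tokens are all hypothetical, and CFG production rules are omitted because they require a parser.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Count POS n-grams (as tuples) in one sequence of POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def function_word_freqs(tokens, function_words):
    """Relative frequency of each listed function word among all tokens."""
    counts = Counter(t for t in tokens if t in function_words)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {w: counts[w] / total for w in function_words}


# Hypothetical toy input: the tags and transliterated tokens are placeholders,
# not data from the Arabic Learner Corpus.
tags = ["NOUN", "VERB", "NOUN", "VERB", "NOUN"]
tokens = ["fi", "albayt", "min", "fi"]
function_words = ["fi", "min", "ala"]

bigrams = pos_ngrams(tags, 2)
freqs = function_word_freqs(tokens, function_words)
```

In a sketch like this, the per-text counts and frequencies would be assembled into feature vectors and passed to a classifier; the choice of classifier and of function-word list is left open here.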
CITATION STYLE
Malmasi, S., & Dras, M. (2014). Arabic Native Language Identification. In ANLP 2014 - EMNLP 2014 Workshop on Arabic Natural Language Processing, Proceedings (pp. 180–186). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-3625