Arabic Native Language Identification

Shervin Malmasi; Mark Dras

Conference Proceedings, Open Access

ANLP 2014 - EMNLP 2014 Workshop on Arabic Natural Language Processing, Proceedings (2014) 180-186

DOI: 10.3115/v1/w14-3625

Abstract

In this paper we present the first application of Native Language Identification (NLI) to Arabic learner data. NLI, the task of predicting a writer’s first language from their writing in other languages, has mostly been investigated with English data but is now expanding to other languages. Using L2 texts from the newly released Arabic Learner Corpus, we combine three syntactic features (CFG production rules, Arabic function words and Part-of-Speech n-grams) and demonstrate that they are useful for this task. Our system achieves an accuracy of 41% against a baseline of 23%, providing the first evidence for classifier-based detection of language transfer effects in L2 Arabic. Such methods can be useful for studying language transfer, developing teaching materials tailored to students’ native language, and forensic linguistics. Future directions are discussed.
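
To illustrate one of the three feature types named in the abstract, the snippet below shows how Part-of-Speech n-gram features might be extracted from a tagged sentence. It is a minimal sketch and not the authors' implementation: the function names `pos_ngrams` and `relative_frequencies`, the tag labels, and the choice to normalize counts to relative frequencies are all assumptions made for illustration.

```python
from collections import Counter

def pos_ngrams(tags, n):
    """Count contiguous n-grams over a sequence of POS tags."""
    # A sequence shorter than n yields an empty Counter.
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def relative_frequencies(counts):
    """Normalize raw n-gram counts so they sum to 1 (an assumed weighting)."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}

# Hypothetical tag sequence for one learner sentence; tag names are illustrative only.
tags = ["NOUN", "ADJ", "PREP", "NOUN", "ADJ"]
bigrams = pos_ngrams(tags, 2)
features = relative_frequencies(bigrams)
```

Feature dictionaries of this kind, one per text, could then be passed to a standard classifier that predicts the writer's native language.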

Citation (APA)

Malmasi, S., & Dras, M. (2014). Arabic Native Language Identification. In ANLP 2014 - EMNLP 2014 Workshop on Arabic Natural Language Processing, Proceedings (pp. 180–186). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-3625
