Can characters reveal your native language? A language-independent approach to native language identification

Radu Tudor Ionescu; Marius Popescu; Aoife Cahill

Conference Proceedings (Open Access)

EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (2014) 1363-1373

DOI: 10.3115/v1/d14-1142

Abstract

A common approach in text mining tasks such as text categorization, authorship identification or plagiarism detection is to rely on features like words, part-of-speech tags, stems, or other high-level linguistic features. In this work, an approach that uses character n-grams as features is proposed for the task of native language identification. Instead of performing standard feature selection, the proposed approach combines several string kernels using multiple kernel learning. Kernel Ridge Regression and Kernel Discriminant Analysis are used independently in the learning stage. The empirical results obtained in all the experiments conducted in this work indicate that the proposed approach achieves state-of-the-art performance in native language identification, reaching an accuracy 1.7% above that of the top-scoring system of the 2013 NLI Shared Task. Furthermore, the proposed approach has the important advantage of being language-independent and linguistic-theory-neutral. In the cross-corpus experiment, the proposed approach shows that it can also be topic-independent, improving on the state-of-the-art system by 32.3%.
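To make the pipeline in the abstract more concrete, here is a minimal sketch in plain Python with no external libraries. It works under several assumptions that do not come from the paper. It uses three character p-gram string kernels: a spectrum kernel, a presence kernel and an intersection kernel. Each kernel is blended over a small toy range of p and normalized so that k(s, s) = 1. The kernels are combined by an unweighted sum, which is only the simplest stand-in for multiple kernel learning. Classification uses Kernel Ridge Regression, trained one-vs-all on ±1 targets. All function names, the p range and the regularization value `lam` are hypothetical, chosen only for illustration. Kernel Discriminant Analysis is not shown.

```python
"""Toy sketch: character p-gram string kernels + Kernel Ridge Regression.

Assumptions (not taken from the paper): p range, normalization, equal-weight
kernel sum, one-vs-all +/-1 targets, and the regularization value lam.
"""
import math
from collections import Counter


def ngram_counts(text, p):
    """Counts of every contiguous character p-gram in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s, t, p):
    """Sum over shared p-grams of count_in_s * count_in_t."""
    cs, ct = ngram_counts(s, p), ngram_counts(t, p)
    return sum(c * ct[g] for g, c in cs.items() if g in ct)


def presence_kernel(s, t, p):
    """Number of distinct p-grams that occur in both strings."""
    return len(set(ngram_counts(s, p)) & set(ngram_counts(t, p)))


def intersection_kernel(s, t, p):
    """Sum over shared p-grams of min(count_in_s, count_in_t)."""
    cs, ct = ngram_counts(s, p), ngram_counts(t, p)
    return sum(min(c, ct[g]) for g, c in cs.items() if g in ct)


KERNELS = (spectrum_kernel, presence_kernel, intersection_kernel)


def normalized(kernel, s, t, p_range):
    """Blend a kernel over several p, then scale so that k(s, s) == 1."""
    kst = sum(kernel(s, t, p) for p in p_range)
    kss = sum(kernel(s, s, p) for p in p_range)
    ktt = sum(kernel(t, t, p) for p in p_range)
    return kst / math.sqrt(kss * ktt) if kss and ktt else 0.0


def combined_kernel(s, t, p_range=range(3, 6)):
    """Equal-weight sum of the normalized kernels (simplest kernel combination)."""
    return sum(normalized(k, s, t, p_range) for k in KERNELS)


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def krr_fit(K, labels, classes, lam=1e-3):
    """Dual KRR weights alpha = (K + lam*I)^-1 Y, one column per class."""
    Y = [[1.0 if y == c else -1.0 for c in classes] for y in labels]
    n = len(K)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, Y)


def krr_predict(k_row, alpha, classes):
    """Pick the class whose score sum_i k(x, x_i) * alpha[i][c] is largest."""
    scores = [sum(k * a[c] for k, a in zip(k_row, alpha)) for c in range(len(classes))]
    return classes[max(range(len(classes)), key=scores.__getitem__)]


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the cat sat on a hat",
            "quick brown fox jumps", "quick brown dogs jump"]
    labels = ["A", "A", "B", "B"]
    classes = ["A", "B"]
    K = [[combined_kernel(a, b) for b in docs] for a in docs]
    alpha = krr_fit(K, labels, classes)
    query = "the cat sat on the rug"
    print(krr_predict([combined_kernel(query, d) for d in docs], alpha, classes))
```

No words, tags or stems appear anywhere in this sketch. Documents are compared only through their character p-grams, which is what makes this style of approach language-independent. Normalizing each kernel before summing keeps any single kernel from dominating the combination just because its raw values are larger.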

Citation (APA)

Ionescu, R. T., Popescu, M., & Cahill, A. (2014). Can characters reveal your native language? A language-independent approach to native language identification. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 1363–1373). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/d14-1142