Feature analysis for native language identification

Sergiu Nisioi

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9041 644-657

DOI: 10.1007/978-3-319-18111-0_49

Abstract

In this study we investigate the role of different features in the task of native language identification. For this purpose, we compile a learner corpus from a subset of the EF Cambridge Open Language Database (EFCAMDAT) [10], developed at the University of Cambridge in collaboration with EF Education. The features we consider include character n-grams, positional token frequencies, part-of-speech n-grams, function words, shell nouns, and a set of annotated errors. Finally, we examine whether essays by English learners who share the same mother tongue can be distinguished by their country of origin.

Citation (APA)

Nisioi, S. (2015). Feature analysis for native language identification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9041, pp. 644–657). Springer Verlag. https://doi.org/10.1007/978-3-319-18111-0_49
