Feature analysis for native language identification

10Citations
Citations of this article
16Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In this study we investigate the role of different features for the task of native language identification. For this purpose, we compile a learner corpus based on a subset of the EF Cambridge Open Language Database – EFCAMDAT [10] developed at the University of Cambridge in collaboration with EF Education. The features we are taking into consideration include character n-grams, positional token frequencies, part of speech n-grams, function words, shell nouns and a set of annotated errors. Last but not least, we examine whether the essays of English learners that share the same mother tongue can be distinguished based on their country of origin.

Cite

CITATION STYLE

APA

Nisioi, S. (2015). Feature analysis for native language identification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9041, pp. 644–657). Springer Verlag. https://doi.org/10.1007/978-3-319-18111-0_49

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free