Native Language Identification with String Kernels

Abstract

This chapter presents an application of machine learning methods that work at the character level. More precisely, several string kernels, one of which is based on Local Rank Distance, are combined to obtain state-of-the-art results for native language identification (NLI). A broad set of NLI experiments is conducted to compare the string kernels approach with other state-of-the-art methods on English, Arabic, and Norwegian corpora. In all the experiments, string kernels obtain better results than the state of the art, sometimes by a very large margin. For instance, there is a 32.3% improvement in accuracy over the state-of-the-art system when the systems based on string kernels are trained on the TOEFL11 corpus and tested on the TOEFL11-Big corpus. The results are even more impressive considering that the proposed approach is language independent and linguistic theory neutral. To gain additional insight into the string kernels approach, the features selected by the classifier as most discriminant are analyzed in this chapter. The analysis also offers information about localized language transfer effects, since the features used by the proposed model are p-grams of various lengths.
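To make the idea of character-level features concrete, the following is a minimal sketch of a blended p-spectrum kernel, one common kind of string kernel that compares two texts by the character p-grams they share. It is an illustration only: the abstract does not say which kernels the chapter combines beyond one based on Local Rank Distance, and the function names (`p_grams`, `spectrum_kernel`, `normalized_kernel`) and the default p-gram lengths are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def p_grams(text: str, p: int) -> Counter:
    """Count every contiguous character p-gram in `text`."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s: str, t: str, p_values=(2, 3)) -> int:
    """Blended p-spectrum kernel (illustrative, not the chapter's exact kernel).

    For each length p, sum the products of the counts of the p-grams
    that occur in both strings, then add the results over all lengths.
    """
    total = 0
    for p in p_values:
        counts_s, counts_t = p_grams(s, p), p_grams(t, p)
        total += sum(n * counts_t[g] for g, n in counts_s.items() if g in counts_t)
    return total


def normalized_kernel(s: str, t: str, p_values=(2, 3)) -> float:
    """Scale the kernel to [0, 1] so that text length matters less."""
    denom = sqrt(spectrum_kernel(s, s, p_values) * spectrum_kernel(t, t, p_values))
    return spectrum_kernel(s, t, p_values) / denom if denom else 0.0
```

A matrix of such kernel values between all pairs of training essays could then be passed to a kernel classifier. Because each feature is a short character sequence, the features with the highest weights can be read directly, which is what makes an analysis of localized transfer effects possible.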

Cite

Ionescu, R. T., & Popescu, M. (2016). Native Language Identification with String Kernels. In Advances in Computer Vision and Pattern Recognition (pp. 193–227). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-30367-3_8
