Native Language Identification with String Kernels

Radu Tudor Ionescu; Marius Popescu

Book Chapter

Springer Science and Business Media Deutschland GmbH, (2016), 193-227

DOI: 10.1007/978-3-319-30367-3_8

Abstract

This chapter presents an application of machine learning methods that work at the character level. More precisely, several string kernels, one of which is based on Local Rank Distance, are combined to obtain state-of-the-art results for native language identification (NLI). A broad set of NLI experiments is conducted to compare the string kernels approach with other state-of-the-art methods on English, Arabic, and Norwegian corpora. In all the experiments, string kernels obtain better results than the state of the art, sometimes by a very large margin. For instance, there is a 32.3% improvement in accuracy over the state-of-the-art system when the systems based on string kernels are trained on the TOEFL11 corpus and tested on the TOEFL11-Big corpus. The results are even more impressive considering that the proposed approach is language independent and linguistic theory neutral. To gain additional insights into the string kernels approach, the features selected by the classifier as being more discriminant are analyzed in this chapter. The analysis also offers information about localized language transfer effects, since the features used by the proposed model are p-grams of various lengths.
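To make the idea of a character-level string kernel over p-grams of various lengths concrete, here is a minimal sketch of a generic blended p-spectrum kernel. It is an illustration only, not the chapter's method: the specific kernels the chapter combines, including the one based on Local Rank Distance, are not reproduced here, and the function names and the default p-gram range (2 to 4) are assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def pgram_counts(text, p):
    """Count every contiguous character p-gram in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def blended_spectrum_kernel(s, t, p_min=2, p_max=4):
    """Sum, over p in [p_min, p_max], the products of the counts of
    each character p-gram that occurs in both strings.
    The range 2..4 is a hypothetical default, not taken from the chapter."""
    total = 0
    for p in range(p_min, p_max + 1):
        counts_s = pgram_counts(s, p)
        counts_t = pgram_counts(t, p)
        total += sum(n * counts_t[g] for g, n in counts_s.items() if g in counts_t)
    return total


def normalized_kernel(s, t, p_min=2, p_max=4):
    """Scale the kernel so that k(s, s) == 1 for any string that
    contains at least one p-gram in the chosen range."""
    k_st = blended_spectrum_kernel(s, t, p_min, p_max)
    k_ss = blended_spectrum_kernel(s, s, p_min, p_max)
    k_tt = blended_spectrum_kernel(t, t, p_min, p_max)
    if k_ss == 0 or k_tt == 0:
        return 0.0  # a string too short to contain any p-gram
    return k_st / sqrt(k_ss * k_tt)
```

A kernel of this kind needs no tokenization or linguistic preprocessing, which is consistent with the language-independent character of the approach described in the abstract. The pairwise values it produces could be collected into a kernel matrix and passed to any kernel-based classifier.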

Citation (APA)

Ionescu, R. T., & Popescu, M. (2016). Native Language Identification with String Kernels. In Advances in Computer Vision and Pattern Recognition (pp. 193–227). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-30367-3_8
