This paper presents the first Native Language Identification (NLI) study for L2 Portuguese. We use a subset of the NLI-PT dataset containing texts written by speakers of five native languages: Chinese, English, German, Italian, and Spanish. We draw on the linguistic annotations available in NLI-PT to extract a range of (morpho-)syntactic features and apply NLI classification methods to predict the authors' native language. The best result, 54.1% accuracy, was obtained with an ensemble combination of the features.
Malmasi, S., del Río, I., & Zampieri, M. (2018). Portuguese Native Language Identification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11122 LNAI, pp. 115–124). Springer Verlag. https://doi.org/10.1007/978-3-319-99722-3_12