Discriminating between similar languages using weighted subword features

Abstract

The present contribution revolves around a contrastive subword n-gram model which has been tested in the Discriminating between Similar Languages (DSL) shared task. I present and discuss the method used in this 14-way language identification task comprising varieties of 6 main language groups. It features the following characteristics: (1) the preprocessing and conversion of a collection of documents to sparse features; (2) weighted character n-gram profiles; (3) a multinomial Bayesian classifier. Meaningful bag-of-n-grams features can be used as a system in a straightforward way: my approach outperforms most of the systems used in the DSL shared task (3rd rank).
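The three characteristics listed in the abstract can be illustrated with a minimal, standard-library sketch. It is not the author's implementation: the abstract does not specify the weighting scheme, the n-gram lengths, or the smoothing, so the smoothed IDF weights, the 2-to-4 character range, and the additive smoothing value below are all assumptions, and the class name `WeightedNgramNB` and the four training sentences are hypothetical toy material.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Return character n-grams of length n_min..n_max (range is an assumption)."""
    text = " " + text.lower().strip() + " "  # pad so word boundaries become features
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class WeightedNgramNB:
    """Toy multinomial naive Bayes over IDF-weighted character n-gram counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing; the value is an assumption

    def fit(self, texts, labels):
        # Step 1: convert documents to sparse features (n-gram -> count).
        docs = [Counter(char_ngrams(t)) for t in texts]
        self.n_docs = len(docs)
        # Step 2: weight each n-gram with a smoothed IDF (assumed scheme).
        df = Counter(g for d in docs for g in d)
        self.idf = {g: math.log((1 + self.n_docs) / (1 + c)) + 1.0
                    for g, c in df.items()}
        # Step 3: accumulate weighted counts per class for the classifier.
        self.class_counts = Counter(labels)
        self.feature_sums = defaultdict(Counter)
        for d, y in zip(docs, labels):
            for g, c in d.items():
                self.feature_sums[y][g] += c * self.idf[g]
        self.totals = {y: sum(f.values()) for y, f in self.feature_sums.items()}
        return self

    def predict(self, text):
        counts = Counter(char_ngrams(text))
        vocab_size = len(self.idf)
        best_label, best_score = None, -math.inf
        for y in self.class_counts:
            score = math.log(self.class_counts[y] / self.n_docs)  # class prior
            denom = self.totals[y] + self.alpha * vocab_size
            for g, c in counts.items():
                if g not in self.idf:
                    continue  # n-grams unseen in training are ignored
                likelihood = (self.feature_sums[y][g] + self.alpha) / denom
                score += c * self.idf[g] * math.log(likelihood)
            if score > best_score:
                best_label, best_score = y, score
        return best_label


# Hypothetical toy data, far smaller than any shared-task corpus.
train_texts = [
    "el niño come una manzana",
    "la casa es muy grande",
    "o menino come uma maçã",
    "a casa é muito grande",
]
train_labels = ["es", "es", "pt", "pt"]

model = WeightedNgramNB().fit(train_texts, train_labels)
print(model.predict("uma casa muito grande"))
```

In this sketch the subword features do the discriminating work: n-grams shared by both varieties (such as those from "casa" and "grande") carry a lower IDF weight, while n-grams seen in only one variety carry a higher weight and contribute most of the difference between the class scores.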

Cite

Barbaresi, A. (2017). Discriminating between similar languages using weighted subword features. In VarDial 2017 - 4th Workshop on NLP for Similar Languages, Varieties and Dialects, Proceedings (pp. 184–189). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-1223
