Are n-gram categories helpful in text classification?

17Citations
Citations of this article
35Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Character n-grams are widely used in text categorization problems and are the single most successful type of feature in authorship attribution. Their primary advantage is language independence, as they can be applied to a new language with no additional effort. Typed character n-grams reflect information about their content and context. According to previous research, typed character n-grams improve the accuracy of authorship attribution. This paper examines their effectiveness in three domains: authorship attribution, author profiling and sentiment analysis. The problem of a very high number of features is tackled with distributed Apache Spark processing.

Cite

CITATION STYLE

APA

Kruczek, J., Kruczek, P., & Kuta, M. (2020). Are n-gram categories helpful in text classification? In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12138 LNCS, pp. 524–537). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-50417-5_39

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free