Classifying with co-stems: a new representation for information filtering

Nedim Lipka; Benno Stein

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 6611 LNCS 307-311

DOI: 10.1007/978-3-642-20161-5_31

Abstract

Besides the content, the writing style is an important discriminator in information filtering tasks. Ideally, the solution of a filtering task employs a text representation that models both kinds of characteristics. In this respect, word stems clearly capture content, whereas word suffixes qualify as writing style indicators. Though the latter feature type is used for part-of-speech tagging, it has not yet been employed for information filtering in general. We propose a text representation that combines the output of a stemming algorithm (stems) with the stem-reduced words (co-stems). A co-stem can be a prefix, an infix, a suffix, or a concatenation of prefixes, infixes, or suffixes. Using accepted standard corpora, we analyze the discriminative power of this representation for a broad range of information filtering tasks to provide new insights into the adequacy and task-specificity of text representation models. Altogether, we observe that co-stem-based representations outperform the classical bag-of-words model for several filtering tasks.
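The stem and co-stem split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_stem`, `co_stem`, `features`, and the `SUFFIXES` list are hypothetical names introduced here, the toy suffix stripper stands in for whatever stemming algorithm the paper uses, and the sketch covers only the case where the co-stem is a suffix (the abstract also allows prefixes, infixes, and concatenations).

```python
from collections import Counter

# Hypothetical toy suffix list (an assumption for illustration only);
# a real system would use an established stemming algorithm instead.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word


def co_stem(word: str, stem: str) -> str:
    """Return the part of the word left over once the stem is removed.

    Only the suffix case is handled: if the word does not start with the
    stem, an empty string is returned.
    """
    return word[len(stem):] if word.startswith(stem) else ""


def features(text: str) -> Counter:
    """Count stems and co-stems as separate features of one text."""
    counts = Counter()
    for token in text.lower().split():
        word = token.strip(".,;:!?\"'()")
        if not word:
            continue
        stem = toy_stem(word)
        counts["stem=" + stem] += 1
        rest = co_stem(word, stem)
        if rest:  # words that equal their stem have no co-stem
            counts["costem=" + rest] += 1
    return counts


if __name__ == "__main__":
    print(features("She walked slowly, talking loudly."))
```

In this sketch, stems and co-stems share one feature vector but carry distinct prefixes, so a downstream classifier could weight content-bearing and style-bearing features separately.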

Cite

Lipka, N., & Stein, B. (2011). Classifying with co-stems: a new representation for information filtering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6611 LNCS, pp. 307–311). Springer Verlag. https://doi.org/10.1007/978-3-642-20161-5_31
