Term-length normalization for centroid-based text categorization

Abstract

Centroid-based categorization is one of the most popular algorithms in text classification. Normalization is an important factor in improving the performance of a centroid-based classifier when the documents in a text collection vary considerably in size. In the past, normalization involved only document-length or class-length normalization. In this paper, we propose a new type of normalization, called term-length normalization, which considers the distribution of a term within a class. The performance of this normalization is investigated in three settings of a standard centroid-based classifier (TFIDF): (1) without class-length normalization, (2) with cosine class-length normalization, and (3) with summing-weight normalization. The results suggest that our term-length normalization improves classification accuracy in all three cases.
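The abstract does not give the term-length normalization formula itself, but the baseline it builds on, a centroid-based TF-IDF classifier with cosine class-length normalization, can be sketched as follows. This is an illustrative sketch only; the function names and the toy corpus are assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def train_centroids(docs, labels):
    """docs: list of token lists; labels: class label per doc.
    Builds TF-IDF document vectors, averages them into class
    centroids, then applies cosine (class-length) normalization."""
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        for t in set(doc):
            df[t] += 1
    idf = {t: math.log(n / df[t]) for t in df}

    def tfidf(doc):
        tf = Counter(doc)
        vec = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}  # unit-length doc vector

    sums = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        for t, w in tfidf(doc).items():
            sums[y][t] += w
    centroids = {}
    for y, vec in sums.items():
        # cosine class-length normalization of the summed centroid
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        centroids[y] = {t: w / norm for t, w in vec.items()}
    return idf, centroids

def classify(doc, idf, centroids):
    """Assign doc to the class whose centroid has the highest
    dot-product similarity with the doc's TF-IDF vector."""
    tf = Counter(doc)
    vec = {t: tf[t] * idf.get(t, 0.0) for t in tf}
    return max(centroids,
               key=lambda y: sum(w * centroids[y].get(t, 0.0)
                                 for t, w in vec.items()))
```

The paper's term-length normalization would add a further per-term rescaling step based on how a term is distributed within each class, applied before or alongside the class-length normalization shown above.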

APA

Lertnattee, V., & Theeramunkong, T. (2003). Term-length normalization for centroid-based text categorization. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2773 PART 1, pp. 850–856). Springer Verlag. https://doi.org/10.1007/978-3-540-45224-9_113
