Using Kullback-Leibler distance for text categorization

Abstract

A system that performs text categorization aims to assign appropriate categories from a predefined classification scheme to incoming documents. These assignments can serve various purposes, such as filtering or retrieval. This paper introduces a new, effective model for text categorization over large corpora (roughly 1 million documents). Text categorization is performed using the Kullback-Leibler distance (KLD) between the probability distribution of the document to classify and the probability distribution of each category. With the same representation of categories, experiments show that the KLD method achieves substantial improvements over the tfidf method. © Springer-Verlag Berlin Heidelberg 2003.
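The core idea in the abstract, comparing a document's term distribution with each category's term distribution and picking the closest category, can be illustrated with a minimal Python sketch. This is not the paper's implementation: it uses the plain (asymmetric) Kullback-Leibler divergence, and the function names, the whitespace tokenization, and the additive `epsilon` smoothing for unseen terms are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary, epsilon=1e-6):
    """Smoothed relative term frequencies over a shared vocabulary.

    The additive epsilon smoothing is an assumption of this sketch; it keeps
    every probability strictly positive so the logarithm below is defined.
    """
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocabulary) + epsilon * len(vocabulary)
    return {t: (counts[t] + epsilon) / total for t in vocabulary}


def kl_divergence(p, q):
    """Plain KL divergence D(p || q) for two distributions with the same keys."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def categorize(doc_tokens, category_tokens, epsilon=1e-6):
    """Return the category whose distribution is closest to the document's.

    category_tokens maps a (hypothetical) category name to the list of tokens
    observed for that category.
    """
    vocabulary = set(doc_tokens)
    for tokens in category_tokens.values():
        vocabulary.update(tokens)

    doc_dist = term_distribution(doc_tokens, vocabulary, epsilon)
    scores = {
        name: kl_divergence(
            doc_dist, term_distribution(tokens, vocabulary, epsilon)
        )
        for name, tokens in category_tokens.items()
    }
    # The smallest divergence marks the best-matching category.
    return min(scores, key=scores.get), scores


# Toy usage with made-up data.
categories = {
    "sports": "ball game team score ball".split(),
    "finance": "bank market stock price bank".split(),
}
best, scores = categorize("team score game".split(), categories)
```

In this toy example the document shares terms only with the first category, so its divergence from that category is the smaller of the two. A real system at the scale mentioned in the abstract would need a more careful treatment of unseen terms and of vocabulary size than the fixed `epsilon` used here.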

Cite

Bigi, B. (2003). Using Kullback-Leibler distance for text categorization. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2633, 305–319. https://doi.org/10.1007/3-540-36618-0_22
