Using Kullback-Leibler distance for text categorization

Brigitte Bigi

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2003) 2633 305-319

DOI: 10.1007/3-540-36618-0_22

97 Citations

104 Readers

Abstract

A system that performs text categorization aims to assign appropriate categories from a predefined classification scheme to incoming documents. These assignments can be used for various purposes, such as filtering or retrieval. This paper introduces a new, effective model for text categorization over a large corpus (roughly 1 million documents). Text categorization is performed using the Kullback-Leibler distance (KLD) between the probability distribution of the document to be classified and the probability distribution of each category. Using the same representation of categories, experiments show that the KLD method achieves substantial improvements over the tf-idf method. © Springer-Verlag Berlin Heidelberg 2003.
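The idea of comparing a document's term distribution with each category's term distribution can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: the symmetric form of the divergence, sum of (p - q) * log(p / q), the small back-off probability EPSILON for unseen terms, the vocabulary built from the category texts, and the function names term_distribution, kl_distance, and categorize are all illustrative choices, not necessarily the paper's formulation.

```python
import math
from collections import Counter

EPSILON = 1e-6  # assumed back-off probability for terms with zero count


def term_distribution(tokens, vocabulary, epsilon=EPSILON):
    """Relative term frequencies over a fixed vocabulary.

    Unseen terms receive `epsilon`, then the values are renormalized
    so that they sum to 1 and every probability is strictly positive.
    """
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values())
    raw = {
        term: (counts[term] / total if counts[term] else epsilon)
        for term in vocabulary
    }
    norm = sum(raw.values())
    return {term: p / norm for term, p in raw.items()}


def kl_distance(p, q):
    """Symmetric Kullback-Leibler divergence between two distributions
    defined over the same vocabulary (an assumed variant)."""
    return sum((p[t] - q[t]) * math.log(p[t] / q[t]) for t in p)


def categorize(doc_tokens, category_tokens):
    """Return the category whose distribution is closest to the document's,
    together with the distance to every category."""
    vocabulary = {t for tokens in category_tokens.values() for t in tokens}
    doc_dist = term_distribution(doc_tokens, vocabulary)
    distances = {
        name: kl_distance(doc_dist, term_distribution(tokens, vocabulary))
        for name, tokens in category_tokens.items()
    }
    return min(distances, key=distances.get), distances


# Toy data, invented purely for illustration.
categories = {
    "sports": "ball goal team match goal".split(),
    "finance": "bank stock market bank rate".split(),
}
label, distances = categorize("team goal match".split(), categories)
print(label, distances)
```

The smoothing step matters because the logarithm is undefined when either probability is zero; any strictly positive back-off value keeps the distance finite, at the cost of making the result depend on that value.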

Cite (APA)

Bigi, B. (2003). Using Kullback-Leibler distance for text categorization. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2633, 305–319. https://doi.org/10.1007/3-540-36618-0_22
