A text categorization system aims to assign appropriate categories from a predefined classification scheme to incoming documents. These assignments can be used for purposes such as filtering or retrieval. This paper introduces a new, effective model for text categorization over a large corpus (roughly 1 million documents). Text categorization is performed using the Kullback-Leibler distance (KLD) between the probability distribution of the document to classify and the probability distribution of each category. Using the same representation of categories, experiments show that the KLD method achieves substantial improvements over the tf-idf method. © Springer-Verlag Berlin Heidelberg 2003.
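The idea of comparing a document's term distribution with each category's term distribution can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes whitespace-tokenized input, uses the standard (asymmetric) Kullback-Leibler divergence, and applies additive smoothing so that every probability is non-zero. The names `term_distribution`, `kl_divergence`, `classify`, and the parameter `alpha` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary, alpha=1.0):
    """Additively smoothed term distribution over a fixed vocabulary.

    Smoothing (an assumption of this sketch) keeps every probability
    strictly positive, so the logarithm below is always defined.
    """
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocabulary) + alpha * len(vocabulary)
    return {t: (counts[t] + alpha) / total for t in vocabulary}


def kl_divergence(p, q):
    """Standard KL divergence D(p || q) for distributions on the same keys."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def classify(doc_tokens, category_tokens, alpha=1.0):
    """Return the category whose distribution is closest to the document's.

    category_tokens maps a category name to the list of tokens drawn
    from its training documents.
    """
    # Shared vocabulary: every term seen in the document or any category.
    vocabulary = set(doc_tokens)
    for tokens in category_tokens.values():
        vocabulary.update(tokens)
    vocabulary = sorted(vocabulary)

    p_doc = term_distribution(doc_tokens, vocabulary, alpha)
    scores = {
        name: kl_divergence(p_doc, term_distribution(tokens, vocabulary, alpha))
        for name, tokens in category_tokens.items()
    }
    # The smallest divergence marks the best-matching category.
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    categories = {
        "sports": "ball team goal match team".split(),
        "finance": "bank market stock price bank".split(),
    }
    label, scores = classify("team goal match".split(), categories)
    print(label, scores)
```

The document is assigned to the category with the smallest divergence. On a corpus of the size mentioned above, the category distributions would be estimated once and reused rather than rebuilt for every document.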
CITATION STYLE
Bigi, B. (2003). Using Kullback-Leibler distance for text categorization. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2633, 305–319. https://doi.org/10.1007/3-540-36618-0_22