Fast Supervised Dimensionality Reduction Algorithm with Applications to Document Categorization & Retrieval


Abstract

Retrieval techniques based on dimensionality reduction, such as Latent Semantic Indexing (LSI), have been shown to improve the quality of the information being retrieved by capturing the latent meaning of the words present in the documents. Unfortunately, the high computational and memory requirements of LSI and its inability to compute an effective dimensionality reduction in a supervised setting limit its applicability. In this paper we present a fast supervised dimensionality reduction algorithm that is derived from the recently developed cluster-based unsupervised dimensionality reduction algorithms. We experimentally evaluate the quality of the lower dimensional spaces both in the context of document categorization and in terms of retrieval performance on a variety of different document collections. Our experiments show that the lower dimensional spaces computed by our algorithm consistently improve the performance of traditional algorithms such as C4.5, k-nearest-neighbor, and Support Vector Machines (SVM), by an average of 2% to 7%. Furthermore, the supervised lower dimensional space greatly improves the retrieval performance when compared to LSI.
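The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea behind a centroid-based supervised dimensionality reduction, under stated assumptions: each class is treated as a single cluster, its normalized centroid becomes one axis of the reduced space, and a document's reduced coordinates are its cosine similarities to those centroids. The function names (`class_centroids`, `project`) and the toy data are hypothetical, and the paper's actual algorithm may build or combine centroids differently.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def class_centroids(docs, labels):
    """Hypothetical helper: average the unit-length document vectors of each class,
    then renormalize. Assumption: one centroid (one axis) per class.

    Returns (sorted list of classes, list of centroid vectors in the same order)."""
    classes = sorted(set(labels))
    dim = len(docs[0])
    centroids = []
    for c in classes:
        members = [normalize(d) for d, y in zip(docs, labels) if y == c]
        mean = [sum(m[j] for m in members) / len(members) for j in range(dim)]
        centroids.append(normalize(mean))
    return classes, centroids

def project(doc, centroids):
    """Hypothetical helper: map a document to k = len(centroids) dimensions,
    one coordinate per centroid, equal to the cosine similarity with it."""
    d = normalize(doc)
    return [sum(a * b for a, b in zip(d, c)) for c in centroids]

# Toy term-frequency vectors over a 4-term vocabulary (illustrative data only).
docs = [[2, 1, 0, 0], [3, 0, 1, 0], [0, 0, 2, 3], [0, 1, 1, 4]]
labels = ["sports", "sports", "finance", "finance"]
classes, cents = class_centroids(docs, labels)
print(classes)  # ['finance', 'sports']
reduced = project([1, 0, 0, 0], cents)  # 2 coordinates, one per class
```

With k classes, every document shrinks to k coordinates regardless of vocabulary size, and the centroids in this sketch are computed in a single pass over the training documents, which is one reason centroid-based reductions are typically much cheaper than the truncated SVD that LSI relies on.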


Karypis, G., & Han, E. H. (2000). Fast Supervised Dimensionality Reduction Algorithm with Applications to Document Categorization & Retrieval. In International Conference on Information and Knowledge Management, Proceedings (Vol. 2000-January, pp. 12–19). Association for Computing Machinery. https://doi.org/10.1145/354756.354772
