Abstract
Retrieval techniques based on dimensionality reduction, such as Latent Semantic Indexing (LSI), have been shown to improve the quality of the information being retrieved by capturing the latent meaning of the words present in the documents. Unfortunately, the high computational and memory requirements of LSI and its inability to compute an effective dimensionality reduction in a supervised setting limit its applicability. In this paper we present a fast supervised dimensionality reduction algorithm that is derived from the recently developed cluster-based unsupervised dimensionality reduction algorithms. We experimentally evaluate the quality of the lower dimensional spaces in the context of both document categorization and improvements in retrieval performance on a variety of different document collections. Our experiments show that the lower dimensional spaces computed by our algorithm consistently improve the performance of traditional algorithms such as C4.5, k-nearest-neighbor, and Support Vector Machines (SVM), by an average of 2% to 7%. Furthermore, the supervised lower dimensional space greatly improves the retrieval performance when compared to LSI.
Cite
Karypis, G., & Han, E. H. (2000). Fast Supervised Dimensionality Reduction Algorithm with Applications to Document Categorization & Retrieval. In International Conference on Information and Knowledge Management, Proceedings (Vol. 2000-January, pp. 12–19). Association for Computing Machinery. https://doi.org/10.1145/354756.354772