A new semi-supervised dimension reduction technique for textual data analysis

Manuel Martín-Merino; Jesus Román

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 4224 LNCS 654-662

DOI: 10.1007/11875581_79

Citations: 1

Readers: 5

Abstract

Dimension reduction techniques are important preprocessing algorithms for high-dimensional applications: they reduce noise while preserving the main structure of the dataset. They have been successfully applied to a wide variety of problems, particularly in text mining. However, the algorithms proposed in the literature often suffer from low discriminant power because of their unsupervised nature and the 'curse of dimensionality'. Fortunately, several search engines, such as Yahoo, provide a manually created classification of a subset of documents that can be exploited to overcome this problem. In this paper we propose a semi-supervised version of a PCA-like algorithm for textual data analysis. The new method reduces the dimensionality of the term space by taking advantage of this document classification. The proposed algorithm has been evaluated on a text mining problem, where it outperforms well-known unsupervised techniques. © Springer-Verlag Berlin Heidelberg 2006.

Cite

Martín-Merino, M., & Román, J. (2006). A new semi-supervised dimension reduction technique for textual data analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4224 LNCS, pp. 654–662). Springer Verlag. https://doi.org/10.1007/11875581_79
