A regression-based SVD parallelization using overlapping folds for textual data

Abstract

One of the most difficult issues in text mining is the high dimensionality caused by a large number of features (keywords). Multivariate analyses such as PCA and SVD (called LSI in information retrieval) have been developed to address this curse of dimensionality, but they are computationally costly. This paper investigates a regression-based reconstruction method that enables parallelization of PCA/SVD by decomposing a document-term matrix into a set of sub-matrices that share overlapping terms, and then re-assembling the results using a regression technique. To evaluate the method, we use two text datasets from the UCI Machine Learning Repository, “Bag of Words” and “Reuter 50 50”. Closeness between two documents is measured with cosine similarity, and accuracy is measured as rank order mismatch. The results show that matrix decomposition followed by re-assembly preserves the quality of the relation/representation.
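
The abstract does not spell out the algorithm, so the following Python sketch is only one possible reading of the idea, not the authors' method. It assumes that the matrix is split into two folds along its rows, that the folds share a block of overlapping rows, and that ordinary least squares on those shared rows is the regression step that maps one fold's SVD coordinates into the other's. The function names (`fold_embedding`, `reassemble`, `cosine_similarity`), the toy data, and the fold boundaries are all hypothetical. Rank order mismatch is not implemented because the abstract does not define it.

```python
import numpy as np

def fold_embedding(block, k):
    """Rank-k SVD coordinates (U_k * S_k) for the rows of one fold."""
    u, s, _ = np.linalg.svd(block, full_matrices=False)
    return u[:, :k] * s[:k]

def reassemble(x, a_end, b_start, k):
    """Embed two overlapping row folds separately, then align fold B to fold A.

    Fold A is rows [0, a_end), fold B is rows [b_start, n); the rows in
    [b_start, a_end) belong to both folds (assumed layout).
    """
    z_a = fold_embedding(x[:a_end], k)
    z_b = fold_embedding(x[b_start:], k)
    n_overlap = a_end - b_start
    shared_a = z_a[-n_overlap:]   # overlap rows, in fold A's coordinates
    shared_b = z_b[:n_overlap]    # the same rows, in fold B's coordinates
    # Regression step (assumed): least-squares map from B's space to A's space
    w, *_ = np.linalg.lstsq(shared_b, shared_a, rcond=None)
    z_b_aligned = z_b @ w
    # Keep fold A as is and append the rows that only fold B contains
    return np.vstack([z_a, z_b_aligned[n_overlap:]])

def cosine_similarity(z):
    """Pairwise cosine similarity between the rows of z."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    unit = z / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Toy data: a 12 x 20 matrix of exact rank 3, so the alignment can be exact
rng = np.random.default_rng(0)
k = 3
x = rng.random((12, k)) @ rng.random((k, 20))

z = reassemble(x, a_end=8, b_start=4, k=k)   # folds: rows 0-7 and rows 4-11
full = fold_embedding(x, k)                  # single SVD of the whole matrix
max_gap = np.abs(cosine_similarity(z) - cosine_similarity(full)).max()
print(f"largest cosine-similarity difference: {max_gap:.2e}")
```

On this exactly low-rank toy matrix the re-assembled coordinates reproduce the cosine similarities of a single full SVD up to floating-point error. On real text data, which is not exactly low rank, the alignment would only be approximate, and how much quality is preserved is the question the paper evaluates.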

Cite

Buatoom, U., Theeramunkong, T., & Kongprawechnon, W. (2017). A regression-based SVD parallelization using overlapping folds for textual data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10004 LNAI, pp. 26–37). Springer Verlag. https://doi.org/10.1007/978-3-319-60675-0_3
