A regression-based SVD parallelization using overlapping folds for textual data

Uraiwan Buatoom; Thanaruk Theeramunkong; Waree Kongprawechnon

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10004 LNAI 26-37

DOI: 10.1007/978-3-319-60675-0_3

Abstract

One of the most difficult issues in text mining is the high dimensionality caused by a large number of features (keywords). Although various multivariate analyses, such as PCA and SVD (called LSI in information retrieval), have been developed to address this curse of dimensionality, they are computationally costly. This paper investigates a regression-based reconstruction method that enables parallelization of PCA/SVD by decomposing a document-term matrix into a set of sub-matrices that share overlapping terms and then re-assembling the results using a regression technique. To evaluate the method, we use two text datasets from the UCI Machine Learning Repository, “Bag of Words” and “Reuter 50 50”. Closeness between two documents is measured with cosine similarity, and accuracy is measured as rank-order mismatch. The results show that matrix decomposition and re-assembly preserve the quality of the relation/representation.
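
The abstract gives only the outline of the method, so the following is a minimal sketch of one way such a scheme could work, not the authors' implementation. It assumes that folds are overlapping blocks of term columns, that each fold gets its own truncated SVD (the step that could run in parallel), and that a least-squares regression on the shared terms maps each fold's latent space onto the first fold's. All function names are hypothetical, and `rank_order_mismatch` is an illustrative definition because the paper's exact measure is not stated here.

```python
import numpy as np

def truncated_svd(X, rank):
    """Return (document coordinates U*S, term coordinates V) of a rank-limited SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank].T

def reassemble_terms(X, folds, rank):
    """Align per-fold term coordinates into one latent space (assumed scheme)."""
    # Independent per-fold SVDs: this is the part that could be parallelized.
    per_fold = [truncated_svd(X[:, np.asarray(c)], rank)[1] for c in folds]
    T = np.zeros((X.shape[1], rank))
    known = np.zeros(X.shape[1], dtype=bool)
    for cols, V in zip(folds, per_fold):
        cols = np.asarray(cols)
        shared = known[cols]  # terms of this fold that are already placed
        if shared.any():
            # Regression step: fit W so that V[shared] @ W matches the placed terms.
            W, *_ = np.linalg.lstsq(V[shared], T[cols[shared]], rcond=None)
            V = V @ W
        T[cols[~shared]] = V[~shared]
        known[cols] = True
    return T

def document_embedding(X, T):
    """Project documents onto an orthonormal basis of the re-assembled term space."""
    Q, _ = np.linalg.qr(T)  # reduced QR: Q has shape (n_terms, rank)
    return X @ Q

def cosine_similarity(D):
    """Pairwise cosine similarity between the rows of D."""
    norms = np.clip(np.linalg.norm(D, axis=1, keepdims=True), 1e-12, None)
    Dn = D / norms
    return Dn @ Dn.T

def rank_order_mismatch(S_ref, S_new):
    """Illustrative measure: fraction of ranked positions that differ per document."""
    order_ref = np.argsort(-S_ref, axis=1)
    order_new = np.argsort(-S_new, axis=1)
    return float(np.mean(order_ref != order_new))

# Synthetic stand-in for a document-term matrix (30 documents, 40 terms, rank 3).
rng = np.random.default_rng(0)
X = rng.random((30, 3)) @ rng.random((3, 40))
folds = [np.arange(0, 25), np.arange(15, 40)]  # terms 15..24 overlap

T = reassemble_terms(X, folds, rank=3)
S_folds = cosine_similarity(document_embedding(X, T))
S_full = cosine_similarity(truncated_svd(X, 3)[0])
print("max similarity difference:", np.abs(S_folds - S_full).max())
print("rank-order mismatch:", rank_order_mismatch(S_full, S_folds))
```

In this sketch the overlap must contain at least `rank` terms for the regression to be well determined. On real, full-rank text data the re-assembled similarities would only approximate those of a full SVD, which is the gap a rank-order comparison is meant to quantify.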

Cite

Buatoom, U., Theeramunkong, T., & Kongprawechnon, W. (2017). A regression-based SVD parallelization using overlapping folds for textual data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10004 LNAI, pp. 26–37). Springer Verlag. https://doi.org/10.1007/978-3-319-60675-0_3
