Compressed vector set: A fast and space-efficient data mining framework

Masafumi Oyamada; Jianquan Liu; Shinji Ito; Kazuyo Narita; Takuya Araki; Hiroyuki Kitagawa

Journal article (open access)

Journal of Information Processing, vol. 26, pp. 416–426, 2018

DOI: 10.2197/ipsjjip.26.416

Abstract

In this paper, we present CVS (Compressed Vector Set), a fast and space-efficient data mining framework that handles both sparse and dense datasets. CVS stores a set of vectors in a compressed format and performs primitive vector operations, such as the ℓp-norm and the dot product, without decompression. By combining these primitive operations, CVS accelerates prominent data mining and machine learning algorithms, including the k-nearest neighbor algorithm, stochastic gradient descent for logistic regression, and kernel methods. Unlike the commonly used sparse matrix/vector representation, which is not effective for dense datasets, CVS handles sparse and dense datasets in a unified manner. Our experimental results demonstrate that CVS processes both dense and sparse datasets faster than the conventional sparse vector representation while using less memory.
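The abstract states that CVS evaluates primitives such as the ℓp-norm and the dot product directly on compressed vectors, then builds algorithms like k-nearest neighbor on top of them. The following Python sketch illustrates that general idea under a clearly stated assumption: it uses plain run-length encoding as a hypothetical stand-in for the compressed format. This is not necessarily the encoding CVS uses, and the function names `compress`, `lp_norm`, `dot`, and `knn` are likewise hypothetical.

```python
from typing import List, Tuple

# ASSUMPTION: hypothetical compressed format, not necessarily the one CVS uses.
# A vector is stored as run-length-encoded (value, run_length) pairs.
Runs = List[Tuple[float, int]]


def compress(vec: List[float]) -> Runs:
    """Run-length encode a dense vector into (value, run_length) pairs."""
    runs: Runs = []
    for x in vec:
        if runs and runs[-1][0] == x:
            value, length = runs[-1]
            runs[-1] = (value, length + 1)
        else:
            runs.append((x, 1))
    return runs


def lp_norm(runs: Runs, p: float = 2.0) -> float:
    """l_p norm computed on the runs: each run contributes |value|^p * length."""
    total = sum(abs(value) ** p * length for value, length in runs)
    return total ** (1.0 / p)


def dot(a: Runs, b: Runs) -> float:
    """Dot product of two run-encoded vectors without decompressing them.

    Both run lists are walked in lockstep; each overlapping segment of
    `overlap` positions contributes value_a * value_b * overlap.
    """
    if sum(n for _, n in a) != sum(n for _, n in b):
        raise ValueError("vectors must have the same dimension")
    i = j = 0
    rem_a = a[0][1] if a else 0
    rem_b = b[0][1] if b else 0
    result = 0.0
    while i < len(a) and j < len(b):
        overlap = min(rem_a, rem_b)
        result += a[i][0] * b[j][0] * overlap
        rem_a -= overlap
        rem_b -= overlap
        if rem_a == 0:  # current run of `a` exhausted: move to the next one
            i += 1
            if i < len(a):
                rem_a = a[i][1]
        if rem_b == 0:  # current run of `b` exhausted: move to the next one
            j += 1
            if j < len(b):
                rem_b = b[j][1]
    return result


def knn(query: Runs, dataset: List[Runs], k: int) -> List[int]:
    """Brute-force k-NN built only from the dot-product primitive.

    Uses ||q - x||^2 = q.q + x.x - 2 q.x, so no vector is ever decompressed.
    """
    q_sq = dot(query, query)
    dists = [q_sq + dot(x, x) - 2.0 * dot(query, x) for x in dataset]
    return sorted(range(len(dataset)), key=lambda idx: dists[idx])[:k]
```

For x = [0, 0, 0, 3, 3, 1, 0, 0] and y = [1, 1, 0, 3, 3, 0, 0, 0], `dot(compress(x), compress(y))` returns 18.0, the same value as the dense dot product. Run-length encoding shrinks long zero runs in sparse vectors and repeated values in dense ones, so a single code path serves both; how much it actually saves depends entirely on the data, and the sketch makes no claim about CVS's measured performance.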

Citation (APA)

Oyamada, M., Liu, J., Ito, S., Narita, K., Araki, T., & Kitagawa, H. (2018). Compressed vector set: A fast and space-efficient data mining framework. Journal of Information Processing, 26, 416–426. https://doi.org/10.2197/ipsjjip.26.416
