In this paper, we present CVS (Compressed Vector Set), a fast and space-efficient data mining framework that handles both sparse and dense datasets. CVS stores a set of vectors in a compressed format and performs primitive vector operations, such as the ℓp-norm and the dot product, without decompression. By combining these primitive operations, CVS accelerates prominent data mining and machine learning algorithms, including the k-nearest neighbor algorithm, stochastic gradient descent for logistic regression, and kernel methods. Unlike the commonly used sparse matrix/vector representation, which is ineffective for dense datasets, CVS handles sparse and dense datasets in a unified manner. Our experimental results demonstrate that CVS processes both dense and sparse datasets faster than the conventional sparse vector representation while using less memory.
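To make the idea of computing on compressed vectors concrete, here is a minimal Python sketch. The abstract does not describe CVS's actual compression scheme, so the value-grouped encoding below (each distinct nonzero value stored once, together with the positions where it occurs), the class `ValueGroupedVector`, and the helpers `squared_distance` and `knn` are hypothetical stand-ins chosen only for illustration. The sketch shows how an ℓp-norm and a dot product can be evaluated directly on such an encoding, and how those primitives combine into a brute-force k-nearest neighbor search.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ValueGroupedVector:
    """Hypothetical compressed vector (an assumption, NOT the CVS format):
    each distinct nonzero value is stored once with the positions where it
    occurs. Zeros are implicit, so sparse vectors stay small, and dense
    vectors with many repeated values shrink as well."""
    dim: int
    groups: dict  # value -> list of indices where that value occurs

    @classmethod
    def compress(cls, dense):
        groups = defaultdict(list)
        for i, x in enumerate(dense):
            if x != 0:
                groups[x].append(i)
        return cls(len(dense), dict(groups))

    def lp_norm(self, p):
        # sum_i |x_i|^p == sum over distinct v of count(v) * |v|^p
        total = sum(len(idx) * abs(v) ** p for v, idx in self.groups.items())
        return total ** (1.0 / p)

    def squared_l2(self):
        # ||x||^2 computed per distinct value, without expanding the vector
        return sum(len(idx) * v * v for v, idx in self.groups.items())

    def dot(self, query):
        # sum_i x_i q_i == sum over distinct v of v * (sum of q_i at v's positions):
        # one multiplication per distinct value instead of one per nonzero entry.
        return sum(v * sum(query[i] for i in idx) for v, idx in self.groups.items())


def squared_distance(cv, query):
    # ||x - q||^2 = ||x||^2 - 2 x.q + ||q||^2, built only from compressed primitives
    return cv.squared_l2() - 2 * cv.dot(query) + sum(q * q for q in query)


def knn(vectors, query, k):
    """Indices of the k compressed vectors closest to `query` (brute force)."""
    order = sorted(range(len(vectors)), key=lambda j: squared_distance(vectors[j], query))
    return order[:k]


x = ValueGroupedVector.compress([0, 2, 2, 0, 3, 2])
print(x.groups)                   # {2: [1, 2, 5], 3: [4]}
print(x.lp_norm(1))               # 9.0
print(x.dot([1, 2, 3, 4, 5, 6]))  # 37
```

Because zeros are implicit and repeated values are stored only once, this one toy encoding covers both sparse vectors and dense vectors with many repeated values, loosely echoing the unified handling the abstract describes; it is not a claim about how CVS itself achieves its speed or memory savings.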
Oyamada, M., Liu, J., Ito, S., Narita, K., Araki, T., & Kitagawa, H. (2018). Compressed vector set: A fast and space-efficient data mining framework. Journal of Information Processing, 26, 416–426. https://doi.org/10.2197/ipsjjip.26.416