Efficient compression technique for sparse sets

Abstract

Recent growth of the Internet has generated large amounts of data on the web. Most such data are high-dimensional and sparse, and many fundamental subroutines of data analytics tasks such as clustering, ranking, and nearest-neighbour search scale poorly with the data dimension. Despite significant growth in computational power, performing such computations on high-dimensional datasets is infeasible, and at times impossible. It is therefore desirable to investigate compression algorithms that significantly reduce the dimension while preserving similarity between data objects. In this work, we consider the data points as sets and use Jaccard similarity as the similarity measure. Pratap and Kulkarni [10] suggested a compression technique for high-dimensional, sparse, binary data that preserves the inner product and Hamming distance. In this work, we show that their algorithm also works well for Jaccard similarity. We present a theoretical analysis of the compression bound and complement it with rigorous experimentation on synthetic and real-world datasets. We also compare our results with the state-of-the-art “min-wise independent permutations [6]”, and show that our compression algorithm achieves almost equal accuracy while significantly reducing the compression time and the randomness.
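As a rough illustration of the two approaches the abstract compares (this is a sketch under stated assumptions, not the paper's exact construction): assume the compression of [10] is modelled as randomly mapping each of the d dimensions to one of k buckets and OR-ing the bits that land in the same bucket, and that the min-wise baseline [6] is standard MinHash with random linear hash functions. The bucket mapping, set sizes, and hash family below are illustrative choices, not taken from the paper.

```python
import random

def compress(active_dims, bucket_of, k):
    """Bucket-OR compression: one output bit per bucket, set if any
    active dimension of the set maps into that bucket."""
    out = [0] * k
    for i in active_dims:
        out[bucket_of[i]] = 1
    return out

def jaccard(a, b):
    """Exact Jaccard similarity of two Python sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def jaccard_bits(x, y):
    """Jaccard similarity estimated directly on compressed bit vectors."""
    inter = sum(1 for u, v in zip(x, y) if u and v)
    union = sum(1 for u, v in zip(x, y) if u or v)
    return inter / union if union else 1.0

def minhash(s, coeffs, p=2**31 - 1):
    """MinHash signature using random linear hashes h(i) = (a*i + b) mod p."""
    return [min((a * i + b) % p for i in s) for a, b in coeffs]

# Demo: two correlated sparse sets over d dimensions (illustrative sizes).
d, k = 10_000, 2_000
rng = random.Random(42)
bucket_of = [rng.randrange(k) for _ in range(d)]  # one shared random mapping
A = set(rng.sample(range(d), 200))
B = {a for a in A if rng.random() < 0.8} | set(rng.sample(range(d), 50))

true_j = jaccard(A, B)
est_j = jaccard_bits(compress(A, bucket_of, k), compress(B, bucket_of, k))

# MinHash baseline: fraction of agreeing signature entries estimates Jaccard.
coeffs = [(rng.randrange(1, 2**31 - 1), rng.randrange(2**31 - 1))
          for _ in range(128)]
sigA, sigB = minhash(A, coeffs), minhash(B, coeffs)
mh_est = sum(u == v for u, v in zip(sigA, sigB)) / len(coeffs)
```

Note the trade-off the abstract alludes to: the bucket-OR compression needs only one random mapping of the d dimensions, while MinHash needs a fresh hash function per signature entry, which is where the savings in compression time and randomness come from.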

Cite

APA

Pratap, R., Sohony, I., & Kulkarni, R. (2018). Efficient compression technique for sparse sets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10939 LNAI, pp. 164–176). Springer Verlag. https://doi.org/10.1007/978-3-319-93040-4_14
