Efficient compression technique for sparse sets

Abstract

Recent growth of the Internet has generated large amounts of data on the web. Most such data are high-dimensional and sparse, and many fundamental subroutines of data analytics tasks such as clustering, ranking, and nearest-neighbour search scale poorly with the data dimension. Despite significant growth in computational power, performing such computations on high-dimensional datasets is infeasible, and at times impossible. It is therefore desirable to investigate compression algorithms that significantly reduce the dimension while preserving similarity between data objects. In this work, we consider the data points as sets and use Jaccard similarity as the similarity measure. Pratap and Kulkarni [10] suggested a compression technique for high-dimensional, sparse, binary data that preserves the inner product and Hamming distance. In this work, we show that their algorithm also works well for Jaccard similarity. We present a theoretical analysis of the compression bound and complement it with rigorous experimentation on synthetic and real-world datasets. We also compare our results with the state-of-the-art “min-wise independent permutations [6]”, and show that our compression algorithm achieves almost equal accuracy while significantly reducing the compression time and the randomness.
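As a rough illustration of the two approaches the abstract compares (this is a sketch under stated assumptions, not the paper's exact construction): assume the compression of [10] is modelled as randomly mapping each of the d dimensions to one of k buckets and OR-ing the bits that land in the same bucket, and that the min-wise baseline [6] is standard MinHash with random linear hash functions. The bucket mapping, set sizes, and hash family below are illustrative choices, not taken from the paper.

```python
import random

def compress(active_dims, bucket_of, k):
    """Bucket-OR compression: one output bit per bucket, set if any
    active dimension of the set maps into that bucket."""
    out = [0] * k
    for i in active_dims:
        out[bucket_of[i]] = 1
    return out

def jaccard(a, b):
    """Exact Jaccard similarity of two Python sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def jaccard_bits(x, y):
    """Jaccard similarity estimated directly on compressed bit vectors."""
    inter = sum(1 for u, v in zip(x, y) if u and v)
    union = sum(1 for u, v in zip(x, y) if u or v)
    return inter / union if union else 1.0

def minhash(s, coeffs, p=2**31 - 1):
    """MinHash signature using random linear hashes h(i) = (a*i + b) mod p."""
    return [min((a * i + b) % p for i in s) for a, b in coeffs]

# Demo: two correlated sparse sets over d dimensions (illustrative sizes).
d, k = 10_000, 2_000
rng = random.Random(42)
bucket_of = [rng.randrange(k) for _ in range(d)]  # one shared random mapping
A = set(rng.sample(range(d), 200))
B = {a for a in A if rng.random() < 0.8} | set(rng.sample(range(d), 50))

true_j = jaccard(A, B)
est_j = jaccard_bits(compress(A, bucket_of, k), compress(B, bucket_of, k))

# MinHash baseline: fraction of agreeing signature entries estimates Jaccard.
coeffs = [(rng.randrange(1, 2**31 - 1), rng.randrange(2**31 - 1))
          for _ in range(128)]
sigA, sigB = minhash(A, coeffs), minhash(B, coeffs)
mh_est = sum(u == v for u, v in zip(sigA, sigB)) / len(coeffs)
```

Note the trade-off the abstract alludes to: the bucket-OR compression needs only one random mapping of the d dimensions, while MinHash needs a fresh hash function per signature entry, which is where the savings in compression time and randomness come from.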

Cite

APA

Pratap, R., Sohony, I., & Kulkarni, R. (2018). Efficient compression technique for sparse sets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10939 LNAI, pp. 164–176). Springer Verlag. https://doi.org/10.1007/978-3-319-93040-4_14
