Building space-efficient inverted indexes on low-cardinality dimensions

Vasilis Spyropoulos; Yannis Kotidis

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9261 449-459

DOI: 10.1007/978-3-319-22849-5_30


Abstract

Many modern applications naturally lead to the implementation of inverted indexes for effectively managing large collections of data items. Creating an inverted index on a low-cardinality data domain results in replication of data descriptors and increased storage overhead. For example, the use of RFID or similar sensing devices in supply chains produces massive tracking datasets that need effective spatial or spatio-temporal indexes. As the volume of data grows much larger than the number of spatial locations or time epochs, many of the resulting lists inevitably share large subsets of common items. In this paper we present techniques that exploit this characteristic of modern big-data applications to losslessly compress the resulting inverted indexes by discovering large common item sets and adapting the index so that only one copy of each is stored. We apply our method in the supply-chain domain using modern big-data tools and show that our techniques in many cases achieve compression ratios that exceed 50%.
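The abstract describes the idea only at a high level. As a rough illustration (not the authors' algorithm, whose discovery procedure and big-data implementation are not reproduced here), the following Python sketch builds an inverted index from hypothetical (item, location) tracking records, then greedily factors out the largest intersection shared by two posting lists, stores it once, and leaves a reference in every list that fully contains it. The function names, the greedy pairwise strategy, the `min_size` threshold, and the cost model (one unit per stored item id or reference) are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(records):
    """Map each dimension value (e.g. a location) to the set of item ids seen there."""
    index = defaultdict(set)
    for item, location in records:
        index[location].add(item)
    return dict(index)


def factor_common_sets(index, min_size=2):
    """Greedy sketch (an assumption, not the paper's method): repeatedly move the
    largest pairwise intersection of residual lists into a shared set stored once."""
    residual = {key: set(items) for key, items in index.items()}
    refs = {key: [] for key in index}  # key -> ids of shared sets it references
    shared = []                        # shared[i] is stored exactly once
    while True:
        best, best_pair = set(), None
        # Pairwise scan is affordable because the dimension has low cardinality.
        for a, b in combinations(sorted(residual), 2):
            common = residual[a] & residual[b]
            if len(common) > len(best):
                best, best_pair = common, (a, b)
        if best_pair is None or len(best) < min_size:
            break
        sid = len(shared)
        shared.append(frozenset(best))
        # Every list that contains the whole set reuses it via a reference.
        for key in residual:
            if best <= residual[key]:
                residual[key] -= best
                refs[key].append(sid)
    return residual, refs, shared


def expand(key, residual, refs, shared):
    """Lossless reconstruction of the original posting list for key."""
    items = set(residual[key])
    for sid in refs[key]:
        items |= shared[sid]
    return items


def storage_cost(residual, refs, shared):
    """Stored units: residual item ids + one pointer per reference + shared item ids."""
    return (sum(len(s) for s in residual.values())
            + sum(len(r) for r in refs.values())
            + sum(len(s) for s in shared))


if __name__ == "__main__":
    # Hypothetical tracking data: items 0-7 pass through all three locations.
    records = [(i, loc) for i in range(8) for loc in ("dock", "warehouse", "store")]
    records += [(8, "dock"), (9, "store")]
    index = build_inverted_index(records)
    residual, refs, shared = factor_common_sets(index)
    original = sum(len(s) for s in index.values())
    compressed = storage_cost(residual, refs, shared)
    assert all(expand(k, residual, refs, shared) == index[k] for k in index)
    print(original, compressed)  # 26 13
```

On this toy input the eight items common to all three lists are stored once, so the index shrinks from 26 to 13 units while every list is still recoverable exactly. The greedy pairwise scan is chosen only for brevity; on real data the choice of which common sets to extract determines how much is saved.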

Cite (APA)

Spyropoulos, V., & Kotidis, Y. (2015). Building space-efficient inverted indexes on low-cardinality dimensions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9261, pp. 449–459). Springer Verlag. https://doi.org/10.1007/978-3-319-22849-5_30
