G-paradex: GPU-based parallel indexing for fast data deduplication

Bin Lin; Xiangke Liao; Shanshan Li; Yufeng Wang; He Huang; Ling Wen

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013) 8299 LNCS 91-103

DOI: 10.1007/978-3-642-45293-2_7

Abstract

Deduplication technology is increasingly used to reduce storage cost. In practice, duplicate detection against a large on-disk index incurs significant and unavoidable overhead on write operations. Most existing deduplication methods perform single-pass processing and pay little attention to developing highly parallel methods for emerging parallel processors. In this paper, we present the design of G-Paradex, a novel deduplication framework that can significantly reduce duplicate-detection time. Using a prefix tree to organize chunk fingerprints, G-Paradex performs fast deduplication by searching the target tree in parallel on a GPU. Leveraging the inherent chunk locality of the write data stream, we group consecutive chunks and extract their handprints into the prefix tree, aiming to shrink the index size and reduce on-disk accesses. Our experimental evaluation on real-world datasets demonstrates that, compared with the traditional single-pass method, G-Paradex achieves a speedup of 2-4X in duplicate detection. © 2013 Springer-Verlag.
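
The indexing idea in the abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: SHA-1 as the fingerprint function, a handprint defined as the k smallest fingerprints of a chunk group, a fixed group size, and a byte-wise prefix tree. All names (`fingerprint`, `handprint`, `PrefixTree`, `index_stream`) are hypothetical.

```python
import hashlib


def fingerprint(chunk: bytes) -> bytes:
    # Assumption: SHA-1 digest as the chunk fingerprint.
    return hashlib.sha1(chunk).digest()


def handprint(fingerprints, k=2):
    # Assumption: a handprint is the k smallest distinct fingerprints of a group.
    return sorted(set(fingerprints))[:k]


class PrefixTree:
    """Byte-wise prefix tree mapping a fingerprint to a chunk-group id."""

    def __init__(self):
        self.root = {}

    def insert(self, fp: bytes, group_id: int) -> None:
        node = self.root
        for byte in fp:
            node = node.setdefault(byte, {})
        node["group"] = group_id  # string key cannot collide with int byte keys

    def lookup(self, fp: bytes):
        node = self.root
        for byte in fp:
            node = node.get(byte)
            if node is None:
                return None  # no indexed fingerprint shares this prefix
        return node.get("group")


def index_stream(chunks, tree, group_size=4, k=2):
    # Group consecutive chunks and index only each group's handprint,
    # so the tree holds k entries per group instead of one per chunk.
    for start in range(0, len(chunks), group_size):
        group = chunks[start:start + group_size]
        fps = [fingerprint(c) for c in group]
        for fp in handprint(fps, k):
            tree.insert(fp, start // group_size)
```

Each `lookup` call is independent of the others, which is the property that would let a GPU assign one query per thread; this sketch runs them sequentially and makes no claim about the paper's actual kernel design or on-disk layout.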

Citation (APA)

Lin, B., Liao, X., Li, S., Wang, Y., Huang, H., & Wen, L. (2013). G-paradex: GPU-based parallel indexing for fast data deduplication. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8299 LNCS, pp. 91–103). https://doi.org/10.1007/978-3-642-45293-2_7