G-paradex: GPU-based parallel indexing for fast data deduplication

Abstract

Deduplication technology is increasingly used to reduce storage costs. In practice, duplicate detection against a large on-disk index incurs significant, unavoidable overhead in write operations. Most existing deduplication methods perform single-pass processing and pay little attention to developing highly parallel methods for emerging parallel processors. In this paper, we present the design of G-Paradex, a novel deduplication framework that can significantly reduce duplicate detection time. G-Paradex organizes chunk fingerprints in a prefix tree and achieves fast deduplication by using the GPU to search the target tree in parallel. Leveraging the inherent chunk locality of the incoming write stream, we group consecutive chunks and extract their handprints into the prefix tree, which shrinks the index and reduces on-disk accesses. Our experimental evaluation on real-world datasets demonstrates that, compared with the traditional single-pass method, G-Paradex achieves a 2-4X speedup in duplicate detection. © 2013 Springer-Verlag.
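
The indexing idea in the abstract (group consecutive chunks, keep only a small handprint per group, and store those fingerprints in a prefix tree) can be illustrated with a minimal CPU-only sketch. This is not the authors' implementation: the use of SHA-1, the "k smallest fingerprints" sampling rule, the byte-wise trie layout, and the names `GROUP_SIZE`, `HANDPRINT_SIZE`, `PrefixTree`, `index_stream`, and `find_candidate_groups` are all assumptions made for illustration. The lookups here run sequentially, whereas the paper searches the tree in parallel on a GPU.

```python
import hashlib
from typing import Dict, List, Optional, Set

GROUP_SIZE = 4      # hypothetical: number of consecutive chunks per group
HANDPRINT_SIZE = 2  # hypothetical: fingerprints sampled from each group


def fingerprint(chunk: bytes) -> bytes:
    """Return a SHA-1 digest as the chunk fingerprint (an assumed choice)."""
    return hashlib.sha1(chunk).digest()


class PrefixTree:
    """Byte-wise trie that maps a fingerprint to the id of its group."""

    def __init__(self) -> None:
        self.root: Dict = {}

    def insert(self, fp: bytes, group_id: int) -> None:
        node = self.root
        for byte in fp:  # one trie level per fingerprint byte
            node = node.setdefault(byte, {})
        node["group"] = group_id

    def lookup(self, fp: bytes) -> Optional[int]:
        node = self.root
        for byte in fp:
            node = node.get(byte)
            if node is None:
                return None
        return node.get("group")


def handprint(fps: List[bytes], k: int = HANDPRINT_SIZE) -> List[bytes]:
    """Sample a group by its k smallest fingerprints (assumed sampling rule)."""
    return sorted(set(fps))[:k]


def index_stream(chunks: List[bytes], tree: PrefixTree) -> List[List[bytes]]:
    """Group consecutive chunks and insert only each group's handprint."""
    groups: List[List[bytes]] = []
    for start in range(0, len(chunks), GROUP_SIZE):
        fps = [fingerprint(c) for c in chunks[start:start + GROUP_SIZE]]
        group_id = len(groups)
        for fp in handprint(fps):
            tree.insert(fp, group_id)
        groups.append(fps)
    return groups


def find_candidate_groups(chunks: List[bytes], tree: PrefixTree) -> Set[int]:
    """Return ids of stored groups whose handprint overlaps the incoming one."""
    fps = [fingerprint(c) for c in chunks]
    hits = {tree.lookup(fp) for fp in handprint(fps)}
    hits.discard(None)
    return hits
```

In this sketch the tree holds at most `HANDPRINT_SIZE` entries per group rather than one per chunk, which is the sense in which sampling shrinks the index. A hit only names a candidate group; the full fingerprint list of that group would still have to be compared chunk by chunk to confirm duplicates.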

Cite

Lin, B., Liao, X., Li, S., Wang, Y., Huang, H., & Wen, L. (2013). G-paradex: GPU-based parallel indexing for fast data deduplication. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8299 LNCS, pp. 91–103). https://doi.org/10.1007/978-3-642-45293-2_7