Chunk-based deduplication is widely used in storage systems to save space. However, deduplication impairs data reliability because chunks are shared across files: losing a shared chunk makes every file that references it inaccessible. Meanwhile, we find that inter-file and highly-referenced chunks, which need higher reliability assurance, occupy only a small fraction of physical storage. Traditional deduplication systems use erasure coding or replication to ensure data reliability. As the number of shared chunks grows, raising the reliability of erasure-coded systems incurs a large I/O cost because erasure codes scale poorly. Replication scales easily but incurs larger storage overhead. In this paper, we present DARM, a Deduplication-Aware Redundancy Management approach that exploits deduplication semantics (e.g., inter-/intra-file duplicates, chunk size, and reference count) to improve data reliability with low overhead. DARM uses erasure coding to store unique and low-referenced chunks, improving both storage reliability and space efficiency, and employs Selective and Dynamic Chunk-based Replication (SDCR) to maintain inter-file and highly-referenced chunks, enhancing storage reliability. Experimental results on real-world datasets show that DARM reduces storage overhead by up to 43.4% and improves reliability by up to 12.7% over state-of-the-art schemes.
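The split between erasure coding and replication described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Chunk` fields, the reference threshold, the (6, 3) code parameters, and the fixed replica count are all hypothetical values chosen for illustration, and the sketch assumes a chunk must be both inter-file and highly referenced to be replicated. The dynamic adjustment of replicas in SDCR is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    fingerprint: str
    size: int        # chunk size in bytes
    ref_count: int   # total number of references to this chunk
    file_count: int  # number of distinct files referencing it


# Hypothetical parameters, not taken from the paper.
REF_THRESHOLD = 4          # "highly referenced" cut-off (assumed)
EC_DATA, EC_PARITY = 6, 3  # erasure code with 6 data + 3 parity blocks (assumed)
REPLICAS = 3               # fixed replica count (assumed)


def choose_scheme(chunk: Chunk) -> str:
    """Pick a redundancy scheme from deduplication semantics."""
    is_inter_file = chunk.file_count > 1
    if is_inter_file and chunk.ref_count >= REF_THRESHOLD:
        return "replication"
    return "erasure_coding"


def physical_bytes(chunk: Chunk) -> float:
    """Physical space used by one chunk under its chosen scheme."""
    if choose_scheme(chunk) == "replication":
        return chunk.size * REPLICAS
    # Erasure coding stores (k + m) / k times the logical size.
    return chunk.size * (EC_DATA + EC_PARITY) / EC_DATA


shared = Chunk("a1", size=4096, ref_count=10, file_count=5)
unique = Chunk("b2", size=4096, ref_count=1, file_count=1)
intra = Chunk("c3", size=4096, ref_count=10, file_count=1)

for c in (shared, unique, intra):
    print(c.fingerprint, choose_scheme(c), physical_bytes(c))
```

Under these assumed parameters, a replicated chunk costs 3x its logical size while an erasure-coded chunk costs 1.5x, which is why restricting replication to the small set of critical chunks keeps total overhead low.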
CITATION STYLE
Zhou, Y., Feng, D., Xia, W., Fu, M., & Xiao, Y. (2018). DARM: A deduplication-aware redundancy management approach for reliable-enhanced storage systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11335 LNCS, pp. 445–461). Springer Verlag. https://doi.org/10.1007/978-3-030-05054-2_35