Brushing—an algorithm for data deduplication

Prasun Dutta; Pratik Pattnaik; Rajesh Kumar Sahu

Conference Proceedings

Advances in Intelligent Systems and Computing (2016) 433 227-234

DOI: 10.1007/978-81-322-2755-7_23

Abstract

Deduplication is a space-efficient technique used mainly to reduce storage requirements. This paper proposes a two-step algorithm called ‘brushing’ for deduplicating individual files. The main aim of the algorithm is to address the storage-space problem while also keeping time complexity in check. The proposed algorithm has extremely low RAM overhead. The first phase identifies duplicate entities and removes them, leaving only the unique entities grouped together. In the second phase, while the unique file is hashed, the unique entities are represented as index values, which greatly reduces the size of the file. Test results show that if a file contains 40–50 % duplicate data, this technique reduces the size up to 2/3 of the file. The algorithm also achieves high deduplication throughput on the file system.
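The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define what an "entity" is or which hash is used, so the sketch assumes an entity is a whitespace-separated token and uses Python's built-in dict (a hash table) as a stand-in for the hashing step. The names `brush` and `restore` are hypothetical.

```python
def brush(text):
    """Hypothetical two-phase sketch of index-based deduplication.

    Assumption: an "entity" is a whitespace-separated token.
    """
    entities = text.split()

    # Phase 1: drop repeated entities, keeping each unique one
    # in order of first appearance.
    index_of = {}   # entity -> position in the unique table (dict = hash table)
    unique = []
    for entity in entities:
        if entity not in index_of:
            index_of[entity] = len(unique)
            unique.append(entity)

    # Phase 2: represent the original sequence as index values
    # pointing into the unique table.
    indices = [index_of[entity] for entity in entities]
    return unique, indices


def restore(unique, indices):
    """Rebuild the token sequence from the unique table and the indices."""
    return " ".join(unique[i] for i in indices)


unique, indices = brush("a b a c b a")
print(unique)   # unique entities in first-seen order
print(indices)  # one index per original entity
```

In this sketch the space saving comes from storing each repeated entity once and referring to it by a small integer; how much is saved in practice depends on entity size and the duplicate ratio, which the abstract reports only in aggregate.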

Cite (APA)

Dutta, P., Pattnaik, P., & Sahu, R. K. (2016). Brushing—an algorithm for data deduplication. In Advances in Intelligent Systems and Computing (Vol. 433, pp. 227–234). Springer Verlag. https://doi.org/10.1007/978-81-322-2755-7_23
