Brushing—an algorithm for data deduplication

Abstract

Deduplication is a space-efficient technique used mainly to reduce storage requirements. This paper proposes a two-step algorithm called ‘brushing’ for deduplication within an individual file. The main aim of the algorithm is to reduce storage space while also keeping time complexity in check, and it has extremely low RAM overhead. The first phase checks for similar entities and removes them, so that only unique entities are grouped together. In the second phase, as the unique file is hashed, the unique entities are represented as index values, which greatly reduces the size of the file. Test results show that if a file contains 40–50 % duplicate data, the technique reduces the size up to 2/3 of the file. The algorithm has a high deduplication throughput on the file system.
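
The abstract gives only an outline of the two phases, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes that an "entity" is one element of a list (for example, a line of the file) and that "hashing" means a dictionary lookup from entity to index; the function names `brush` and `restore` are hypothetical.

```python
def brush(entities):
    """Deduplicate a sequence of entities in two phases.

    Assumption: an entity is any hashable item, such as one line of a file.
    Returns the table of unique entities and the input encoded as indices.
    """
    # Phase 1: drop repeated entities, keeping unique ones in first-seen order.
    unique = []
    index_of = {}  # hash table: entity -> position in `unique`
    for entity in entities:
        if entity not in index_of:
            index_of[entity] = len(unique)
            unique.append(entity)

    # Phase 2: represent every entity of the input by its index value.
    encoded = [index_of[entity] for entity in entities]
    return unique, encoded


def restore(unique, encoded):
    """Rebuild the original sequence from the unique table and the indices."""
    return [unique[i] for i in encoded]


lines = ["alpha", "beta", "alpha", "gamma", "beta", "alpha"]
unique, encoded = brush(lines)
print(unique)   # ['alpha', 'beta', 'gamma']
print(encoded)  # [0, 1, 0, 2, 1, 0]
assert restore(unique, encoded) == lines
```

In this sketch the saving comes from storing each repeated entity once and replacing later occurrences with a small integer, so the benefit grows with the share of duplicate data and with the length of the entities.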

Cite

Dutta, P., Pattnaik, P., & Sahu, R. K. (2016). Brushing—an algorithm for data deduplication. In Advances in Intelligent Systems and Computing (Vol. 433, pp. 227–234). Springer Verlag. https://doi.org/10.1007/978-81-322-2755-7_23
