Deduplication is a space-efficient technique used mainly to reduce storage consumption. This paper proposes a two-step algorithm called ‘brushing’ for deduplication of individual files. The main aim of the algorithm is to save space, while also keeping time complexity in check. The proposed algorithm has extremely low RAM overhead. The first phase checks for similar entities and removes them, so that only unique entities are grouped together. In the second phase, while the unique file is hashed, the unique entities are represented as index values, which greatly reduces the size of the file. Test results show that if a file contains 40–50% duplicate data, this technique reduces the size up to 2/3 of the file. The algorithm achieves high deduplication throughput on the file system.
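The two phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say what an "entity" is or which hash is used, so the sketch assumes fixed-size byte chunks and SHA-256, and the names `brush`, `restore`, and `chunk_size` are hypothetical.

```python
import hashlib


def brush(data: bytes, chunk_size: int = 8):
    """Two-phase sketch: collect unique chunks, then encode the file as indices.

    Assumption: an "entity" is a fixed-size chunk of bytes; the paper may
    define entities differently.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Phase 1: keep only the first occurrence of each chunk, keyed by its digest.
    unique_chunks = []   # position in this list is the chunk's index value
    index_of = {}        # chunk digest -> index into unique_chunks
    digests = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()
        digests.append(digest)
        if digest not in index_of:
            index_of[digest] = len(unique_chunks)
            unique_chunks.append(chunk)

    # Phase 2: represent the whole file as a sequence of index values.
    indices = [index_of[d] for d in digests]
    return unique_chunks, indices


def restore(unique_chunks, indices) -> bytes:
    """Rebuild the original bytes from the unique chunks and the index sequence."""
    return b"".join(unique_chunks[i] for i in indices)


if __name__ == "__main__":
    sample = b"ABCDEFGH" * 3 + b"12345678"
    uniques, idx = brush(sample)
    print(len(uniques), idx)
    print(restore(uniques, idx) == sample)
```

In this sketch the savings come from storing each repeated chunk once and keeping only a small integer per occurrence; how close that comes to the reported figures depends on the entity size and index encoding, which the abstract does not specify.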
Dutta, P., Pattnaik, P., & Sahu, R. K. (2016). Brushing—an algorithm for data deduplication. In Advances in Intelligent Systems and Computing (Vol. 433, pp. 227–234). Springer Verlag. https://doi.org/10.1007/978-81-322-2755-7_23