This paper presents a two-phase synchronization algorithm- tpsync, which combines content-defined chunking (CDC) with sliding block duplicated data detection methods. tpsync firstly partitions synchronized files into variable-sized chunks in coarse-grained scale with CDC method, locates the unmatched chunks of synchronized files using the edit distance algorithm, and finally generates the fine-grained delta data with fixed-sized sliding block duplicated data detection method. At the first-phase, tpsync can quickly locate the partial changed chunks between two files through similar files' fingerprint characteristics. On the basis of the first phase's results, small fixed-sized sliding block duplicated data detection method can produce better fine-grained delta data between the corresponding unmatched data chunks further. Extensive experiments on ASCII, binary and database files demonstrate that tpsync can achieve a higher performance on synchronization time and total transferred data compared to traditional fixed-sized sliding block method-rsync. Compared to rsync, tpsync reduces synchronization time by 12% and bandwidth by 18.9% on average if optimized parameters are applied on both. With signature cached synchronization method adopted, tpsync can yield a better performance. © Springer-Verlag Berlin Heidelberg 2010.
CITATION STYLE
Sheng, Y., Xu, D., & Wang, D. (2010). A two-phase differential synchronization algorithm for remote files. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6081 LNCS, pp. 65–78). https://doi.org/10.1007/978-3-642-13119-6_6
Mendeley helps you to discover research relevant for your work.