iHDFS: A distributed file system supporting incremental computing

Abstract

Big data is often processed repeatedly with only small changes between runs, which makes incremental processing a major form of big data computation. This incremental character means that an incremental computing model can improve performance substantially. HDFS is the distributed file system of Hadoop, the most popular platform for big data analytics, but HDFS adopts a fixed-size chunking policy, which is inefficient for incremental computing. In this paper, we therefore propose iHDFS (incremental HDFS), a distributed file system that provides a basic guarantee for parallel big data processing. iHDFS is implemented as an extension to HDFS and applies the Rabin fingerprint algorithm to achieve content-defined chunking. This policy makes data chunking far more stable, so intermediate processing results can be reused efficiently and the performance of incremental data processing improves significantly. Experimental results demonstrate the effectiveness and efficiency of iHDFS.
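The paper itself does not include code, but the idea behind content-defined chunking can be illustrated briefly. The sketch below uses a simple polynomial rolling hash standing in for a true Rabin fingerprint (which operates on polynomials over GF(2)); the class name, window size, divisor mask, and chunk-size limits are illustrative assumptions, not iHDFS's actual parameters. A boundary is declared wherever the hash of a sliding window matches a target pattern, so boundaries follow the content rather than absolute file offsets, and an insertion near the start of a file shifts only the chunk it lands in.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal content-defined chunking sketch; a polynomial rolling hash
// stands in for a true Rabin fingerprint. All parameters are assumptions.
public class ContentDefinedChunker {

    static final int WINDOW = 48;              // bytes in the sliding window
    static final long BASE = 257;              // rolling-hash base
    static final long MASK = (1L << 13) - 1;   // boundary test => ~8 KiB average gap
    static final int MIN_CHUNK = 2 * 1024;     // lower bound on chunk size
    static final int MAX_CHUNK = 64 * 1024;    // upper bound on chunk size

    static final long POW;                     // BASE^(WINDOW-1), used to drop the oldest byte
    static {
        long p = 1;
        for (int i = 0; i < WINDOW - 1; i++) p *= BASE; // arithmetic is implicitly mod 2^64
        POW = p;
    }

    // Returns the sizes of the chunks the input splits into. A boundary is
    // declared where the window hash matches MASK, so boundaries move with
    // the content, not with absolute offsets.
    static List<Integer> chunkLengths(byte[] data) {
        List<Integer> lengths = new ArrayList<>();
        long hash = 0;
        int chunkStart = 0;
        for (int i = 0; i < data.length; i++) {
            if (i - chunkStart >= WINDOW) {
                hash -= (data[i - WINDOW] & 0xFF) * POW;  // byte leaving the window
            }
            hash = hash * BASE + (data[i] & 0xFF);        // byte entering the window
            int len = i - chunkStart + 1;
            if ((len >= MIN_CHUNK && (hash & MASK) == MASK) || len >= MAX_CHUNK) {
                lengths.add(len);
                chunkStart = i + 1;
                hash = 0;   // restart the window inside the next chunk
            }
        }
        if (chunkStart < data.length) lengths.add(data.length - chunkStart);
        return lengths;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        byte[] original = new byte[256 * 1024];
        rnd.nextBytes(original);

        // Insert 10 bytes at the front. Fixed-size chunking would shift
        // every subsequent block; here only the first chunk length changes.
        byte[] edited = new byte[original.length + 10];
        System.arraycopy(original, 0, edited, 10, original.length);

        System.out.println("original: " + chunkLengths(original));
        System.out.println("edited:   " + chunkLengths(edited));
    }
}

Running the demo shows the two chunk lists differing only in the first entry, which is exactly the stability property that lets intermediate results keyed by chunk fingerprints be reused across incremental runs.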

Citation (APA)

Wang, Z., Ding, Q., Gao, F., Shen, D., & Yu, G. (2015). iHDFS: A distributed file system supporting incremental computing. In Communications in Computer and Information Science (Vol. 503, pp. 151–158). Springer Verlag. https://doi.org/10.1007/978-3-662-46248-5_19
