iHDFS: A distributed file system supporting incremental computing

Abstract

Big data is often processed repeatedly with only small changes between runs, which makes incremental processing a major form of big data computation. This incremental character means that an incremental computing model can improve performance substantially. HDFS is the distributed file system of Hadoop, the most popular platform for big data analytics, but HDFS adopts a fixed-size chunking policy, which is inefficient for incremental computing. In this paper, we therefore propose iHDFS (incremental HDFS), a distributed file system that provides a basic guarantee for parallel big data processing. iHDFS is implemented as an extension to HDFS and applies the Rabin fingerprint algorithm to achieve content-defined chunking. This policy makes data chunking far more stable, so intermediate processing results can be reused efficiently and the performance of incremental data processing improves significantly. Experimental results demonstrate the effectiveness and efficiency of iHDFS.
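The paper itself does not include code, but the idea behind content-defined chunking can be illustrated briefly. The sketch below uses a simple polynomial rolling hash standing in for a true Rabin fingerprint (which operates on polynomials over GF(2)); the class name, window size, divisor mask, and chunk-size limits are illustrative assumptions, not iHDFS's actual parameters. A boundary is declared wherever the hash of a sliding window matches a target pattern, so boundaries follow the content rather than absolute file offsets, and an insertion near the start of a file shifts only the chunk it lands in.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal content-defined chunking sketch; a polynomial rolling hash
// stands in for a true Rabin fingerprint. All parameters are assumptions.
public class ContentDefinedChunker {

    static final int WINDOW = 48;              // bytes in the sliding window
    static final long BASE = 257;              // rolling-hash base
    static final long MASK = (1L << 13) - 1;   // boundary test => ~8 KiB average gap
    static final int MIN_CHUNK = 2 * 1024;     // lower bound on chunk size
    static final int MAX_CHUNK = 64 * 1024;    // upper bound on chunk size

    static final long POW;                     // BASE^(WINDOW-1), used to drop the oldest byte
    static {
        long p = 1;
        for (int i = 0; i < WINDOW - 1; i++) p *= BASE; // arithmetic is implicitly mod 2^64
        POW = p;
    }

    // Returns the sizes of the chunks the input splits into. A boundary is
    // declared where the window hash matches MASK, so boundaries move with
    // the content, not with absolute offsets.
    static List<Integer> chunkLengths(byte[] data) {
        List<Integer> lengths = new ArrayList<>();
        long hash = 0;
        int chunkStart = 0;
        for (int i = 0; i < data.length; i++) {
            if (i - chunkStart >= WINDOW) {
                hash -= (data[i - WINDOW] & 0xFF) * POW;  // byte leaving the window
            }
            hash = hash * BASE + (data[i] & 0xFF);        // byte entering the window
            int len = i - chunkStart + 1;
            if ((len >= MIN_CHUNK && (hash & MASK) == MASK) || len >= MAX_CHUNK) {
                lengths.add(len);
                chunkStart = i + 1;
                hash = 0;   // restart the window inside the next chunk
            }
        }
        if (chunkStart < data.length) lengths.add(data.length - chunkStart);
        return lengths;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        byte[] original = new byte[256 * 1024];
        rnd.nextBytes(original);

        // Insert 10 bytes at the front. Fixed-size chunking would shift
        // every subsequent block; here only the first chunk length changes.
        byte[] edited = new byte[original.length + 10];
        System.arraycopy(original, 0, edited, 10, original.length);

        System.out.println("original: " + chunkLengths(original));
        System.out.println("edited:   " + chunkLengths(edited));
    }
}

Running the demo shows the two chunk lists differing only in the first entry, which is exactly the stability property that lets intermediate results keyed by chunk fingerprints be reused across incremental runs.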

Citation (APA)

Wang, Z., Ding, Q., Gao, F., Shen, D., & Yu, G. (2015). iHDFS: A distributed file system supporting incremental computing. In Communications in Computer and Information Science (Vol. 503, pp. 151–158). Springer Verlag. https://doi.org/10.1007/978-3-662-46248-5_19
