The data generated and processed by modern computing systems is growing rapidly. MapReduce is an important programming model for large-scale, data-intensive applications, and Hadoop is a popular open-source implementation of MapReduce and the Google File System (GFS). Hadoop's scalability and fault tolerance have made it a de facto standard for Big Data processing. Hadoop stores data in the Hadoop Distributed File System (HDFS), where data reliability and fault tolerance are achieved through replication. In this paper, a new technique called the Delay Scheduling Based Replication Algorithm (DSBRA) is proposed to identify and replicate (dereplicate) popular (unpopular) files/blocks in HDFS based on information collected from the scheduler. Experimental results show that the proposed method achieves 13% and 7% improvements in response time and data locality, respectively, over existing algorithms.
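The core idea, raising or lowering a file's HDFS replication factor according to the access popularity observed by the scheduler, can be illustrated with a minimal sketch. The code below is not the authors' DSBRA implementation: the access-count bookkeeping, the thresholds, and the example file path are hypothetical assumptions, and in DSBRA the popularity signal would come from delay-scheduling information rather than explicit `recordAccess` calls. Only the standard Hadoop `FileSystem` calls (`getFileStatus`, `setReplication`) are real APIs.

```java
// Minimal sketch (not the authors' DSBRA implementation) of popularity-driven
// replication in HDFS. Access counts, thresholds, and the example file path
// are hypothetical stand-ins for the scheduler-derived statistics DSBRA uses.
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PopularityReplicationSketch {

    // Hypothetical per-file access counts, e.g. accumulated from the
    // scheduler's record of which input files launched tasks have read.
    private final Map<Path, Long> accessCounts = new HashMap<>();

    private final FileSystem fs;
    private final long hotThreshold;    // accesses at or above this -> add a replica
    private final long coldThreshold;   // accesses at or below this -> remove a replica
    private final short maxReplication;
    private final short minReplication;

    public PopularityReplicationSketch(FileSystem fs,
                                       long hotThreshold, long coldThreshold,
                                       short maxReplication, short minReplication) {
        this.fs = fs;
        this.hotThreshold = hotThreshold;
        this.coldThreshold = coldThreshold;
        this.maxReplication = maxReplication;
        this.minReplication = minReplication;
    }

    // Called whenever the scheduler launches a task that reads this file.
    public void recordAccess(Path file) {
        accessCounts.merge(file, 1L, Long::sum);
    }

    // Periodically adjust replication factors based on observed popularity.
    public void adjustReplication() throws Exception {
        for (Map.Entry<Path, Long> e : accessCounts.entrySet()) {
            Path file = e.getKey();
            long count = e.getValue();
            short current = fs.getFileStatus(file).getReplication();

            if (count >= hotThreshold && current < maxReplication) {
                // Popular file: add one replica to improve data locality.
                fs.setReplication(file, (short) (current + 1));
            } else if (count <= coldThreshold && current > minReplication) {
                // Unpopular file: drop one replica to reclaim storage.
                fs.setReplication(file, (short) (current - 1));
            }
        }
        accessCounts.clear(); // start a fresh observation window
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        PopularityReplicationSketch sketch =
                new PopularityReplicationSketch(fs, 100, 5, (short) 6, (short) 2);
        sketch.recordAccess(new Path("/data/example-input.csv")); // hypothetical file
        sketch.adjustReplication();
    }
}
```

The sketch changes the replication factor by one per adjustment round; how aggressively replicas are added or removed, and how popularity is measured from the delay scheduler, are design choices of the paper that this simplified example does not capture.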
Citation: Suresh, S., & Gopalan, N. P. (2015). Delay Scheduling Based Replication Scheme for Hadoop Distributed File System. International Journal of Information Technology and Computer Science, 7(4), 73–78. https://doi.org/10.5815/ijitcs.2015.04.08