Locality Aware MapReduce

Abstract

The large amounts of data produced today must be processed efficiently. This can be done using Apache Hadoop, an open-source software library whose two core components are HDFS and MapReduce. Improving the performance of MapReduce improves the performance of the system overall. A Locality Aware MapReduce approach is introduced here. It comprises an input splitting strategy and a MapReduce scheduling algorithm, both based on data locality. The input splitting strategy, called Improved Input Splitting, clusters data blocks stored on the same node into a single split, so that the split is processed by one map task. The scheduling algorithm, when assigning tasks to a node, always prefers local map tasks over non-local map tasks, regardless of which job a task belongs to; that is, when a free slot becomes available, the algorithm first checks for a task with local data, and non-local data is given second preference. Since scheduling is done based on locality, it is called Locality Aware Scheduling. Each method, executed separately and in combination, showed better performance than the unmodified system.
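The two ideas in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation or a Hadoop API: `improved_input_splitting` groups blocks by the node holding them (one split per node, processed by one map task), and `schedule_next` implements the stated rule that a node with a free slot takes any local map task from any job before falling back to a non-local one. All names (`Task`, `schedule_next`, `input_nodes`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

def improved_input_splitting(block_locations):
    """Cluster data blocks stored on the same node into a single split.

    block_locations: dict mapping block_id -> node holding that block.
    Returns: dict mapping node -> list of block_ids forming one split.
    """
    splits = defaultdict(list)
    for block_id, node in block_locations.items():
        splits[node].append(block_id)
    return dict(splits)

@dataclass
class Task:
    task_id: str
    job_id: str
    input_nodes: frozenset  # nodes holding replicas of this task's input split

def schedule_next(free_node, pending_tasks):
    """Pick the next map task for a node with a free slot.

    Local tasks are preferred over non-local ones, no matter which job
    each task belongs to; non-local tasks are a second preference.
    """
    # First pass: any task, from any job, whose input data is on free_node.
    for task in pending_tasks:
        if free_node in task.input_nodes:
            pending_tasks.remove(task)
            return task
    # Fallback: a non-local task, only if no local task exists.
    if pending_tasks:
        return pending_tasks.pop(0)
    return None
```

For example, if `nodeA` frees a slot while the head of the queue is a task whose data lives on `nodeB`, the scheduler skips it and picks a task local to `nodeA`, even one from a different job.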


Citation (APA)

Rhine, R., & Bhuvan, N. T. (2016). Locality aware mapreduce. In Advances in Intelligent Systems and Computing (Vol. 424, pp. 221–228). Springer Verlag. https://doi.org/10.1007/978-3-319-28031-8_19
