Improving MapReduce performance through complexity and performance based data placement in heterogeneous Hadoop clusters

Abstract

MapReduce has emerged as an important programming model for clusters with tens of thousands of nodes. Hadoop, an open-source implementation of MapReduce, may contain nodes that are heterogeneous in computing capacity for various reasons. It is therefore important for data placement algorithms to partition the input and intermediate data according to the computing capacities of the nodes in the cluster. We propose several enhancements to the data placement algorithms in Hadoop so that load is distributed evenly across the nodes. First, we propose two techniques for measuring the computing capacities of the nodes. Second, we propose improvements to the input data distribution algorithm based on the complexities of the map and reduce functions and the measured heterogeneity of the nodes. Finally, we evaluate the resulting improvement in MapReduce performance. © 2013 Springer-Verlag Berlin Heidelberg.
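The core idea of capacity-based placement can be illustrated with a minimal sketch. This is not the paper's algorithm: the `proportional_split` helper, the numeric capacity scores, and the largest-remainder tie-breaking are illustrative assumptions. It simply assigns input blocks to nodes in proportion to each node's measured capacity, so faster nodes receive more data:

```python
def proportional_split(total_blocks, capacities):
    """Assign total_blocks among nodes in proportion to their
    measured capacity scores (illustrative sketch only)."""
    total_capacity = sum(capacities)
    # ideal (fractional) share of blocks for each node
    shares = [c / total_capacity * total_blocks for c in capacities]
    alloc = [int(s) for s in shares]  # round down first
    # hand leftover blocks to the nodes with the largest
    # fractional parts (largest-remainder method)
    leftover = total_blocks - sum(alloc)
    order = sorted(range(len(capacities)),
                   key=lambda i: shares[i] - alloc[i],
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc

# A node twice as fast receives twice as many blocks:
print(proportional_split(100, [1, 1, 2]))  # [25, 25, 50]
```

In the same spirit, the complexity-aware variant described in the abstract could divide each node's capacity score by the estimated per-record cost of the map function before splitting, so that compute-heavy jobs shift even more data toward the faster nodes.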

CITATION STYLE

APA

Arasanal, R. M., & Rumani, D. U. (2013). Improving MapReduce performance through complexity and performance based data placement in heterogeneous Hadoop clusters. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7753 LNCS, pp. 115–125). Springer Verlag. https://doi.org/10.1007/978-3-642-36071-8_8
