Enhancing dataset processing in Hadoop YARN performance for big data applications

Ahmed Abdulhakim Al-Absi; Dae Ki Kang; Myong Jong Kim

Conference Proceedings

Lecture Notes in Electrical Engineering (2016) 354 9-15

DOI: 10.1007/978-3-662-47895-0_2

Abstract

In Hadoop MapReduce, once the input dataset files are loaded from the distributed file system and split across the workers, each worker performs the required computation according to the user's logic. This process runs in parallel on all nodes in the cluster and produces the output results. However, contention for resources between the map and reduce stages causes significant delays in execution time, especially because of memory I/O overheads. This is undesirable because task execution in Hadoop MapReduce incurs the overhead of processing redundant data in imprecise applications, which increases execution time. In this paper, we therefore present an approach that optimizes the local worker memory management mechanism to reduce the number of null schedule slots. Efficient utilization of slots leads to reduced execution times. The adopted local memory management mechanism enables efficient parallel execution and reduces memory overheads. The approach effectively reduces MapReduce computation time, which in turn minimizes the budget for application execution in the cloud.
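
The split, map, and reduce flow described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's memory management mechanism, and it does not use Hadoop or YARN: threads stand in for cluster workers, word counting stands in for the user logic, and the names `split_input`, `map_task`, `reduce_task`, and `run_job` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_input(records, num_workers):
    """Divide the input records into one split per worker (round-robin)."""
    return [records[i::num_workers] for i in range(num_workers)]


def map_task(split):
    """Stand-in user logic for the map stage: count words in one split."""
    counts = Counter()
    for line in split:
        counts.update(line.split())
    return counts


def reduce_task(partials):
    """Reduce stage: merge the partial counts produced by each worker."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_job(records, num_workers=3):
    """Run the map tasks in parallel, then reduce their outputs."""
    splits = split_input(records, num_workers)
    # Threads are an assumption here; in a real cluster each split
    # would be processed by a separate worker node.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_task, splits))
    return reduce_task(partials)


records = [
    "big data on yarn",
    "yarn schedules map tasks",
    "map and reduce share slots",
]
result = run_job(records)
```

In this sketch the reduce stage cannot start until every map task has returned, which is the kind of stage dependency where idle slots and resource contention can arise.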

Citation (APA)

Al-Absi, A. A., Kang, D. K., & Kim, M. J. (2016). Enhancing dataset processing in Hadoop YARN performance for big data applications. In Lecture Notes in Electrical Engineering (Vol. 354, pp. 9–15). Springer Verlag. https://doi.org/10.1007/978-3-662-47895-0_2
