Enhancing dataset processing in Hadoop YARN performance for big data applications

Abstract

In Hadoop MapReduce, input dataset files are loaded from the distributed file system and split across the workers. Each worker then performs the computation defined by the user's logic. This processing runs in parallel on all nodes in the cluster to produce the output results. However, resource contention between the map and reduce stages causes significant delays in execution time, largely because of memory I/O overheads. This is undesirable because, for imprecise applications, task execution in Hadoop MapReduce incurs the additional overhead of processing redundant data, which further increases execution time. In this paper, we therefore present an approach that optimizes the local memory management mechanism of each worker to reduce the number of null schedule slots. More efficient slot utilization reduces execution time. The adopted local memory management mechanism enables efficient parallel execution with lower memory overhead. The approach reduces MapReduce computation time, which in turn lowers the budget required to execute applications in the cloud.
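The abstract does not describe the memory management algorithm itself, so the following is only a toy illustration of why wasted slot time matters, not the authors' method. It assumes (hypothetically) that each task holds a slot for its compute time plus a memory I/O stall that does no useful work, places tasks with greedy list scheduling (each task goes to the earliest-free slot), and counts every unit of slot-time not spent computing as "null". The function name `schedule` and its parameters are illustrative inventions.

```python
import heapq
from typing import List, Tuple


def schedule(compute: List[float], stall: List[float], slots: int) -> Tuple[float, float]:
    """Toy greedy list scheduling of tasks onto a fixed number of slots.

    Hypothetical model: each task occupies a slot for compute[i] + stall[i]
    time units, where stall[i] is a memory-I/O wait that does no useful work.
    Returns (makespan, null_fraction): null_fraction is the share of total
    slot-time (slots * makespan) not spent on useful computation, i.e. the
    stalls plus any idle time at the end of the schedule.
    """
    if slots <= 0:
        raise ValueError("slots must be positive")
    free_at = [0.0] * slots  # time at which each slot next becomes free
    heapq.heapify(free_at)
    for c, s in zip(compute, stall):
        start = heapq.heappop(free_at)           # earliest-available slot
        heapq.heappush(free_at, start + c + s)   # slot is busy until the task ends
    makespan = max(free_at)
    capacity = slots * makespan
    null_fraction = 1.0 - sum(compute) / capacity if capacity > 0 else 0.0
    return makespan, null_fraction


if __name__ == "__main__":
    # 8 equal tasks on 4 slots: a long memory stall vs. a shorter one
    print(schedule([4.0] * 8, [2.0] * 8, 4))  # makespan 12.0, about 1/3 of slot-time null
    print(schedule([4.0] * 8, [0.5] * 8, 4))  # makespan 9.0, about 1/9 of slot-time null
```

In this toy model, shrinking the per-task stall cuts both the null share of slot-time and the makespan. Under a simple pay-per-time cloud pricing assumption, a shorter makespan also means a smaller bill.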

Citation (APA)
Al-Absi, A. A., Kang, D. K., & Kim, M. J. (2016). Enhancing dataset processing in Hadoop YARN performance for big data applications. In Lecture Notes in Electrical Engineering (Vol. 354, pp. 9–15). Springer-Verlag. https://doi.org/10.1007/978-3-662-47895-0_2
