In Hadoop MapReduce, once the input dataset files are loaded from the distributed file system and split across the workers, each worker begins the computation defined by the user logic. This computation runs in parallel on all nodes in the cluster to produce the output results. However, resource contention between the map and reduce stages causes significant delays in execution time, mainly because of memory I/O overheads. This is undesirable: for imprecise applications, task execution in Hadoop MapReduce incurs additional overhead from processing redundant data, which further increases execution time. In this paper we therefore present an approach that optimizes the local worker memory management mechanism to reduce the occurrence of null schedule slots. Efficient utilization of slots leads to reduced execution times. The adopted local memory management mechanism enables efficient parallel execution and reduces memory overheads. The approach effectively reduced MapReduce computation time, which in turn reduces the budget required for application execution in the cloud.
CITATION STYLE
Al-Absi, A. A., Kang, D. K., & Kim, M. J. (2016). Enhancing dataset processing in Hadoop YARN performance for big data applications. In Lecture Notes in Electrical Engineering (Vol. 354, pp. 9–15). Springer Verlag. https://doi.org/10.1007/978-3-662-47895-0_2