Hadoop Mapreduce Performance Enhancement Using In-Node Combiners

Woo-Hyun Lee; Hee-Gook Jun; Hyoung-Joo Kim

Journal Article, Open Access

International Journal of Computer Science and Information Technology (2015) 7(5) 1-17

DOI: 10.5121/ijcsit.2015.7501

Abstract

While advanced analysis of large datasets is in high demand, data sizes have surpassed the capabilities of conventional software and hardware. The Hadoop framework distributes large datasets over multiple commodity servers and performs parallel computations. We discuss the I/O bottlenecks of the Hadoop framework and propose methods for enhancing I/O performance. A proven approach is to cache data to maximize the memory locality of all map tasks. We introduce an approach to optimizing I/O, the in-node combining design, which extends the traditional combiner to the node level. The in-node combiner reduces the total number of intermediate results and curtails network traffic between mappers and reducers.
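The effect described in the abstract can be illustrated with a toy word-count simulation. This is a minimal sketch in plain Python, not the authors' implementation and not Hadoop API code: the function names (`map_task`, `combine`, `per_task_combining`, `in_node_combining`) and the sample input are hypothetical, chosen only to show why combining across all map tasks on a node may emit fewer intermediate records than combining each task's output separately.

```python
from collections import Counter

def map_task(split):
    """Emit one (word, 1) pair per token, like a word-count mapper."""
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    """Sum the values for each key, as a combiner would."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def per_task_combining(node_splits):
    """Traditional combiner: each map task combines only its own output."""
    records = []
    for split in node_splits:
        records.extend(combine(map_task(split)))
    return records

def in_node_combining(node_splits):
    """Node-level combining: output of all map tasks on the node is pooled first."""
    pooled = []
    for split in node_splits:
        pooled.extend(map_task(split))
    return combine(pooled)

# Hypothetical input: two splits handled by map tasks on the same node.
node_splits = [
    ["a b a", "b c"],
    ["a c c", "b a"],
]

per_task = per_task_combining(node_splits)
in_node = in_node_combining(node_splits)

print("records sent with per-task combining:", len(per_task))
print("records sent with in-node combining:", len(in_node))
```

In this toy input each split contains the same three distinct words, so per-task combining emits one record per word per task, while node-level combining emits one record per word for the whole node. The reducers would compute the same final counts either way; only the volume of intermediate data differs.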

Cite

Lee, W.-H., Jun, H.-G., & Kim, H.-J. (2015). Hadoop Mapreduce Performance Enhancement Using In-Node Combiners. International Journal of Computer Science and Information Technology, 7(5), 1–17. https://doi.org/10.5121/ijcsit.2015.7501
