Hadoop is a successful open-source implementation of the MapReduce programming model and has been widely adopted by leading companies for big data analytics. However, its intermediate data shuffling is a time-consuming operation that affects the total execution time of MapReduce programs. Recently, a growing number of organizations have become interested in addressing this issue by leveraging high-performance interconnects, such as InfiniBand and 10 Gigabit Ethernet, which are popular in the High-Performance Computing (HPC) community. Yet there is a lack of comprehensive examination of the performance impact of these interconnects on MapReduce programs. In this work, we systematically evaluate the performance impact of two popular high-speed interconnects, 10 Gigabit Ethernet and InfiniBand, using the original Apache Hadoop and our extended Hadoop Acceleration framework. Our analysis shows that, under Apache Hadoop, although fast networks can effectively accelerate jobs with small intermediate data sizes, they cannot maintain this advantage for jobs with large intermediate data. In contrast, Hadoop Acceleration provides better performance for jobs across a wide range of data sizes. In addition, both implementations exhibit good scalability under different networks, and Hadoop Acceleration significantly reduces the CPU utilization and I/O wait time of MapReduce programs. © 2014 Springer-Verlag Berlin Heidelberg.
Wang, Y., Jiao, Y., Xu, C., Li, X., Wang, T., Que, X., … Yu, W. (2014). Assessing the performance impact of high-speed interconnects on MapReduce. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8163 LNCS, pp. 148–163). Springer Verlag. https://doi.org/10.1007/978-3-642-53974-9_13