Accelerating big data processing on modern HPC clusters

Abstract

Modern HPC systems and their associated middleware (such as MPI and parallel file systems) have been exploiting advances in HPC technologies (multi-/many-core architectures, RDMA-enabled networking, and SSDs) for many years. However, Big Data processing and management middleware have not fully taken advantage of these technologies, and this disparity is pushing HPC and Big Data processing onto divergent trajectories. This chapter provides an overview of popular Big Data processing middleware, high-performance interconnects, and storage architectures, and discusses the challenges of accelerating Big Data processing middleware by leveraging emerging technologies on modern HPC clusters. It presents case studies of advanced designs based on RDMA and heterogeneous storage architectures that address these challenges for multiple components of Hadoop (HDFS and MapReduce) and for Spark. The designs presented in the case studies are publicly available as part of the High-Performance Big Data (HiBD) project, of which the chapter also gives an overview. All of these efforts aim to bring HPC and Big Data processing onto a convergent trajectory.

Citation (APA)

Lu, X., Wasi-ur-Rahman, M., Islam, N., Shankar, D., & Panda, D. K. (2016). Accelerating big data processing on modern HPC clusters. In Conquering Big Data with High Performance Computing (pp. 79–106). Springer International Publishing. https://doi.org/10.1007/978-3-319-33742-5_5
