Accelerating big data processing on modern HPC clusters

Abstract

Modern HPC systems and their associated middleware (such as MPI and parallel file systems) have been exploiting advances in HPC technologies (multi-/many-core architectures, RDMA-enabled networking, and SSDs) for many years. However, Big Data processing and management middleware have not fully taken advantage of these technologies, and this disparity is pushing HPC and Big Data processing onto divergent trajectories. This chapter provides an overview of popular Big Data processing middleware, high-performance interconnects, and storage architectures, and discusses the challenges of accelerating Big Data processing middleware by leveraging emerging technologies on modern HPC clusters. It presents case studies of advanced designs based on RDMA and heterogeneous storage architectures that address these challenges for multiple components of Hadoop (HDFS and MapReduce) and for Spark. The designs presented in the case studies are publicly available as part of the High-Performance Big Data (HiBD) project, of which the chapter also gives an overview. All of these efforts aim to bring HPC and Big Data processing onto a convergent trajectory.

Citation (APA)

Lu, X., Wasi-ur-Rahman, M., Islam, N., Shankar, D., & Panda, D. K. (2016). Accelerating big data processing on modern HPC clusters. In Conquering Big Data with High Performance Computing (pp. 79–106). Springer International Publishing. https://doi.org/10.1007/978-3-319-33742-5_5
