Experiences of converging big data analytics frameworks with high performance computing systems

Peng Cheng; Yutong Lu; Yunfei Du; Zhiguang Chen

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 10776 LNCS 90-106

DOI: 10.1007/978-3-319-69953-0_6

4 Citations

5 Readers

Abstract

With the rapid development of big data analytics frameworks, many existing high performance computing (HPC) facilities are evolving new capabilities to support big data analytics workloads. However, because workload characteristics and the optimization objectives of the system architectures differ, migrating data-intensive applications to HPC systems that are geared toward traditional compute-intensive applications presents a new challenge. In this paper, we address the critical question of how to accelerate complex applications that contain both data-intensive and compute-intensive workloads on the Tianhe-2 system by deploying an in-memory file system as data access middleware. We also characterize the impact of the storage architecture on data-intensive MapReduce workloads when using Lustre as the underlying file system. Based on this characterization and our findings on performance behavior, we propose a shared map output shuffle strategy and a file metadata cache layer to alleviate the impact of the metadata bottleneck. The evaluation of these optimization techniques shows a performance benefit of up to 17% for data-intensive workloads.

Cite

Cheng, P., Lu, Y., Du, Y., & Chen, Z. (2018). Experiences of converging big data analytics frameworks with high performance computing systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10776 LNCS, pp. 90–106). Springer Verlag. https://doi.org/10.1007/978-3-319-69953-0_6
