Improving the memory efficiency of in-memory MapReduce based HPC systems

Cheng Pei; Xuanhua Shi; Hai Jin

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9528 170-184

DOI: 10.1007/978-3-319-27119-4_12

Abstract

In-memory cluster computing systems based on MapReduce, such as Spark, have had a great impact on a wide range of big data problems. Because these systems rely heavily on memory speed to avoid the latency of disk I/O operations, some of their process designs may cause resource inefficiency on traditional high-performance computing (HPC) systems. Hash-based shuffle, particularly at large scale, can significantly degrade job performance through excessive file operations and inefficient use of memory. Some intermediate data spill to disk unnecessarily when memory usage is unevenly distributed or when memory runs out. This study therefore proposes Write Handle Reusing to fully utilize memory when writing and reading shuffle files. A Load Balancing Optimizer is introduced to distribute data processing evenly across all worker nodes, and a Memory-Aware Task Scheduler that coordinates the concurrency level with memory usage is developed to prevent memory spilling. Experimental results on representative workloads demonstrate that the proposed approaches can decrease the overall job execution time and improve memory efficiency.
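The abstract does not say how the Memory-Aware Task Scheduler works internally. As a rough illustration of the general idea of coordinating concurrency with memory usage, the following sketch groups tasks into waves so that the summed memory estimate of each wave stays within a budget. It is not the authors' algorithm. The greedy first-fit heuristic and the names `Task`, `est_mem_mb`, `plan_waves`, `mem_budget_mb`, and `max_slots` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical task descriptor (not from the paper)."""
    task_id: int
    est_mem_mb: int  # assumed per-task memory estimate, in MB


def plan_waves(tasks: List[Task], mem_budget_mb: int, max_slots: int) -> List[List[Task]]:
    """Group tasks into waves that fit a memory budget and a slot limit.

    Greedy first-fit over tasks sorted by descending memory estimate.
    A task larger than the whole budget is placed in a wave of its own.
    """
    waves: List[List[Task]] = []
    free: List[int] = []  # remaining memory budget of each wave
    for task in sorted(tasks, key=lambda t: t.est_mem_mb, reverse=True):
        for i, wave in enumerate(waves):
            # Admit the task only if a slot is free and its estimate fits.
            if len(wave) < max_slots and task.est_mem_mb <= free[i]:
                wave.append(task)
                free[i] -= task.est_mem_mb
                break
        else:
            # No existing wave can take the task: open a new wave.
            waves.append([task])
            free.append(mem_budget_mb - task.est_mem_mb)
    return waves


if __name__ == "__main__":
    demo = [Task(i, m) for i, m in enumerate([600, 500, 400, 300, 200])]
    for n, wave in enumerate(plan_waves(demo, mem_budget_mb=1000, max_slots=3)):
        print(n, [t.est_mem_mb for t in wave])
```

In this sketch the concurrency level of each wave falls out of the memory estimates: waves of large tasks run fewer tasks at once, so no wave is planned to exceed the budget and trigger a spill.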

Cite

Pei, C., Shi, X., & Jin, H. (2015). Improving the memory efficiency of in-memory MapReduce based HPC systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9528, pp. 170–184). Springer Verlag. https://doi.org/10.1007/978-3-319-27119-4_12
