Analyzing Resource Utilization in an HPC System: A Case Study of NERSC’s Perlmutter

Jie Li; George Michelogiannakis; Brandon Cook; Dulanya Cooray; Yong Chen

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2023) 13948 LNCS 297-316

DOI: 10.1007/978-3-031-32041-5_16

Abstract

Resource demands of HPC applications vary significantly. However, HPC systems commonly assign resources on a per-node basis to prevent interference from co-located workloads. This gap between coarse-grained resource allocation and varying resource demands can lead to HPC resources not being fully utilized. In this study, we analyze the resource usage and application behavior of NERSC’s Perlmutter, a state-of-the-art open-science HPC system with both CPU-only and GPU-accelerated nodes. Our one-month usage analysis reveals that CPUs are commonly not fully utilized, especially for GPU-enabled jobs. Also, around 64% of both CPU and GPU-enabled jobs used 50% or less of the available host memory capacity. Additionally, about 50% of GPU-enabled jobs used up to 25% of the GPU memory, and across all jobs, memory capacity was underutilized in some respect. While our study comes early in Perlmutter’s lifetime, and thus policies and application workloads may change, it provides valuable insights into performance characterization and application behavior, and motivates systems with more fine-grained resource allocation.

Citation (APA)

Li, J., Michelogiannakis, G., Cook, B., Cooray, D., & Chen, Y. (2023). Analyzing Resource Utilization in an HPC System: A Case Study of NERSC’s Perlmutter. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13948 LNCS, pp. 297–316). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-32041-5_16
