Performance modeling of parallel applications on distributed memory systems is a challenging task due to the effects of CPU speed, memory access time, and communication cost. In this paper, we propose a simple and intuitive graphical model, which extends the widely used Roofline performance model to include the communication cost in addition to the memory access time and the peak CPU performance. This new performance model inherits the simplicity of the original Roofline model and enables performance evaluation on a third dimension of communication performance. Such a model will greatly facilitate and expedite the analysis, development and optimization of parallel programs on high-end computer systems. We empirically validate the extended new Roofline model using floating-point-computation-bound, memory-bound, and communication-bound applications. Three distinct high-end computing platforms have been tested: 1) high performance computing (HPC) systems, 2) high throughput computing systems, and 3) cloud computing systems. Our experimental results with four different parallel applications show that the new model can approximately evaluate the performance of different programs on various distributed-memory systems. Furthermore, the extended new model is able to provide insight into how the problem size can affect the upper bound performance of parallel applications, which is a special property revealed by the new dimension of communication cost analysis.
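As a rough illustration of the idea, the sketch below takes the attainable performance to be the minimum of three ceilings: the peak floating-point rate, memory bandwidth times operational intensity, and network bandwidth times communication intensity (flops per byte sent over the network). This min-of-three form, the function name, and every number used are assumptions made for illustration; they are not the formulation or the measurements reported in the paper.

```python
def extended_roofline(peak_gflops, mem_bw_gbs, net_bw_gbs,
                      flops_per_mem_byte, flops_per_net_byte):
    """Return (upper bound in GFLOP/s, name of the limiting resource).

    Hypothetical helper: assumes the bound is the lowest of three ceilings.
    """
    ceilings = {
        "compute": peak_gflops,
        "memory": mem_bw_gbs * flops_per_mem_byte,
        "communication": net_bw_gbs * flops_per_net_byte,
    }
    # The resource with the lowest ceiling limits the attainable performance.
    limiter = min(ceilings, key=ceilings.get)
    return ceilings[limiter], limiter


# Made-up machine: 500 GFLOP/s peak, 100 GB/s memory, 10 GB/s network.
# Made-up kernel: 8 flops per memory byte; its communication intensity is
# swept to mimic a growing problem size.
for comm_intensity in (5.0, 20.0, 100.0):
    bound, limiter = extended_roofline(500.0, 100.0, 10.0, 8.0, comm_intensity)
    print(f"{comm_intensity:6.1f} flops/net-byte -> {bound:6.1f} GFLOP/s ({limiter})")
```

In this toy sweep the bound stays communication-limited until the communication intensity is high enough for the compute ceiling to take over, which is the kind of problem-size effect the abstract attributes to the added communication dimension.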
Cardwell, D., & Song, F. (2019). An extended roofline model with communication-awareness for distributed-memory HPC systems. In ACM International Conference Proceeding Series (pp. 26–35). Association for Computing Machinery. https://doi.org/10.1145/3293320.3293321