Cross-accelerator performance profiling

Esthela Gallardo; Patricia J. Teller; Arturo Argueta; Jaime Jaloma

Conference Proceedings, Open Access

ACM International Conference Proceeding Series (2016) 17-21-July-2016

DOI: 10.1145/2949550.2949567

Abstract

The computing requirements of scientific applications have influenced processor design, and have motivated the introduction and use of many-core processors, i.e., accelerators, for high performance computing (HPC). Consequently, it is now common for the compute nodes of HPC clusters to be comprised of multiple computing devices, including accelerators. Although execution time can be used to compare the performance of different computing devices, there exists no standard way to analyze application performance across devices with very different architectural designs and, thus, understand why one outperforms another. Without this knowledge, a developer is handicapped when attempting to effectively tune application performance, as is a hardware designer when trying to understand how best to improve the design of computing devices. In this paper, we use the LULESH 1.0 proxy application to compare and analyze the performance of three different accelerators: the Intel® Xeon Phi™ and the NVIDIA Fermi and Kepler GPUs. Our study shows that LULESH 1.0 exhibits similar execution-time behavior across the three accelerators, but runs up to 7X faster on the Kepler. Despite the significant architectural differences between the Xeon Phi™ and the GPUs, and the differences in the metrics used to characterize their performance, we were able to quantify why the Kepler outperforms both the Fermi and the Xeon Phi™. To do this, we compared their achieved instructions per cycle and vectorization usage, as well as their memory behavior and power and energy consumption.

Cite

Gallardo, E., Teller, P. J., Argueta, A., & Jaloma, J. (2016). Cross-accelerator performance profiling. In ACM International Conference Proceeding Series (Vol. 17-21-July-2016). Association for Computing Machinery. https://doi.org/10.1145/2949550.2949567
