Accelerating ML Workloads using GPU Tensor Cores: The Good, the Bad, and the Ugly

Abstract

Machine Learning (ML) workloads generally contain a significant amount of matrix computation; hence, hardware accelerators for ML have been incorporating dedicated matrix units. With the popularity of GPUs as hardware accelerators for ML, specialized matrix accelerators are embedded into GPUs (e.g., Tensor Cores on NVIDIA GPUs) to significantly improve the performance and energy efficiency of ML workloads. NVIDIA Tensor Cores and other matrix accelerators have been designed to support General Matrix-Matrix Multiplication (GEMM) for many data types. While previous research has demonstrated impressive performance gains with Tensor Cores, it has focused primarily on Convolutional Neural Networks (CNNs). This paper explores Tensor Cores' performance on various workloads, including Graph Convolutional Networks (GCNs), on NVIDIA H100 and A100 GPUs. In our experiments with NVIDIA GPUs, CNNs can achieve 1.91× (TF32) and 2.42× (FP16) end-to-end performance improvements with the use of Tensor Cores, whereas GCNs struggle to surpass a 1.03× (FP16) boost. Some implementations even experience slowdowns despite software transformation. Additionally, we explore the potential of Tensor Cores in non-GEMM-like kernels, providing insights into how software techniques can map diverse computation patterns onto Tensor Cores. Our investigation encompasses several kernels and end-to-end applications, aiming to comprehend the nuanced performance impact of Tensor Cores. Furthermore, we are among the first to present third-party evaluations of H100 GPU performance over the prior A100 GPU.
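As a rough illustration of what mapping a non-GEMM-like pattern onto a matrix unit can mean, the sketch below expresses a row-sum reduction as a multiplication by a column of ones. The abstract does not say which transformations the paper uses, so this is an assumed example of the general idea, not the authors' method. The helper names `matmul` and `row_sums_via_gemm` are hypothetical, and the plain-Python multiply only stands in for the GEMM that a Tensor Core would execute.

```python
# Illustrative sketch only: a reduction rewritten as a matrix multiply.
# Names are hypothetical and not taken from the paper.

def matmul(a, b):
    """Plain-Python GEMM: (m x k) times (k x n) -> (m x n)."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def row_sums_via_gemm(x):
    """Sum each row of x by multiplying x with a (k x 1) column of ones."""
    ones = [[1.0] for _ in range(len(x[0]))]
    # Each output element is a dot product of one row with the ones column,
    # which equals the sum of that row.
    return [row[0] for row in matmul(x, ones)]

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
result = row_sums_via_gemm(x)
```

The point of the rewrite is that the reduction now has the shape of a GEMM, so hardware that only accelerates matrix multiplication can carry it out; whether that actually pays off depends on the padding and data movement it introduces.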

Cite

Hanindhito, B., & John, L. K. (2024). Accelerating ML Workloads using GPU Tensor Cores: The Good, the Bad, and the Ugly. In ICPE 2024 - Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering (pp. 178–189). Association for Computing Machinery, Inc. https://doi.org/10.1145/3629526.3653835
