ISPA: Exploiting Intra-SM Parallelism in GPUs via Fine-Grained Resource Management

Abstract

Emerging GPUs contain multiple Streaming Multiprocessors (SMs), each of which comprises both CUDA Cores and Tensor Cores. While CUDA Cores handle general-purpose computation, Tensor Cores are designed to accelerate matrix multiplication for deep learning applications. However, a GPU kernel typically uses either CUDA Cores or Tensor Cores, leaving the other processing units idle. Although many prior works co-locate kernels to improve GPU utilization, they cannot exploit the intra-SM parallelism between CUDA Cores and Tensor Cores. To this end, we propose ISPA, which exploits this intra-SM parallelism through fine-grained resource management. Specifically, ISPA designs persistent and elastic blocks to resolve the thread-slot and shared-memory contention between co-located kernels, and adopts a register allocation method to manage register contention. These resource management methods apply to both white-box kernels and cuDNN kernels. Experimental results on an NVIDIA 2080 Ti GPU show that ISPA improves system-wide throughput by 15.3% for white-box workloads and by 7.1% for cuDNN-based workloads compared with prior co-location work.
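For readers unfamiliar with the baseline mechanism the abstract builds on, the sketch below illustrates naive CUDA Core / Tensor Core kernel co-location using CUDA streams. It is a minimal illustration, not ISPA's implementation: the kernel names (vector_scale, wmma_tile_gemm), the __launch_bounds__ register cap, the one-persistent-block-per-SM grid, and all problem sizes are illustrative assumptions.

// Minimal sketch (not ISPA): co-locate a CUDA-Core kernel and a Tensor-Core
// (WMMA) kernel on the same GPU via separate streams. Compile with a Tensor
// Core-capable architecture, e.g. nvcc -arch=sm_75 for the 2080 Ti.
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>
using namespace nvcuda;

// CUDA-Core kernel: persistent blocks, each block grid-strides over the array.
// __launch_bounds__ caps register usage so blocks of both kernels can co-reside.
__global__ void __launch_bounds__(256, 2)
vector_scale(float* x, float alpha, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        x[i] *= alpha;
    }
}

// Tensor-Core kernel: each warp computes one 16x16 tile of C = A * B
// (half-precision inputs, float accumulation) using the WMMA API.
__global__ void wmma_tile_gemm(const half* A, const half* B, float* C, int N) {
    int warpsPerRow = N / 16;
    int warpId = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    if (warpId >= warpsPerRow * warpsPerRow) return;
    int tileRow = warpId / warpsPerRow, tileCol = warpId % warpsPerRow;

    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);
    for (int k = 0; k < N; k += 16) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
        wmma::load_matrix_sync(a_frag, A + tileRow * 16 * N + k, N);
        wmma::load_matrix_sync(b_frag, B + k * N + tileCol * 16, N);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    }
    wmma::store_matrix_sync(C + tileRow * 16 * N + tileCol * 16, c_frag, N,
                            wmma::mem_row_major);
}

int main() {
    const int N = 1024, VEC = 1 << 20;
    half *A, *B; float *C, *x;
    cudaMalloc(&A, N * N * sizeof(half));
    cudaMalloc(&B, N * N * sizeof(half));
    cudaMalloc(&C, N * N * sizeof(float));
    cudaMalloc(&x, VEC * sizeof(float));

    int numSMs = 0;
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, 0);

    cudaStream_t cudaCoreStream, tensorCoreStream;
    cudaStreamCreate(&cudaCoreStream);
    cudaStreamCreate(&tensorCoreStream);

    // One persistent CUDA-Core block per SM leaves room on each SM for the
    // WMMA kernel's blocks, so both kernels can run concurrently.
    vector_scale<<<numSMs, 256, 0, cudaCoreStream>>>(x, 2.0f, VEC);
    int warps = (N / 16) * (N / 16);
    wmma_tile_gemm<<<(warps * 32 + 255) / 256, 256, 0, tensorCoreStream>>>(A, B, C, N);

    cudaDeviceSynchronize();
    cudaFree(A); cudaFree(B); cudaFree(C); cudaFree(x);
    return 0;
}

Plain stream-based co-location like this leaves resource partitioning to the hardware scheduler; the abstract's point is that thread slots, shared memory, and registers must be managed explicitly so that the CUDA Core and Tensor Core kernels actually co-reside within each SM rather than serializing.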


Citation (APA)

Zhao, H., Cui, W., Chen, Q., & Guo, M. (2023). ISPA: Exploiting Intra-SM Parallelism in GPUs via Fine-Grained Resource Management. IEEE Transactions on Computers, 72(5), 1473–1487. https://doi.org/10.1109/TC.2022.3214088
