Abstract
The success of DNNs comes at the expense of excessive memory/computation cost, which can be addressed by exploiting reduced precision and sparsity jointly. Existing sparse GPU kernels, however, fail to achieve practical speedup over cublasHgemm, cuBLAS's dense GEMM, under half precision. Kernels for fine-grained sparsity suffer from low data reuse, while those for coarse-grained sparsity are limited by the tension between kernel performance and model quality across grain sizes. We propose column-vector-sparse-encoding, which achieves a smaller grain size than block sparsity at the same reuse rate. Column-vector-sparse-encoding can be applied to both SpMM and SDDMM, two major sparse DNN operations. We also introduce Tensor-Core-based 1D Octet Tiling, which has efficient memory-access and computation patterns under small grain sizes. Based on these techniques, we design SpMM and SDDMM kernels that achieve a 1.71-7.19x speedup over cuSPARSE. Practical speedup over cublasHgemm is achieved under 70% and 90% sparsity with a 4x1 grain size and half precision.
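To make the encoding concrete, below is a minimal CPU reference sketch of a 4x1 column-vector sparse format driving SpMM. It is a plain C++ illustration under assumed row-major layouts; the names CVSMatrix, encode, and spmm are hypothetical, and the paper's actual kernels run on Tensor Cores with their own on-device format.

    // Illustrative sketch (an assumption, not the paper's format): a 4x1
    // column-vector sparse encoding and a scalar SpMM reference,
    // computing C = A_sparse * B on the CPU.
    #include <cstdio>
    #include <vector>

    constexpr int V = 4;  // grain size: 4x1 column vectors

    struct CVSMatrix {  // hypothetical container; rows must be a multiple of V
        int rows, cols;
        // For each group of V consecutive rows: the column indices of its
        // nonzero 4x1 vectors, and the V values of each kept vector.
        std::vector<std::vector<int>>   col_idx;  // [rows/V][#vectors]
        std::vector<std::vector<float>> vals;     // [rows/V][#vectors * V]
    };

    // Keep a 4x1 vector whenever any of its V entries is nonzero.
    CVSMatrix encode(const std::vector<float>& A, int rows, int cols) {
        CVSMatrix m{rows, cols, {}, {}};
        m.col_idx.resize(rows / V);
        m.vals.resize(rows / V);
        for (int g = 0; g < rows / V; ++g)
            for (int c = 0; c < cols; ++c) {
                bool nz = false;
                for (int r = 0; r < V; ++r)
                    nz |= A[(g * V + r) * cols + c] != 0.0f;
                if (!nz) continue;
                m.col_idx[g].push_back(c);
                for (int r = 0; r < V; ++r)
                    m.vals[g].push_back(A[(g * V + r) * cols + c]);
            }
        return m;
    }

    // SpMM reference: C (rows x n) = A_sparse (rows x cols) * B (cols x n).
    void spmm(const CVSMatrix& A, const std::vector<float>& B, int n,
              std::vector<float>& C) {
        C.assign(A.rows * n, 0.0f);
        for (int g = 0; g < A.rows / V; ++g)
            for (size_t v = 0; v < A.col_idx[g].size(); ++v) {
                int c = A.col_idx[g][v];  // one row of B, reused by V rows of A
                for (int r = 0; r < V; ++r)
                    for (int j = 0; j < n; ++j)
                        C[(g * V + r) * n + j] +=
                            A.vals[g][v * V + r] * B[c * n + j];
            }
    }

    int main() {
        // 4x4 matrix whose only nonzeros form one 4x1 vector in column 2.
        std::vector<float> A(16, 0.0f);
        for (int r = 0; r < 4; ++r) A[r * 4 + 2] = float(r + 1);
        CVSMatrix S = encode(A, 4, 4);
        std::vector<float> B = {1, 2, 3, 4, 5, 6, 7, 8};  // 4x2, row-major
        std::vector<float> C;
        spmm(S, B, 2, C);
        printf("C[0][0] = %g (expected 5)\n", C[0]);  // 1 * B[2][0] = 5
        return 0;
    }

Each stored 4x1 vector shares one column index, so a fetched row of B is reused by all four rows in the grain. This is the reuse advantage the abstract attributes to vector encoding over scalar fine-grained sparsity, obtained at a grain small enough (4x1 rather than a larger block) to limit the impact on model quality.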
Citation
Chen, Z., Qu, Z., Liu, L., Ding, Y., & Xie, Y. (2021). Efficient Tensor Core-based GPU kernels for structured sparsity under reduced precision. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE Computer Society. https://doi.org/10.1145/3458817.3476182