Abstract
In recent years, there has been a flurry of research in deep neural network pruning and compression. Early approaches prune weights individually. However, it is difficult to take advantage of the resulting unstructured sparsity patterns on modern hardware like GPUs. As a result, pruning strategies which impose sparsity structures on the weights have become more popular. However, these structured pruning approaches typically lead to higher losses in accuracy than unstructured pruning. In this paper, we present SparseRT, a code generator that leverages unstructured sparsity to accelerate sparse linear algebra operations in deep learning inference on GPUs. For 1x1 convolutions and fully connected layers, we demonstrate geometric mean speedups of 3.4x over the equivalent dense computation at 90% sparsity and 5.4x at 95% sparsity when evaluated on hundreds of test cases in deep learning. For sparse 3x3 convolutions, we show speedups of over 5x on use cases in ResNet-50.
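The abstract treats 1x1 convolutions and fully connected layers as sparse linear algebra. The sketch below (plain NumPy/SciPy on the CPU, not the paper's generated GPU kernels) illustrates how a 1x1 convolution with an unstructured-sparse weight matrix reduces to a sparse-matrix times dense-matrix multiply (SpMM), the kind of operation being accelerated. The layer shapes and the 90% sparsity level are illustrative assumptions.

```python
# Minimal sketch: a pruned 1x1 convolution expressed as SpMM.
# This is an illustration of the operation, not SparseRT's implementation.
import numpy as np
from scipy import sparse

C_in, C_out, H, W = 256, 256, 14, 14   # assumed layer shape
sparsity = 0.90                        # 90% of weights pruned (unstructured)

# Dense weights with element-wise (unstructured) sparsity applied.
weights = np.random.randn(C_out, C_in).astype(np.float32)
weights[np.random.rand(C_out, C_in) < sparsity] = 0.0
weights_csr = sparse.csr_matrix(weights)

# Flatten the input feature map to (C_in, H*W); the 1x1 conv is then an SpMM.
x = np.random.randn(C_in, H * W).astype(np.float32)
y = weights_csr @ x                    # sparse x dense -> (C_out, H*W)
y = y.reshape(C_out, H, W)             # back to a feature map
```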
Citation
Wang, Z. (2020). SparseRT: Accelerating unstructured sparsity on GPUs for deep learning inference. In Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT (pp. 31–42). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3410463.3414654