Balanced sparsity for efficient DNN inference on GPU

Zhuliang Yao; Shijie Cao; Wencong Xiao; Chen Zhang; Lanshun Nie

Conference Proceedings (Open Access)

33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 (2019) 5676-5683

DOI: 10.1609/aaai.v33i01.33015676

Abstract

In trained deep neural networks, unstructured pruning can remove redundant weights to lower storage cost. However, it requires customized hardware to speed up practical inference. Another line of work accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity, which prunes or regularizes consecutive weights for efficient computation, but this method often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, Balanced Sparsity, that achieves high model accuracy efficiently on commercial hardware. Our approach adapts to the high-parallelism property of GPUs, showing strong potential for sparsity in the wide deployment of deep learning services. Experimental results show that Balanced Sparsity achieves up to 3.1x practical speedup for model inference on GPU while retaining the same high model accuracy as fine-grained sparsity.
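The abstract does not spell out the pruning rule, so the following is only a minimal sketch of one plausible reading: each weight row is split into equal-size blocks, and the same number of largest-magnitude weights is kept in every block, so that each block carries an identical workload. The function name `balanced_prune` and the parameters `block_size` and `keep_per_block` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
def balanced_prune(weights, block_size, keep_per_block):
    """Keep only the `keep_per_block` largest-magnitude weights in each
    consecutive block of `block_size` entries, row by row.

    Assumption (not stated in the abstract): "balanced" means every block
    retains the same number of weights after pruning.
    """
    pruned = []
    for row in weights:
        if len(row) % block_size != 0:
            raise ValueError("row length must be a multiple of block_size")
        new_row = []
        for start in range(0, len(row), block_size):
            block = row[start:start + block_size]
            # Rank positions inside the block by absolute value, largest first.
            ranked = sorted(range(block_size),
                            key=lambda i: abs(block[i]),
                            reverse=True)
            keep = set(ranked[:keep_per_block])
            # Retain the selected weights and zero out the rest.
            new_row.extend(w if i in keep else 0.0
                           for i, w in enumerate(block))
        pruned.append(new_row)
    return pruned


# Toy 2x8 weight matrix, pruned to 50% sparsity with blocks of 4.
weights = [
    [0.9, -0.1, 0.2, -0.8, 0.05, 0.7, -0.6, 0.3],
    [0.1, 0.4, -0.5, 0.2, 0.9, -0.3, 0.0, 0.6],
]
sparse = balanced_prune(weights, block_size=4, keep_per_block=2)
for row in sparse:
    print(row)
```

Under this reading, every block holds exactly two surviving weights, so parallel workers that each process one block would do the same amount of work. Unstructured pruning with a single global threshold gives no such guarantee.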

Cite

Yao, Z., Cao, S., Xiao, W., Zhang, C., & Nie, L. (2019). Balanced sparsity for efficient DNN inference on GPU. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 (pp. 5676–5683). AAAI Press. https://doi.org/10.1609/aaai.v33i01.33015676
