Register Tiling for Unstructured Sparsity in Neural Network Inference

5Citations
Citations of this article
11Readers
Mendeley users who have this article in their library.

Abstract

Unstructured sparse neural networks are an important class of machine learning (ML) models, as they compact model size and reduce floating point operations. The execution time of these models is frequently dominated by the sparse matrix multiplication (SpMM) kernel, C=A× B, where A is a sparse matrix, and B and C are dense matrices. The unstructured sparsity pattern of matrices in pruned machine learning models along with their sparsity ratio has rendered useless the large class of libraries and systems that optimize sparse matrix multiplications. Reusing registers is particularly difficult because accesses to memory locations should be known statically. This paper proposes Sparse Register Tiling, a new technique composed of an unroll-and-sparse-jam transformation followed by data compression that is specifically tailored to sparsity patterns in ML matrices. Unroll-and-sparse-jam uses sparsity information to jam the code while improving register reuse. Sparse register tiling is evaluated across 2396 weight matrices from transformer and convolutional models with a sparsity range of 60-95% and provides an average speedup of 1.72× and 2.65× over MKL SpMM and dense matrix multiplication, respectively, on a multicore CPU processor. It also provides an end-to-end speedup of 2.12× for MobileNetV1 with 70% sparsity on an ARM processor commonly used in edge devices.

Cite

CITATION STYLE

APA

Wilkinson, L., Cheshmi, K., & Dehnavi, M. M. (2023). Register Tiling for Unstructured Sparsity in Neural Network Inference. Proceedings of the ACM on Programming Languages, 7. https://doi.org/10.1145/3591302

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free