Load-balancing Sparse Matrix Vector Product Kernels on GPUs

Hartwig Anzt; Terry Cojean; Chen Yen-Chen; Jack Dongarra; Goran Flegar; Pratik Nayak; Stanimire Tomov; Yuhsiang M. Tsai; Weichung Wang

Journal Article, Open Access

ACM Transactions on Parallel Computing (2020) 7(1)

DOI: 10.1145/3380930

Abstract

Efficient processing of irregular matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent to irregular matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overheads. To this end, in this article, we address the challenge of designing high-performance sparse matrix-vector product (SpMV) kernels for NVIDIA Graphics Processing Units (GPUs). We present a compressed sparse row (CSR) format suitable for unbalanced matrices. We also provide a load-balancing kernel for the coordinate (COO) matrix format and extend it to a hybrid algorithm that stores part of the matrix in the SIMD-friendly Ellpack (ELL) format. The ratio between the ELL part and the COO part is determined using a theoretical analysis of the nonzeros-per-row distribution. For the over 2,800 test matrices available in the SuiteSparse Matrix Collection, we compare the performance against SpMV kernels provided by NVIDIA's cuSPARSE library and a heavily tuned sliced ELL (SELL-P) kernel that prevents unnecessary padding by considering the irregular matrices as a combination of matrix blocks stored in ELL format.
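
To make the hybrid ELL/COO idea concrete, the following is a minimal CPU-side sketch in Python. It is not the article's GPU kernel. The function names `split_hybrid` and `spmv_hybrid`, the row representation, and the hand-picked `ell_width` are all assumptions made for illustration; the article determines the ELL/COO ratio from an analysis of the nonzeros-per-row distribution, which is not reproduced here.

```python
from typing import List, Tuple

Row = List[Tuple[int, float]]  # (column index, value) pairs for one row


def split_hybrid(rows: List[Row], ell_width: int):
    """Split a sparse matrix into a padded ELL part and a COO remainder.

    ell_width is a hypothetical, hand-picked parameter in this sketch.
    """
    ell_cols, ell_vals = [], []
    coo = []  # (row, col, value) triples that overflow the ELL width
    for i, row in enumerate(rows):
        head, tail = row[:ell_width], row[ell_width:]
        pad = ell_width - len(head)
        # Padding uses column 0 with value 0.0, so it adds nothing to the result.
        ell_cols.append([c for c, _ in head] + [0] * pad)
        ell_vals.append([v for _, v in head] + [0.0] * pad)
        coo.extend((i, c, v) for c, v in tail)
    return ell_cols, ell_vals, coo


def spmv_hybrid(ell_cols, ell_vals, coo, x):
    """Compute y = A @ x from the ELL and COO parts."""
    y = [0.0] * len(ell_cols)
    # ELL part: every row performs the same number of multiply-adds,
    # which is the property that makes it SIMD-friendly.
    for i, (cols, vals) in enumerate(zip(ell_cols, ell_vals)):
        y[i] = sum(v * x[c] for c, v in zip(cols, vals))
    # COO part: the irregular leftovers are accumulated entry by entry.
    for i, c, v in coo:
        y[i] += v * x[c]
    return y


# Example: one long row (row 1) among short ones.
rows = [
    [(0, 1.0)],
    [(0, 2.0), (1, 3.0), (2, 4.0), (3, 5.0)],
    [(2, 6.0)],
    [(1, 7.0), (3, 8.0)],
]
x = [1.0, 2.0, 3.0, 4.0]
ell_cols, ell_vals, coo = split_hybrid(rows, ell_width=2)
y = spmv_hybrid(ell_cols, ell_vals, coo, x)
```

With `ell_width=2`, only the two extra entries of the long row land in the COO part, so the padded ELL block stays small while the short rows need at most one padding slot each. This is the trade-off between padding and divergence that the abstract describes.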

Cite

Anzt, H., Cojean, T., Yen-Chen, C., Dongarra, J., Flegar, G., Nayak, P., … Wang, W. (2020). Load-balancing Sparse Matrix Vector Product Kernels on GPUs. ACM Transactions on Parallel Computing, 7(1). https://doi.org/10.1145/3380930
