Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA implementation that performs SpMV on the GPU. While previous work reports experiments on small to medium-sized sparse matrices, we evaluate performance on large sparse matrices. We compare SCOO performance to the existing formats of the NVIDIA Cusp library. Our results on a Fermi GPU show that SCOO outperforms the COO and CSR formats for all tested matrices, and the HYB format for all tested unstructured matrices. Furthermore, a comparison with a Sandy Bridge CPU shows that SCOO on a Fermi GPU outperforms the multi-threaded CSR implementation of the Intel MKL library on an i7-2700K by a factor of 5.5 to 18. © 2012 Published by Elsevier Ltd.
Dang, H. V., & Schmidt, B. (2012). The sliced COO format for sparse matrix-vector multiplication on CUDA-enabled GPUs. In Procedia Computer Science (Vol. 9, pp. 57–66). Elsevier B.V. https://doi.org/10.1016/j.procs.2012.04.007