Sparse matrix-vector (SpMV) multiplication is one of the key kernels in scientific computing. We present the foundations of its implementation on CUDA- and OpenCL-enabled devices. After introducing the subject, we briefly present the three most popular storage formats: COO, CRS, and ELL. They serve as exemplary data structures on which we discuss hardware-related issues associated with efficient SpMV kernel design, such as matrix size, ordering of data, memory boundedness, storage overhead, thread divergence, and coalescing of memory transfers. Next, we present three widely available libraries with stable and validated SpMV kernels: cuSPARSE, CUSP, and Paralution. We present and discuss complete codes of several SpMV kernels for both basic SpMV formats and some of their derivatives, including CMRS, and briefly discuss the principles behind other popular format extensions.
Koza, Z., Matyka, M., Mirosław, Ł., & Poła, J. (2014). Sparse matrix-vector product. In Numerical Computations with GPUs (pp. 103–121). Springer International Publishing. https://doi.org/10.1007/978-3-319-06548-9_6