Fast sparse matrix-vector multiplication by exploiting variable block structure

Abstract

We improve the performance of sparse matrix-vector multiplication (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this structure. We split the matrix, A, into a sum, A_1 + A_2 + ... + A_s, where each term is stored in a new data structure we refer to as unaligned block compressed sparse row (UBCSR) format. A classical approach that stores A in block compressed sparse row (BCSR) format can also reduce execution time, but the improvements may be limited because BCSR imposes an alignment on the matrix non-zeros that leads to extra work from filled-in zeros. Combining splitting with UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. On a set of application matrices, we show speedups as high as 2.1× over no blocking and as high as 1.8× over BCSR as used in prior work. Even when performance does not improve significantly, split UBCSR usually reduces matrix storage. © Springer-Verlag Berlin Heidelberg 2005.
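The two ideas in the abstract, fill-in from aligned blocking and splitting A into a sum of terms, can be illustrated with a small sketch. This is not the UBCSR data structure itself, whose layout the abstract does not describe. The function names (`aligned_fill_ratio`, `spmv`, `split`), the coordinate-dictionary storage, and the example matrix are all assumptions made for illustration.

```python
# Illustrative sketch only: matrices are dicts mapping (row, col) -> value.
# None of these names come from the paper; they are hypothetical helpers.

def aligned_fill_ratio(entries, r, c):
    """Stored values / true non-zeros when r x c blocks must start at
    row and column indices that are multiples of r and c (BCSR-style)."""
    touched_blocks = {(i // r, j // c) for (i, j) in entries}
    return len(touched_blocks) * r * c / len(entries)

def spmv(entries, x, n_rows):
    """Reference y = A x over coordinate-format entries."""
    y = [0.0] * n_rows
    for (i, j), value in entries.items():
        y[i] += value * x[j]
    return y

def split(entries, in_first_term):
    """Split A into A_1 + A_2 using a caller-supplied predicate."""
    a1 = {k: v for k, v in entries.items() if in_first_term(k)}
    a2 = {k: v for k, v in entries.items() if k not in a1}
    return a1, a2

# A 4 x 4 matrix with one dense 2 x 2 block that starts at (1, 1),
# so it is NOT aligned to a 2 x 2 grid, plus one stray non-zero.
block = {(1, 1): 1.0, (1, 2): 2.0, (2, 1): 3.0, (2, 2): 4.0}
A = dict(block)
A[(0, 3)] = 5.0
x = [1.0, 2.0, 3.0, 4.0]

# Aligned 2 x 2 blocking touches four grid blocks for the single dense
# block: 16 stored values for 4 non-zeros.
block_fill = aligned_fill_ratio(block, 2, 2)

# Splitting: A_1 holds the dense block, A_2 holds the remainder.
A1, A2 = split(A, lambda k: k in block)
y_full = spmv(A, x, 4)
y_split = [a + b for a, b in zip(spmv(A1, x, 4), spmv(A2, x, 4))]
```

In this example the misaligned dense block costs four times its true size under aligned 2 x 2 blocking, which is the kind of extra work from filled-in zeros the abstract attributes to BCSR. Because SpMV is linear, computing A_1 x and A_2 x separately and adding the results gives the same vector as A x, so each term is free to use whatever block size and alignment suits its own non-zeros.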

Cite

Vuduc, R. W., & Moon, H. J. (2005). Fast sparse matrix-vector multiplication by exploiting variable block structure. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3726 LNCS, pp. 807–816). https://doi.org/10.1007/11557654_91