Minimizing communication in sparse matrix solvers

  • Marghoob Mohiyuddin
  • Mark Hoemmen
  • James Demmel
  • Katherine Yelick

Data communication, both within the memory system of a single processor node and between multiple nodes, is the bottleneck in many iterative sparse matrix solvers such as CG and GMRES: k iterations of a conventional implementation perform k sparse matrix-vector multiplications and Ω(k) vector operations (e.g., dot products), so communication grows by a factor of Ω(k) in both the memory hierarchy and the network. By reorganizing the sparse-matrix kernel to compute a set of matrix-vector products at once, and reorganizing the rest of the algorithm accordingly, we can perform k iterations by sending O(log P) messages instead of O(k · log P) messages on a parallel machine, and by reading the matrix A from DRAM to cache just once instead of k times on a sequential machine. This reduces communication to the minimum possible. We combine these techniques to form a new variant of GMRES. Our shared-memory implementation on an 8-core Intel Clovertown achieves speedups of up to 4.3x over standard GMRES, without sacrificing convergence rate or numerical stability.
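The reorganized kernel the abstract describes replaces k separate sparse matrix-vector products with a single pass that builds the whole Krylov basis [x, Ax, ..., A^k x]. A minimal NumPy sketch of that interface follows; it uses a small dense matrix for brevity and does not implement the cache/network blocking that makes the real kernel communication-avoiding. The function name `matrix_powers_kernel` and the toy data are illustrative, not taken from the paper.

```python
import numpy as np

def matrix_powers_kernel(A, x, k):
    """Return the Krylov basis V = [x, A@x, A^2@x, ..., A^k@x].

    In the communication-avoiding setting, A is read from slow memory
    (or communicated between nodes) only once while all k products are
    formed; this dense sketch shows only the interface, not the blocking.
    """
    V = np.empty((k + 1, x.size))
    V[0] = x
    for j in range(1, k + 1):
        V[j] = A @ V[j - 1]  # one matrix-vector product per basis vector
    return V

# Toy example: 2x2 upper-triangular A, k = 3.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
x = np.array([1.0, 1.0])
V = matrix_powers_kernel(A, x, 3)
# V[1] = A@x = [3, 3]; V[2] = [9, 9]; V[3] = [27, 27]
```

A communication-avoiding GMRES variant would then orthogonalize all k + 1 basis vectors at once (e.g., with a tall-skinny QR) instead of performing Ω(k) separate dot products.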
