Spatula: A Hardware Accelerator for Sparse Matrix Factorization

2Citations
Citations of this article
12Readers
Mendeley users who have this article in their library.

Abstract

Solving sparse systems of linear equations is a crucial component in many science and engineering problems, like simulating physical systems. Sparse matrix factorization dominates a large class of these solvers. Efficient factorization algorithms have two key properties that make them challenging for existing architectures: they consist of small tasks that are structured and compute-intensive, and sparsity induces long chains of data dependences among these tasks. Data dependences make GPUs struggle, while CPUs and prior sparse linear algebra accelerators also suffer from low compute throughput. We present Spatula, an architecture for accelerating sparse matrix factorization algorithms. Spatula hardware combines systolic processing elements that execute structured tasks at high throughput with a flexible scheduler that handles challenging data dependences. Spatula enables a novel scheduling algorithm that avoids stalls and load imbalance while reducing data movement, achieving high compute utilization. As a result, Spatula outperforms a GPU running the state-of-the-art sparse Cholesky and LU factorization implementations by gmean 47 × across a wide range of matrices, and by up to thousands of times on some challenging matrices.

Cite

CITATION STYLE

APA

Feldmann, A., & Sanchez, D. (2023). Spatula: A Hardware Accelerator for Sparse Matrix Factorization. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023 (pp. 91–104). Association for Computing Machinery, Inc. https://doi.org/10.1145/3613424.3623783

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free