Parallel efficient sparse matrix-matrix multiplication on multicore platforms

Abstract

Sparse matrix-matrix multiplication (SpGEMM) is a key kernel in many high-performance computing applications, such as algebraic multigrid solvers and graph analytics. Optimizing SpGEMM on modern processors is challenging due to random data accesses, poor data locality, and load imbalance during computation. In this work, we investigate different partitioning techniques, cache optimizations (using dense arrays instead of hash tables), and dynamic load balancing for SpGEMM on a diverse set of real-world and synthetic datasets. We demonstrate that our implementation outperforms the state of the art on Intel® Xeon® processors: it is up to 3.8X faster than the Intel® Math Kernel Library (MKL) and up to 257X faster than CombBLAS. We also outperform the best published GPU implementations of SpGEMM on the NVIDIA GTX Titan and the AMD Radeon HD 7970 by up to 7.3X and 4.5X, respectively, on their published datasets. We demonstrate good multi-core scalability (geomean speedup of 18.2X on 28 threads), compared to MKL, which achieves 7.5X scaling on 28 threads.
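To illustrate the "dense array instead of hash table" idea mentioned in the abstract, below is a minimal sketch of row-wise SpGEMM (Gustavson's algorithm) on CSR matrices that accumulates each output row into a dense array and resets only the touched entries. This is an assumption-laden illustration of the general technique, not the authors' implementation; the struct and function names are hypothetical, and the parallelization and partitioning strategies studied in the paper are omitted.

```cpp
// Sketch of row-wise SpGEMM with a dense accumulator (hypothetical names,
// not the paper's code). C = A * B, with all matrices in CSR format.
#include <vector>

struct CSR {
    int rows = 0, cols = 0;
    std::vector<int> rowptr;   // size rows + 1
    std::vector<int> colidx;   // column index of each nonzero
    std::vector<double> vals;  // value of each nonzero
};

CSR spgemm_dense_accumulator(const CSR& A, const CSR& B) {
    CSR C;
    C.rows = A.rows;
    C.cols = B.cols;
    C.rowptr.assign(A.rows + 1, 0);

    std::vector<double> acc(B.cols, 0.0);  // dense accumulator for one row of C
    std::vector<char> touched(B.cols, 0);  // marks columns hit in this row
    std::vector<int> touched_cols;         // list of hit columns, for cheap reset

    for (int i = 0; i < A.rows; ++i) {
        touched_cols.clear();
        // Accumulate row i of C: for each nonzero A(i,k), scale row k of B.
        for (int p = A.rowptr[i]; p < A.rowptr[i + 1]; ++p) {
            int k = A.colidx[p];
            double a = A.vals[p];
            for (int q = B.rowptr[k]; q < B.rowptr[k + 1]; ++q) {
                int j = B.colidx[q];
                if (!touched[j]) { touched[j] = 1; touched_cols.push_back(j); }
                acc[j] += a * B.vals[q];
            }
        }
        // Flush the accumulated row into C, resetting only touched entries.
        for (int j : touched_cols) {
            C.colidx.push_back(j);
            C.vals.push_back(acc[j]);
            acc[j] = 0.0;
            touched[j] = 0;
        }
        C.rowptr[i + 1] = static_cast<int>(C.colidx.size());
    }
    return C;
}
```

In a multithreaded variant along the lines the abstract describes, each thread would presumably keep its own accumulator and rows would be assigned to threads dynamically to address load imbalance; those details are not shown here.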

Citation (APA)

Patwary, M. M. A., Satish, N. R., Sundaram, N., Park, J., Anderson, M. J., Vadlamudi, S. G., … Dubey, P. (2015). Parallel efficient sparse matrix-matrix multiplication on multicore platforms. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9137 LNCS, pp. 48–57). Springer Verlag. https://doi.org/10.1007/978-3-319-20119-1_4
