Matrix factorizations are among the most important building blocks of scientific computing. However, state-of-the-art libraries are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for Cholesky and LU factorizations that utilize an asymptotically communication-optimal 2.5D decomposition. We first establish a theoretical framework for deriving parallel I/O lower bounds for linear algebra kernels, and then utilize its insights to derive Cholesky and LU schedules, both communicating N³/(P√M) elements per processor, where M is the local memory size. The empirical results match our theoretical analysis: our implementations communicate significantly less than Intel MKL, SLATE, and the asymptotically communication-optimal CANDMC and CAPITAL libraries. Our code outperforms these state-of-the-art libraries in almost all tested scenarios, with matrix sizes ranging from 2,048 to 524,288 on up to 512 CPU nodes of the Piz Daint supercomputer, decreasing the time-to-solution by up to three times. Our code is ScaLAPACK-compatible and available as an open-source library.
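The communication cost N³/(P√M) stated above can be illustrated with a small back-of-the-envelope calculation. The sketch below is not part of the paper; it simply evaluates the stated 2.5D per-processor volume for hypothetical values of N, P, and M, and contrasts it with the classical memory-independent 2D volume N²/√P, which the 2.5D bound reduces to when each processor holds only its minimal share M = N²/P:

```python
import math

def comm_2_5d(n: int, p: int, m: int) -> float:
    """Per-processor communication volume under the 2.5D bound: N^3 / (P * sqrt(M))."""
    return n**3 / (p * math.sqrt(m))

def comm_2d(n: int, p: int) -> float:
    """Classical 2D per-processor volume N^2 / sqrt(P) (the M = N^2/P special case)."""
    return n**2 / math.sqrt(p)

if __name__ == "__main__":
    n, p = 16_384, 512            # hypothetical matrix size and processor count
    m_min = n * n // p            # minimal memory: one matrix's worth per processor
    m_big = 16 * m_min            # 16x extra memory for communication avoidance
    print(comm_2_5d(n, p, m_min))   # matches the 2D volume N^2/sqrt(P)
    print(comm_2_5d(n, p, m_big))   # sqrt(16) = 4x less communication
```

Note the square-root dependence on M: multiplying the available memory per processor by a factor c reduces the communicated volume only by √c, which is the trade-off that 2.5D decompositions exploit.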
Citation
Kwasniewski, G., Kabic, M., Ben-Nun, T., Ziogas, A. N., Saethre, J. E., Gaillard, A., … Hoefler, T. (2021). On the parallel I/O optimality of linear algebra kernels: Near-optimal matrix factorizations. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. IEEE Computer Society. https://doi.org/10.1145/3458817.3476167