A family of high-performance matrix multiplication algorithms

John A. Gunnels; Fred G. Gustavson; Greg M. Henry; Robert A. Van De Geijn

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3732 LNCS, pp. 256-265 (2006)

DOI: 10.1007/11558958_30

Abstract

We describe a model of hierarchical memories and use it to determine an optimal strategy for blocking the operand matrices of matrix multiplication. The model extends an earlier, related model by three of the authors. As before, the model predicts the form of current, state-of-the-art L1 kernels. Additionally, it shows that current L1 kernels can sustain their high performance on operand matrices that are as large as the L2 cache. For a hierarchical memory with L memory levels (main memory and L-1 caches), our model reduces the number of potential matrix multiply algorithms from 6^L to four. We use the shape of the matrix input operands to select one of our four algorithms. In the earlier model the corresponding count was 2^L, and that model was independent of the matrix operand shapes. Because of space limitations, we do not include performance results. © Springer-Verlag Berlin Heidelberg 2006.
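As a rough illustration of what blocking the operand matrices means, the sketch below multiplies two matrices one block at a time, so that each inner step touches only a small tile of each operand. It is a generic one-level blocked multiply and is not the algorithm family from the paper. The function names and the block-size parameters mb, kb and nb are hypothetical, chosen only for this example. The helper candidate_counts simply evaluates the three counts quoted in the abstract (6^L, 2^L and four) for a given number of memory levels.

```python
def blocked_matmul(A, B, mb, kb, nb):
    """Return C = A * B, computed tile by tile.

    A is m x k and B is k x n, both given as lists of rows.
    mb, kb, nb are illustrative block sizes (assumptions, not values
    taken from the paper); in practice they would be tuned so that the
    tiles in use fit in a given cache level.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    # Outer loops walk over the blocks of the three index ranges.
    for i0 in range(0, m, mb):
        for p0 in range(0, k, kb):
            for j0 in range(0, n, nb):
                # Inner "kernel": multiply one tile of A by one tile of B
                # and accumulate into the matching tile of C.
                for i in range(i0, min(i0 + mb, m)):
                    row_a, row_c = A[i], C[i]
                    for p in range(p0, min(p0 + kb, k)):
                        a_ip, row_b = row_a[p], B[p]
                        for j in range(j0, min(j0 + nb, n)):
                            row_c[j] += a_ip * row_b[j]
    return C


def candidate_counts(levels):
    """Counts quoted in the abstract for `levels` memory levels:
    (unrestricted candidates, earlier model, this model)."""
    return 6 ** levels, 2 ** levels, 4


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(A, B, mb=1, kb=2, nb=1))
    print(candidate_counts(3))
```

The result does not depend on the block sizes; only the order in which memory is touched changes, which is the property a blocking strategy exploits.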
