A family of high-performance matrix multiplication algorithms

Abstract

We describe a model of hierarchical memories and we use it to determine an optimal strategy for blocking operand matrices of matrix multiplication. The model is an extension of an earlier related model by three of the authors. As before, the model predicts the form of current, state-of-the-art L1 kernels. Additionally, it shows that current L1 kernels can continue to produce their high performance on operand matrices that are as large as the L2 cache. For a hierarchical memory with L memory levels (main memory and L-1 caches), our model reduces the number of potential matrix multiply algorithms from 6^L to four. We use the shape of the matrix input operands to select one of our four algorithms. Previously, four was 2^L and the model was independent of the matrix operand shapes. Because of space limitations, we do not include performance results. © Springer-Verlag Berlin Heidelberg 2006.
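
As a rough illustration of what "blocking operand matrices" means, the sketch below multiplies two matrices one tile at a time, so that each tile can be reused while it is still resident in a fast memory level. It is not the authors' algorithm or kernel: the function name `blocked_matmul`, the single `block` parameter, and the loop order are assumptions made for illustration, and the sketch models only one level of blocking rather than the L-level hierarchy the abstract describes.

```python
def blocked_matmul(A, B, block=2):
    """Compute C = A * B by iterating over square tiles.

    `block` is a hypothetical tile side length; in practice it would be
    chosen so that the tiles in use fit in a given cache level.
    """
    m, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions of A and B do not match")
    n = len(B[0])
    C = [[0.0] * n for _ in range(m)]

    # Outer loops walk over tiles of C, A and B.
    for i0 in range(0, m, block):
        for p0 in range(0, k, block):
            for j0 in range(0, n, block):
                # Inner loops multiply one tile of A by one tile of B and
                # accumulate into the matching tile of C. min() handles
                # matrix sizes that are not multiples of `block`.
                for i in range(i0, min(i0 + block, m)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = A[i][p]
                        for j in range(j0, min(j0 + block, n)):
                            C[i][j] += a_ip * B[p][j]
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(A, B, block=2))
```

The result does not depend on `block`; only the order in which entries are visited changes. The choice of block shapes and loop orders at each memory level is the kind of decision the model in the abstract is meant to narrow down.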

Cite

Gunnels, J. A., Gustavson, F. G., Henry, G. M., & Van De Geijn, R. A. (2006). A family of high-performance matrix multiplication algorithms. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3732 LNCS, pp. 256–265). https://doi.org/10.1007/11558958_30
