We describe a model of hierarchical memories and use it to determine an optimal strategy for blocking the operand matrices of matrix multiplication. The model extends an earlier, related model by three of the authors. As before, the model predicts the form of current state-of-the-art L1 kernels. Additionally, it shows that current L1 kernels can sustain their high performance on operand matrices as large as the L2 cache. For a hierarchical memory with L memory levels (main memory and L-1 caches), our model reduces the number of potential matrix multiply algorithms from 6^L to four. We use the shapes of the input operand matrices to select one of these four algorithms. Previously, the model reduced this number to 2^L, and it was independent of the shapes of the matrix operands. Because of space limitations, we do not include performance results. © Springer-Verlag Berlin Heidelberg 2006.
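The idea of blocking operands for a memory hierarchy can be illustrated with a generic multi-level tiled multiply. The Python sketch below is not the authors' family of algorithms and does not choose among loop orderings or operand shapes; it simply nests one level of tiling per entry in a hypothetical `block_sizes` list (outermost level first, for example an L2-sized tile followed by an L1-sized tile) and uses a fixed loop order at every level. The helper `naive_loop_order_count` encodes one possible reading of the 6^L figure, namely that each of the L levels independently picks one of the 3! = 6 orderings of its three loops; that interpretation is an assumption, not something the abstract states.

```python
from itertools import permutations


def blocked_matmul(A, B, block_sizes):
    """Return C = A * B for matrices stored as lists of row lists.

    block_sizes is a hypothetical list of square tile sizes, one per
    blocking level, outermost (largest) first. Every blocking level
    walks its tiles in i-j-p order; the innermost kernel uses i-p-j order.
    """
    m, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions must match")
    n = len(B[0])
    C = [[0.0] * n for _ in range(m)]

    def recurse(level, i0, i1, j0, j1, p0, p1):
        if level == len(block_sizes):
            # Innermost kernel: plain triple loop over the smallest tile.
            for i in range(i0, i1):
                Ai, Ci = A[i], C[i]
                for p in range(p0, p1):
                    a, Bp = Ai[p], B[p]
                    for j in range(j0, j1):
                        Ci[j] += a * Bp[j]
            return
        b = block_sizes[level]
        # Split the current index ranges into tiles of size b (edge tiles
        # may be smaller) and hand each tile to the next, finer level.
        for ii in range(i0, i1, b):
            for jj in range(j0, j1, b):
                for pp in range(p0, p1, b):
                    recurse(level + 1,
                            ii, min(ii + b, i1),
                            jj, min(jj + b, j1),
                            pp, min(pp + b, p1))

    recurse(0, 0, m, 0, n, 0, k)
    return C


def naive_loop_order_count(levels):
    # Assumption: each level independently chooses one of the 3! = 6
    # orderings of its i, j, p loops, giving 6 ** levels combinations.
    return len(list(permutations("ijp"))) ** levels


# Usage: two blocking levels (tile sizes 2 and 1) on a 2x3 by 3x2 product.
# blocked_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], [2, 1])
```

Pure Python is far too slow for real work; the sketch only shows how each tile size bounds the working set handled at one level of the hierarchy. In a tuned library, the innermost tile would be handled by an optimized L1 kernel of the kind the abstract refers to.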
Gunnels, J. A., Gustavson, F. G., Henry, G. M., & Van De Geijn, R. A. (2006). A family of high-performance matrix multiplication algorithms. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3732 LNCS, pp. 256–265). https://doi.org/10.1007/11558958_30