Blocked matrix multiplication algorithms such as Cannon's algorithm and SUMMA have a 2-dimensional communication structure. We introduce a generalized 'Split-Dimensional' version of Cannon's algorithm (SD-Cannon) with higher-dimensional and bidirectional communication structure. This algorithm is useful for torus interconnects that can achieve more injection bandwidth than single-link bandwidth. On a bidirectional torus network of dimension d, SD-Cannon can lower the algorithmic bandwidth cost by a factor of up to d. With rectangular collectives, SUMMA also achieves the lower bandwidth cost but has a higher latency cost. We use Charm++ virtualization to efficiently map SD-Cannon on unbalanced and odd-dimensional torus network partitions. Our performance study on Blue Gene/P demonstrates that a MPI version of SD-Cannon can exploit multiple communication links and improve performance. © 2013 Springer-Verlag.
CITATION STYLE
Solomonik, E., & Demmel, J. (2013). Matrix multiplication on multidimensional torus networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7851 LNCS, pp. 201–215). https://doi.org/10.1007/978-3-642-38718-0_21
Mendeley helps you to discover research relevant for your work.