Matrix multiplication on multidimensional torus networks

2Citations
Citations of this article
8Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Blocked matrix multiplication algorithms such as Cannon's algorithm and SUMMA have a 2-dimensional communication structure. We introduce a generalized 'Split-Dimensional' version of Cannon's algorithm (SD-Cannon) with higher-dimensional and bidirectional communication structure. This algorithm is useful for torus interconnects that can achieve more injection bandwidth than single-link bandwidth. On a bidirectional torus network of dimension d, SD-Cannon can lower the algorithmic bandwidth cost by a factor of up to d. With rectangular collectives, SUMMA also achieves the lower bandwidth cost but has a higher latency cost. We use Charm++ virtualization to efficiently map SD-Cannon on unbalanced and odd-dimensional torus network partitions. Our performance study on Blue Gene/P demonstrates that a MPI version of SD-Cannon can exploit multiple communication links and improve performance. © 2013 Springer-Verlag.

Cite

CITATION STYLE

APA

Solomonik, E., & Demmel, J. (2013). Matrix multiplication on multidimensional torus networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7851 LNCS, pp. 201–215). https://doi.org/10.1007/978-3-642-38718-0_21

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free