Computationally efficient parallel matrix-matrix multiplication on the torus

Abstract

In this paper, we represent the computation space of the n×n matrix multiplication problem C = C + A·B as a 3D torus. We determine all time-minimal scheduling vectors that activate the computations at the corresponding 3D index points at each computing step. When the projection method is used to allocate the scheduled computations to processing elements, the resulting array processor that minimizes the computing time is a 2D torus with n×n processing elements. Each optimal time scheduling function yields three optimal array allocations by projection. The allocations obtained from all optimal scheduling vectors can be classified into three groups. In the first group, matrix C stays in place while matrices A and B are shifted between neighboring processors; the well-known Cannon's algorithm belongs to this group. In the second group, matrix A stays in place while matrices B and C are shifted. In the third group, matrix B stays in place while matrices A and C are shifted. The resulting array processor allocations need n compute-shift steps to multiply dense n×n matrices. © Springer-Verlag Berlin Heidelberg 2008.
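
As an illustration of the first group (C stays in place, A and B are shifted), the following is a minimal sequential sketch of Cannon's algorithm, not the authors' implementation. It assumes one matrix element per processing element and models the n×n torus with NumPy arrays, where a wraparound shift between neighbors is written as np.roll. The function name cannon_multiply and the initial skewing convention are assumptions chosen for this example.

```python
import numpy as np

def cannon_multiply(A, B, C=None):
    """Simulate C = C + A.B on an n x n torus with C stationary.

    Entry [i, j] of each array stands for the data held by the
    (hypothetical) processing element at torus position (i, j).
    """
    a = np.array(A, dtype=float)
    b = np.array(B, dtype=float)
    n = a.shape[0]
    if a.shape != (n, n) or b.shape != (n, n):
        raise ValueError("A and B must be square matrices of the same size")
    c = np.zeros((n, n)) if C is None else np.array(C, dtype=float)

    # Initial alignment: row i of A is rotated left by i positions,
    # column j of B is rotated up by j positions.
    for i in range(n):
        a[i, :] = np.roll(a[i, :], -i)
    for j in range(n):
        b[:, j] = np.roll(b[:, j], -j)

    # n compute-shift steps: each element multiplies its local pair,
    # then A moves one step left and B one step up (with wraparound).
    for _ in range(n):
        c += a * b
        a = np.roll(a, -1, axis=1)
        b = np.roll(b, -1, axis=0)
    return c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((4, 4))
    B = rng.random((4, 4))
    print(np.allclose(cannon_multiply(A, B), A @ B))
```

After the alignment, position (i, j) holds A[i, (i+j) mod n] and B[(i+j) mod n, j], so over the n steps it accumulates every product A[i, k]·B[k, j] exactly once. This matches the n compute-shift steps stated in the abstract.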

Zekri, A. S., & Sedukhin, S. G. (2008). Computationally efficient parallel matrix-matrix multiplication on the torus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4759 LNCS, pp. 219–226). Springer Verlag. https://doi.org/10.1007/978-3-540-77704-5_19
