An Efficient Transposition Algorithm for Distributed Memory Computers

  • Christara C
  • Ding X
  • Jackson K

Abstract

Data transposition is required in many numerical applications. When implemented on a distributed-memory computer, data transposition requires all-to-all communication, a time-consuming operation. The Direct Exchange algorithm, commonly used for this task, is inefficient if the number of processors is large. We investigate a series of more sophisticated techniques: the Ring Exchange, Mesh Exchange and Cube Exchange algorithms. These data transposition schemes were incorporated into a parallel solver for the shallow-water equations. We compare the performance of these schemes with that of the Direct Exchange algorithm and the MPI all-to-all communication routine, MPI_AllToAll. The numerical experiments were performed on a Cray T3E computer with 512 processors and on an Ethernet-connected cluster of 36 Sun workstations. Both the analysis and the numerical results indicate that the more sophisticated Mesh and Cube Exchange algorithms perform better than either the simpler, well-known Direct and Ring Exchange schemes or the MPI_AllToAll routine. We also generalize the Mesh and Cube Exchange algorithms to a d-dimensional mesh algorithm, which can be viewed as a generalization of the standard hypercube data transposition algorithm.
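To make the contrast between a direct exchange and a hypercube-style exchange concrete, here is a minimal single-process simulation in Python. It is only a sketch under stated assumptions, not the authors' implementation: the function names `direct_exchange` and `cube_exchange` are hypothetical, each "block" is just a `(source, destination)` tag, no real message passing takes place, and the hypercube variant is the textbook dimension-by-dimension scheme, which may differ in detail from the paper's Cube Exchange algorithm. (In the MPI standard the all-to-all routine is spelled `MPI_Alltoall`; it is not used here.)

```python
def direct_exchange(P):
    """Simulate an all-to-all exchange among P processes in P - 1 steps.

    Assumption: in step s, process i sends one block straight to (i + s) % P.
    Returns (recv, steps), where recv[j][i] is the block sent from i to j.
    """
    # send[i][j] is the block that process i must deliver to process j.
    send = [[(i, j) for j in range(P)] for i in range(P)]
    recv = [[None] * P for _ in range(P)]
    for i in range(P):
        recv[i][i] = send[i][i]  # the diagonal block never leaves its owner
    steps = 0
    for s in range(1, P):
        for i in range(P):
            dest = (i + s) % P
            recv[dest][i] = send[i][dest]
        steps += 1
    return recv, steps


def cube_exchange(P):
    """Simulate the textbook hypercube exchange in log2(P) steps.

    Assumption: P is a power of two. In the step for a given bit, process p
    forwards to partner p ^ bit every block whose destination differs from p
    in that bit, so each message carries P / 2 blocks.
    """
    assert P > 0 and P & (P - 1) == 0, "P must be a power of two"
    # held[p] lists the blocks currently stored on process p.
    held = [[(p, j) for j in range(P)] for p in range(P)]
    steps = 0
    bit = 1
    while bit < P:
        new_held = [[] for _ in range(P)]
        for p in range(P):
            partner = p ^ bit
            for src, dst in held[p]:
                if (dst ^ p) & bit:
                    new_held[partner].append((src, dst))  # forward the block
                else:
                    new_held[p].append((src, dst))        # keep it locally
        held = new_held
        steps += 1
        bit <<= 1
    # Every block has now reached its destination; order blocks by source.
    recv = [[None] * P for _ in range(P)]
    for p in range(P):
        for src, dst in held[p]:
            assert dst == p
            recv[p][src] = (src, dst)
    return recv, steps


if __name__ == "__main__":
    for P in (4, 8, 16):
        _, d_steps = direct_exchange(P)
        _, c_steps = cube_exchange(P)
        print(f"P={P}: direct steps={d_steps}, cube steps={c_steps}")
```

In this simulation the direct scheme takes P - 1 steps with one block per message, while the hypercube scheme takes log2(P) steps with P/2 blocks per message. Fewer but larger messages is the usual reason such schemes can pay off when per-message startup cost dominates; the performance figures themselves are in the paper and are not reproduced by this sketch.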

Cite

Christara, C., Ding, X., & Jackson, K. (2005). An Efficient Transposition Algorithm for Distributed Memory Computers. In High Performance Computing Systems and Applications (pp. 349–370). Kluwer Academic Publishers. https://doi.org/10.1007/0-306-47015-2_38
