An Efficient Transposition Algorithm for Distributed Memory Computers

Christina Christara; Xiaoliang Ding; Ken Jackson

Book Chapter

Kluwer Academic Publishers, (2005), 349-370

DOI: 10.1007/0-306-47015-2_38

Abstract

Data transposition is required in many numerical applications. When implemented on a distributed-memory computer, data transposition requires all-to-all communication, a time-consuming operation. The Direct Exchange algorithm, commonly used for this task, is inefficient if the number of processors is large. We investigate a series of more sophisticated techniques: the Ring Exchange, Mesh Exchange and Cube Exchange algorithms. These data transposition schemes were incorporated into a parallel solver for the shallow-water equations. We compare the performance of these schemes with that of the Direct Exchange algorithm and the MPI all-to-all communication routine, MPI_AllToAll. The numerical experiments were performed on a Cray T3E computer with 512 processors and on an Ethernet-connected cluster of 36 Sun workstations. Both the analysis and the numerical results indicate that the more sophisticated Mesh and Cube Exchange algorithms perform better than either the simpler, well-known Direct and Ring Exchange schemes or the MPI_AllToAll routine. We also generalize the Mesh and Cube Exchange algorithms to a d-dimensional mesh algorithm, which can be viewed as a generalization of the standard hypercube data transposition algorithm.
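To make the communication pattern concrete, here is a minimal sketch of a direct, pairwise exchange used to transpose a matrix distributed by row blocks. It is not the authors' implementation. It is a single-process simulation in plain Python with no MPI calls, and the function name `direct_exchange_transpose`, the square-matrix layout, and the requirement that the process count divides the matrix size are all assumptions made for illustration.

```python
def direct_exchange_transpose(local_rows, p):
    """Simulate transposing an n x n matrix held as row blocks by p processes.

    local_rows[r] is the list of matrix rows owned by simulated process r.
    Returns the row blocks of the transposed matrix, one per process.
    (Hypothetical helper: a simulation, not real message passing.)
    """
    n = sum(len(rows) for rows in local_rows)
    assert n % p == 0, "this sketch assumes p divides n"
    b = n // p  # rows (and columns) per block

    # Step 1: process r cuts its row block into p sub-blocks of b columns.
    # outbox[r][s] is the b x b block that process r sends to process s.
    outbox = [
        [[row[s * b:(s + 1) * b] for row in local_rows[r]] for s in range(p)]
        for r in range(p)
    ]

    # Step 2: all-to-all exchange. Every pair (r, s) communicates directly,
    # so each process exchanges data with the p - 1 other processes.
    inbox = [[outbox[r][s] for r in range(p)] for s in range(p)]

    # Step 3: process s transposes each received block locally and joins
    # the pieces to form its row block of the transposed matrix.
    result = []
    for s in range(p):
        rows = []
        for i in range(b):
            new_row = []
            for r in range(p):
                block = inbox[s][r]
                new_row.extend(block[k][i] for k in range(b))
            rows.append(new_row)
        result.append(rows)
    return result
```

In this pattern each process exchanges a block with every other process, which is the all-to-all communication the abstract identifies as costly when the number of processors is large. In a real MPI program the exchange in step 2 would be carried out with point-to-point messages or with the collective routine, whose name in the MPI standard is spelled `MPI_Alltoall`.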

Citation (APA)

Christara, C., Ding, X., & Jackson, K. (2005). An Efficient Transposition Algorithm for Distributed Memory Computers. In High Performance Computing Systems and Applications (pp. 349–370). Kluwer Academic Publishers. https://doi.org/10.1007/0-306-47015-2_38
