Design and implementation of dynamic process management for grid-enabled MPICH


Abstract

This paper presents the design and implementation of MPI_Rejoin() for MPICH-GF, a grid-enabled, fault-tolerant MPICH implementation. To make MPI applications fault tolerant, a failed process must be able to recover and continue execution. However, current MPI implementations do not support dynamic process management, and they provide no way to restore the information about a process's communication channels. The 'rejoin' operation allows a restored process to rejoin the existing group by updating the corresponding entries of the channel table with its new physical address. We have verified that our implementation correctly reconstructs the MPI communication structure by running NAS Parallel Benchmark (NPB) applications. We also report the cost of the 'rejoin' operation. © Springer-Verlag Berlin Heidelberg 2003.
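The channel-table update at the heart of 'rejoin' can be illustrated with a small sketch. This is not the MPICH-GF code: the `channel_entry` and `channel_table` types, the `rejoin_update()` function, and the host/port representation of a "physical address" are all hypothetical, chosen only to show the idea that surviving processes overwrite the stale entry for the restored rank while every other entry stays untouched.

```c
#include <assert.h>
#include <string.h>

#define MAX_PROCS 8
#define ADDR_LEN 64

/* Hypothetical channel-table entry: one per rank in the process group. */
typedef struct {
    int rank;
    char host[ADDR_LEN]; /* assumed physical address: host name or IP */
    int port;            /* assumed listening port of that rank */
} channel_entry;

/* Hypothetical per-process table indexed by rank. */
typedef struct {
    int size;
    channel_entry entries[MAX_PROCS];
} channel_table;

/* Hypothetical rejoin handler: when a restored process announces its new
   physical address, a surviving process overwrites the stale entry for that
   rank. Returns 0 on success, -1 if the rank is outside the group. */
int rejoin_update(channel_table *table, int rank,
                  const char *new_host, int new_port)
{
    assert(table != NULL && new_host != NULL);
    if (rank < 0 || rank >= table->size)
        return -1;

    channel_entry *e = &table->entries[rank];
    strncpy(e->host, new_host, ADDR_LEN - 1);
    e->host[ADDR_LEN - 1] = '\0'; /* guarantee termination after truncation */
    e->port = new_port;
    return 0;
}
```

In this sketch only the restored rank's entry changes, so existing channels to other ranks remain valid; how the new address is distributed to the group is left out as an assumption.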

Citation

Kim, S., Woo, N., Yeom, H. Y., Park, T., & Park, H. W. (2003). Design and implementation of dynamic process management for grid-enabled MPICH. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2840, 653–656. https://doi.org/10.1007/978-3-540-39924-7_87
