TH-MPI: OS Kernel integrated fault tolerant MPI

Abstract

Because they consist of large numbers of computing nodes, parallel cluster systems face a high risk of individual node failure. To overcome the high overhead of current fault-tolerant MPI systems, this paper presents TH-MPI for parallel cluster systems. Because it is integrated into the Linux kernel, TH-MPI is implemented in a more effective, transparent, and extensive way. Our experiment shows that, with the support of dynamic kernel modules and diskless checkpointing, checkpointing in TH-MPI is effectively optimized.

Cite

Chen, Y., Fang, Q., Du, Z., & Li, S. (2001). TH-MPI: OS Kernel integrated fault tolerant MPI. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2131, pp. 75–82). Springer Verlag. https://doi.org/10.1007/3-540-45417-9_15
