Dynamic failure management for parallel applications on grids

Abstract

The computational grid, as it exists today, is vulnerable to node failures, and the probability that some node fails grows rapidly as the grid increases in size. There have been several attempts to provide fault tolerance by using checkpointing and message logging in conjunction with the MPI library. However, the grid itself should take an active role in dealing with failures. We propose a dynamically reconfigurable architecture in which applications can regroup in the face of a failure. The proposed architecture removes the single point of failure from computational grids and provides flexibility in grid configuration. © Springer-Verlag Berlin Heidelberg 2005.
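The abstract says that the chance of a node failure grows rapidly with grid size, but it does not state a failure model. The sketch below assumes one for illustration only: each of n nodes fails independently with the same probability p over a fixed time window. Under that assumption, the probability that at least one node fails is 1 - (1 - p)^n. The function name `prob_any_node_fails` is hypothetical and does not come from the paper.

```python
def prob_any_node_fails(p: float, n: int) -> float:
    """Probability that at least one of n nodes fails in a time window.

    Assumption (not from the paper): every node fails independently
    with the same probability p during that window.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be a non-negative node count")
    # Complement of "every node survives", where survival has probability (1 - p)^n
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Same per-node failure probability, increasingly large grids
    for nodes in (1, 10, 100, 1000):
        print(nodes, round(prob_any_node_fails(0.001, nodes), 4))
```

With a per-node probability of 0.001, the chance that the whole job sees at least one failure is roughly 0.095 at 100 nodes and roughly 0.63 at 1000 nodes. A single-node failure therefore becomes the common case for large parallel runs, and this is the situation the proposed regrouping architecture is meant to handle.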

Citation (APA)

Jung, H., Shin, D., Kim, H., Han, H., Lee, I., & Yeom, H. Y. (2005). Dynamic failure management for parallel applications on grids. In Lecture Notes in Computer Science (Vol. 3470, pp. 1175–1182). Springer Verlag. https://doi.org/10.1007/11508380_120