Providing non-stop service for message-passing based parallel applications with RADIC

Abstract

Current supercomputers are approaching the petaflop level. These machines suffer a high number of interruptions in a relatively short time interval. Fault tolerance and preventive maintenance are key to increasing the MTTI (Mean Time To Interrupt). In this paper we present how RADIC, an architecture for fault tolerance, provides different protection levels that avoid system interruptions and allow preventive maintenance tasks to be performed. Our experiments show that our solution effectively maintains high availability with a large MTTI. © 2008 Springer-Verlag Berlin Heidelberg.

Cite

Santos, G., Duarte, A., Rexachs, D., & Luque, E. (2008). Providing non-stop service for message-passing based parallel applications with RADIC. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5168 LNCS, pp. 58–67). https://doi.org/10.1007/978-3-540-85451-7_7
