The FTMPS-project: Design and implementation of fault-tolerance techniques for massively parallel systems

Abstract

The FTMPS-project provides a solution to the need for fault tolerance in large systems. A complete fault-tolerance approach has been developed and is being implemented. The built-in hardware error-detection features, combined with software error-detection techniques, provide high coverage of transient as well as permanent failures. In combination with the diagnosis software, they collect the information needed for the OSS (statistics and visualisation) and for possible reconfiguration. Backward error recovery based on checkpointing and rollback is implemented.

Vounckx, J., Deconinck, G., Lauwereins, R., Viehöver, G., Wagner, R., Madeira, H., … Willeke, H. (1994). The FTMPS-project: Design and implementation of fault-tolerance techniques for massively parallel systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 797 LNCS, pp. 401–406). Springer Verlag. https://doi.org/10.1007/3-540-57981-8_151
