An approach for hierarchical system level diagnosis of massively parallel computers combined with a simulation-based method for dependability analysis

J. Aitmann; F. Balbach; A. Hein

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (1994) 852 LNCS 372-385

DOI: 10.1007/3-540-58426-9_142

Abstract

The primary focus in the analysis of massively parallel supercomputers has traditionally been on their performance. However, their complex network topologies, large numbers of processors, and sophisticated system software can make them very unreliable. If the failure of any one of the many components of a massively parallel computer could shut down the machine, the machine would be useless. Therefore, fault tolerance is required. The basis of effective mechanisms for fault tolerance is efficient diagnosis. This paper deals with concurrent and hierarchical system-level diagnosis for a particular massively parallel architecture and with a simulation-based method to validate the proposed diagnosis algorithm. The diagnosis algorithm is presented, and we describe a simulation-based method to test and verify fault-tolerance algorithms as early as the design phase of the target machine.

Citation (APA)

Aitmann, J., Balbach, F., & Hein, A. (1994). An approach for hierarchical system level diagnosis of massively parallel computers combined with a simulation-based method for dependability analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 852 LNCS, pp. 372–385). Springer Verlag. https://doi.org/10.1007/3-540-58426-9_142
