A fully adaptive fault-tolerant routing methodology based on intermediate nodes

N. A. Nordbotten; M. E. Gómez; J. Flich; P. López; A. Robles; T. Skeie; O. Lysne; J. Duato

Journal Article, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3222 341-356

DOI: 10.1007/978-3-540-30141-7_49

12 Citations

4 Readers

Abstract

Massively parallel computing systems are being built with thousands of nodes. Because of the high number of components, it is critical to keep these systems running even in the presence of failures. Interconnection networks play a key role in these systems, and this paper proposes a fault-tolerant routing methodology for use in such networks. The methodology supports any minimal routing function (including fully adaptive routing), does not degrade performance in the absence of faults, does not disable any healthy node, and is easy to implement in both meshes and tori. To avoid network failures, the methodology uses a simple mechanism: for some source-destination pairs, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network). The methodology is shown to tolerate a large number of faults (e.g., five faults with two intermediate nodes, or nine with three, in a 3D torus). Furthermore, the methodology offers graceful performance degradation: in an 8 × 8 × 8 torus network with 14 faults, throughput decreases by only 6.49%. © IFIP International Federation for Information Processing 2004.
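The intermediate-node mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a mesh (no wraparound links), node faults only, and a single intermediate node, and it treats a source-destination pair as safe for fully adaptive minimal routing only when no faulty node lies in the bounding box spanned by the two endpoints (the set of nodes on any minimal path). The names `box`, `fault_free`, and `pick_intermediate` are hypothetical.

```python
from itertools import product


def box(a, b):
    """Nodes on some minimal mesh path between a and b (their bounding box)."""
    ranges = [range(min(x, y), max(x, y) + 1) for x, y in zip(a, b)]
    return set(product(*ranges))


def fault_free(a, b, faults):
    """True if no faulty node lies on any minimal path between a and b."""
    return not (box(a, b) & faults)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def pick_intermediate(src, dst, faults, dims):
    """Return None if direct minimal routing is safe, else one intermediate node.

    Assumption: a candidate is acceptable when both sub-paths
    (src -> candidate, candidate -> dst) are fault free; among acceptable
    candidates, the one giving the shortest total path is chosen.
    """
    if fault_free(src, dst, faults):
        return None
    candidates = [
        n for n in product(*(range(d) for d in dims))
        if n not in (src, dst)
        and fault_free(src, n, faults)
        and fault_free(n, dst, faults)
    ]
    if not candidates:
        raise ValueError("no single intermediate node suffices")
    return min(candidates, key=lambda n: manhattan(src, n) + manhattan(n, dst))


# 4 x 4 mesh with one faulty node in the middle of the minimal-path region.
faults = {(1, 1)}
print(pick_intermediate((0, 0), (2, 2), faults, (4, 4)))
print(pick_intermediate((0, 0), (0, 3), faults, (4, 4)))
```

Under this simplified model, some pairs admit no single intermediate node (for example, a source and destination in the same row with a fault between them), which gives one intuition for why the abstract reports results with two and three intermediate nodes.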

Citation (APA)

Nordbotten, N. A., Gómez, M. E., Flich, J., López, P., Robles, A., Skeie, T., … Duato, J. (2004). A fully adaptive fault-tolerant routing methodology based on intermediate nodes. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3222, 341–356. https://doi.org/10.1007/978-3-540-30141-7_49
