Programming model extensions for resilience in extreme scale computing

Abstract

The challenge of resilience is becoming increasingly important on the path to exascale capability in High Performance Computing (HPC) systems. With clock frequencies unlikely to increase as aggressively as they have in the past, future large-scale HPC systems aspiring to exaflop capability will need an exponential increase in the number of ALUs and memory modules deployed in their designs [Kogge 2008]. The Mean Time to Failure (MTTF) of a system, however, scales inversely with the number of its components. Furthermore, these systems will be constructed from devices far less reliable than those used today: as transistor geometries shrink, failures due to chip manufacturing variability, transistor aging, and transient soft errors will become more prevalent. Therefore, the sheer scale of future exascale supercomputers, together with shrinking VLSI geometries, will conspire to make faults and failures the norm rather than the exception. © 2013 Springer-Verlag.
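The inverse scaling of system MTTF with component count can be made concrete with a short sketch. This example is illustrative only (not from the paper) and assumes independent, exponentially distributed component failures, under which the system MTTF is the component MTTF divided by the number of components; the figures used are hypothetical.

```python
def system_mttf(component_mttf_hours: float, num_components: int) -> float:
    """System-level MTTF for N independent components, each with
    exponentially distributed time to failure (rates add, so MTTF divides)."""
    return component_mttf_hours / num_components

# Hypothetical exascale machine: components with a 5-year MTTF,
# one million of them in the system.
component_mttf = 5 * 365 * 24          # ~43,800 hours per component
mttf = system_mttf(component_mttf, 1_000_000)
print(f"{mttf:.4f} hours")             # a few minutes between failures
```

Even with individually very reliable parts, the aggregate failure rate at this scale leaves only minutes between failures, which is the motivation for resilience support in the programming model.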

Citation (APA)

Hukerikar, S., Diniz, P. C., & Lucas, R. F. (2013). Programming model extensions for resilience in extreme scale computing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7640 LNCS, pp. 496–498). https://doi.org/10.1007/978-3-642-36949-0_56
