Simulating application resilience at exascale

This article is free to access.

Abstract

The reliability mechanisms for future exascale systems will be a key aspect of their scalability and performance. With the expected jump in hardware component counts, faults will become increasingly common compared to today's systems. Under these circumstances, the costs of current and emergent resilience methods need to be reevaluated. This includes the cost of recovery, which is often ignored in current work, and the impact of hardware features such as heterogeneous computing elements and non-volatile memory devices. We describe a simulation and modeling framework that enables the measurement of various resilience algorithms with varying application characteristics. For this framework we outline the simulator's requirements, its application communication pattern generators, and a few of the key hardware component models. © 2012 Springer-Verlag Berlin Heidelberg.

Citation (APA)

Riesen, R., Ferreira, K. B., Varela, M. R., Taufer, M., & Rodrigues, A. (2012). Simulating application resilience at exascale. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7156 LNCS, pp. 221–230). Springer Verlag. https://doi.org/10.1007/978-3-642-29740-3_26
