Containment domains: A scalable, efficient and flexible resilience scheme for exascale systems


This article is free to access.

Abstract

This paper describes and evaluates a scalable and efficient resilience scheme based on the concept of containment domains. Containment domains are a programming construct that enables applications to express their resilience needs and to interact with the system to tune and specialize error detection, state preservation and restoration, and recovery schemes. Containment domains have weak transactional semantics and are nested, which lets them take advantage of the machine and application hierarchies and enables hierarchical state preservation, restoration and recovery. We evaluate the scalability and efficiency of containment domains using generalized trace-driven simulation and analytical modeling, and show that containment domains are superior to both checkpoint-restart and redundant-execution approaches. © 2013 - IOS Press and the authors. All rights reserved.
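
The nesting described in the abstract can be illustrated with a small sketch. This is not the authors' API: `run_domain`, `DomainError`, and the fault-injection list `faults` are hypothetical names, and the sketch assumes that state is a plain dictionary, that preservation is a deep copy taken on domain entry, and that recovery is re-execution after restoring that copy. A domain that exhausts its retries re-raises the error, which the enclosing (parent) domain then handles by restoring its own preserved state.

```python
import copy


class DomainError(Exception):
    """Hypothetical error type raised when a detection check fails."""


def run_domain(state, body, check, max_retries=1):
    """Run `body` on `state` inside one (hypothetical) containment domain."""
    preserved = copy.deepcopy(state)  # preservation: copy state on entry
    for _ in range(max_retries + 1):
        try:
            body(state)   # may itself open nested child domains
            check(state)  # detection: raises DomainError on a bad result
            return
        except DomainError:
            # restoration: roll back to the preserved copy, then re-execute
            state.clear()
            state.update(copy.deepcopy(preserved))
    # retries exhausted: escalate so the parent domain can recover instead
    raise DomainError("unrecovered error; escalating to parent domain")


faults = [True]  # assumption: a single injected transient fault


def child_body(state):
    state["x"] += 1
    if faults and faults.pop():
        state["x"] = -999  # simulate corruption of the child's result


def child_check(state):
    if state["x"] < 0:
        raise DomainError("corrupted value detected")


def parent_body(state):
    # The parent domain is composed of three child domains run in sequence.
    for _ in range(3):
        run_domain(state, child_body, child_check)


def no_check(state):
    """The parent relies on its children for detection in this sketch."""


state = {"x": 0}
run_domain(state, parent_body, no_check)
print(state)  # {'x': 3}
```

In this run the injected fault is detected and repaired inside the first child domain, so only that child's work is repeated and the parent never rolls back. That locality is the property the hierarchy is meant to provide; a global checkpoint-restart scheme would instead restore and replay the whole computation.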

Cite

Chung, J., Lee, I., Sullivan, M., Ryoo, J. H., Kim, D. W., Yoon, D. H., … Erez, M. (2013). Containment domains: A scalable, efficient and flexible resilience scheme for exascale systems. In Scientific Programming (Vol. 21, pp. 197–212). Hindawi Limited. https://doi.org/10.1155/2013/473915
