Addressing failures in exascale computing

  • Snir M
  • Wisniewski R
  • Abraham J
 et al. 

Abstract

We present here a report produced by a workshop on ‘Addressing failures in exascale computing’ held in Park City, Utah, 4–11 August 2012. The charter of this workshop was to establish a common taxonomy of resilience across all the levels in a computing system, to discuss existing knowledge on resilience across the various hardware and software layers of an exascale system, and to build on those results by examining potential solutions from both a hardware and a software perspective, with a focus on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware; they came from industry, government, and academia, and their interests ranged from theory to implementation. This combination allowed broad and comprehensive discussions and led to this document, which summarizes and builds on those discussions. [ABSTRACT FROM AUTHOR]

Author-supplied keywords

  • Resilience
  • exascale
  • extreme-scale computing
  • fault-tolerance
  • high-performance computing


Authors

  • Marc Snir

  • Robert W. Wisniewski

  • Jacob A. Abraham

  • Sarita V. Adve

  • Saurabh Bagchi

  • Pavan Balaji
