Application-specific fault tolerance via data access characterization

Abstract

Recent trends in semiconductor technology and supercomputer design predict an increasing probability of faults during an application's execution. Designing an application that is resilient to system failures requires careful evaluation of the impact of various approaches on preserving key application state. In this paper, we present our experiences in an ongoing effort to make a large computational chemistry application fault tolerant. We construct the data access signatures of key application modules to evaluate alternative fault tolerance approaches. We present the instrumentation methodology, characterization of the application modules, and evaluation of fault tolerance techniques using the information collected. The application signatures developed capture application characteristics not traditionally revealed by performance tools. We believe these can be used in the design and evaluation of runtimes beyond fault tolerance. © 2011 Springer-Verlag.

Cite

Ali, N., Krishnamoorthy, S., Govind, N., Kowalski, K., & Sadayappan, P. (2011). Application-specific fault tolerance via data access characterization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6853 LNCS, pp. 340–352). https://doi.org/10.1007/978-3-642-23397-5_34
