Low-overhead fault-tolerance support using DISC programming model

Abstract

DISC is a recently proposed parallel programming paradigm that models many classes of iterative scientific applications by specifying a domain and the interactions among its elements. Together with its associated runtime, it hides the details of inter-process communication and work partitioning from the programmer, including partitioning across heterogeneous processing elements. In this paper, we show how these abstractions, particularly the concepts of compute-function and computation-space objects, can also be leveraged to provide low-overhead fault-tolerance support. Computation-space objects enable automated application-level checkpointing, while replicated execution of compute-functions helps detect soft errors with low overhead. Experimental results demonstrate the effectiveness of the proposed solutions.
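The two mechanisms named in the abstract can be illustrated with a toy sketch. Everything below is an assumption for illustration only: `ComputationSpace`, `compute_function`, `run_replicated`, and `iterate` are hypothetical names, not the DISC runtime's API, and the naive dual execution shown here does not reproduce the paper's low-overhead replication scheme. The point is that once the programmer declares which object holds the iteration state, a runtime can checkpoint that object alone, and once the per-element update is an isolated compute-function, it can be re-executed and the results compared to expose a silent corruption.

```python
import pickle


class ComputationSpace:
    """Hypothetical stand-in for a computation-space object: the state
    that each iteration reads and rewrites."""

    def __init__(self, values):
        self.values = list(values)

    def checkpoint(self):
        # Application-level checkpoint: serialize only the declared state,
        # not the whole process image.
        return pickle.dumps(self.values)

    @classmethod
    def restore(cls, blob):
        return cls(pickle.loads(blob))


def compute_function(space, i):
    # Hypothetical 1-D stencil: element i interacts with its two neighbours,
    # treating positions outside the domain as 0.0.
    left = space.values[i - 1] if i > 0 else 0.0
    right = space.values[i + 1] if i < len(space.values) - 1 else 0.0
    return (left + space.values[i] + right) / 3.0


def run_replicated(fn, space, i, inject_fault=False):
    # Naive dual execution: run the compute-function twice and compare.
    # A mismatch indicates a soft error in one of the replicas.
    first = fn(space, i)
    second = fn(space, i)
    if inject_fault:
        second += 1e-3  # simulate a corrupted result in the second replica
    if first != second:
        raise RuntimeError(f"soft error detected at element {i}")
    return first


def iterate(space, steps, checkpoint_every=2):
    # Advance the domain `steps` times, checkpointing the computation-space
    # after every `checkpoint_every` completed steps.
    checkpoints = []
    for step in range(steps):
        new_values = [
            run_replicated(compute_function, space, i)
            for i in range(len(space.values))
        ]
        space = ComputationSpace(new_values)
        if (step + 1) % checkpoint_every == 0:
            checkpoints.append(space.checkpoint())
    return space, checkpoints


if __name__ == "__main__":
    final, cps = iterate(ComputationSpace([0.0, 3.0, 0.0]), steps=4)
    # Recovery after a crash would restart from the latest checkpoint.
    resumed = ComputationSpace.restore(cps[-1])
    print(resumed.values == final.values)
```

Because the checkpoint holds only the computation-space rather than a full process image, it stays small. Detection by re-execution relies on the compute-function being deterministic for a given input, which holds for this pure stencil.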

Cite

Kurt, M. C., Ren, B., & Agrawal, G. (2016). Low-overhead fault-tolerance support using DISC programming model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9519, pp. 20–36). Springer Verlag. https://doi.org/10.1007/978-3-319-29778-1_2