A model for predicting the optimum checkpoint interval for restart dumps

Abstract

As the run time of an application approaches the mean time to interrupt (MTTI) of the system on which it is running, it becomes necessary to generate intermediate snapshots of the application's run state, known as checkpoint files or restart dumps. In the event of a system failure that halts program execution, these snapshots allow the application to resume computing from the most recently saved intermediate state instead of starting over from the beginning of the calculation. This paper discusses three models for predicting the optimum compute interval between restart dumps. The models are evaluated by comparing their results to a simulation that emulates an application running, with interrupts, on an actual system. The results are then used to derive a simple method for calculating the optimum restart interval. © Springer-Verlag Berlin Heidelberg 2003.
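
For orientation, the "simple method" mentioned at the end of the abstract is commonly quoted in the checkpointing literature as a first-order approximation of the form below. This is a sketch based on that surrounding literature, not a formula stated in the abstract itself; the symbols $\delta$ (time to write one restart dump) and $M$ (the system MTTI) are introduced here for illustration:

$\tau_{\text{opt}} \approx \sqrt{2\delta M} - \delta, \qquad \delta \ll M$

The square-root form reflects the trade-off the models capture: checkpointing too often wastes compute time writing dumps, while checkpointing too rarely loses more recomputed work after each interrupt.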

Citation (APA)

Daly, J. (2003). A model for predicting the optimum checkpoint interval for restart dumps. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2660, 3–12. https://doi.org/10.1007/3-540-44864-0_1
