Analysis of the tradeoffs between energy and run time for multilevel checkpointing

Abstract

In high-performance computing, there is a perpetual hunt for performance and scalability. Supercomputers grow larger, offering improved computational science throughput. Nevertheless, as the number of system components and their interactions increases, the number of failures and the power consumption will increase rapidly. Energy and reliability are among the most challenging issues that need to be addressed for extreme-scale computing. We develop analytical models for run time and energy usage for multilevel fault-tolerance schemes. We use these models to study the tradeoff between run time and energy in FTI, a recently developed multilevel checkpoint library, on an IBM Blue Gene/Q. Our results show that the energy consumed by FTI is low and that the tradeoff between run time and energy is small. Using the analytical models, we explore the impact of various system-level parameters on run time and energy tradeoffs.

Cite

Balaprakash, P., Gomez, L. A. B., Bouguerra, M. S., Wild, S. M., Cappello, F., & Hovland, P. D. (2015). Analysis of the tradeoffs between energy and run time for multilevel checkpointing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8966, pp. 249–263). Springer Verlag. https://doi.org/10.1007/978-3-319-17248-4_13
