A highly scalable, algorithm-based fault-tolerant solver for gyrokinetic plasma simulations


Abstract

With future exascale computers expected to have millions of compute units distributed among thousands of nodes, system faults are predicted to become more frequent. Fault tolerance will thus play a key role in HPC at this scale. In this paper, we focus on solving the 5-dimensional gyrokinetic Vlasov-Maxwell equations using the application code GENE, as it represents a high-dimensional and resource-intensive problem that is a natural candidate for exascale computing. We discuss the Fault-Tolerant Combination Technique, a resilient version of the Combination Technique, which is a method to increase the discretization resolution of existing PDE solvers. For the first time, we present an efficient, scalable and fault-tolerant implementation of this algorithm for plasma physics simulations, based on a manager-worker model, and test it under very realistic and pessimistic environments with simulated faults. We show that the Fault-Tolerant Combination Technique, an algorithm-based forward recovery method, can tolerate a large number of faults with low overhead and at an acceptable loss in accuracy. Our parallel experiments with up to 32k cores show good scalability at a relative parallel efficiency of 93.61%. We conclude that algorithm-based solutions to fault tolerance are attractive for this type of problem.
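
To make the idea of forward recovery concrete, the following is a minimal sketch of how combination coefficients can be recomputed after a component grid is lost. It is not taken from the paper or from GENE. It assumes the standard inclusion-exclusion formula for a downward-closed set of level vectors, and the function names (`classical_index_set`, `combination_coefficients`, `drop_failed`) are hypothetical. The actual Fault-Tolerant Combination Technique may select its replacement combination differently.

```python
from itertools import product


def classical_index_set(d, n):
    """Level vectors l >= 1 in d dimensions with |l|_1 <= n + d - 1 (assumed convention)."""
    return {l for l in product(range(1, n + 1), repeat=d) if sum(l) <= n + d - 1}


def combination_coefficients(index_set):
    """Inclusion-exclusion coefficients for a downward-closed index set.

    c_l = sum over z in {0,1}^d of (-1)^|z| * [l + z is in the index set].
    Grids whose coefficient is zero are omitted from the result.
    """
    index_set = set(index_set)
    d = len(next(iter(index_set)))
    coeffs = {}
    for l in index_set:
        c = 0
        for z in product((0, 1), repeat=d):
            neighbor = tuple(li + zi for li, zi in zip(l, z))
            if neighbor in index_set:
                c += (-1) ** sum(z)
        if c != 0:
            coeffs[l] = c
    return coeffs


def drop_failed(index_set, failed):
    """Remove a failed grid and every grid at least as fine in all dimensions.

    This keeps the remaining index set downward closed, so the
    inclusion-exclusion formula above still applies.
    """
    return {l for l in index_set
            if not all(li >= fi for li, fi in zip(l, failed))}


if __name__ == "__main__":
    grids = classical_index_set(d=2, n=3)
    print("no faults:  ", combination_coefficients(grids))
    # Simulate losing the component grid with level vector (2, 2).
    survivors = drop_failed(grids, failed=(2, 2))
    print("after fault:", combination_coefficients(survivors))
```

In this sketch the coarse grid (1, 1) has coefficient zero before the fault and is only needed afterwards. Under the assumptions above, that illustrates why a forward recovery scheme of this kind benefits from having a few additional coarse grids available: the lost result does not need to be recomputed from a checkpoint.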

Cite

Obersteiner, M., Hinojosa, A. P., Heene, M., Bungartz, H. J., & Pflüger, D. (2017). A highly scalable, algorithm-based fault-tolerant solver for gyrokinetic plasma simulations. In Proceedings of ScalA 2017: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis. Association for Computing Machinery, Inc. https://doi.org/10.1145/3148226.3148229
