The reduced reliability of next-generation exascale systems means that the resiliency properties of a numerical algorithm will become an important factor both in the choice of algorithm and in its analysis. The multigrid algorithm is the workhorse for the distributed solution of linear systems, but little is known about its resiliency properties and convergence behavior in a fault-prone environment. In the current work, we propose a probabilistic model for the effect of faults that involves random diagonal matrices. We summarize the results of a theoretical analysis of this model for the rate of convergence of fault-prone multigrid methods, which show that the standard multigrid method will not be resilient. Finally, we present a modification of the standard multigrid algorithm that will be resilient.
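As a rough illustration of what a fault model built on random diagonal matrices might look like, the sketch below applies a random 0/1 diagonal matrix to a vector. The Bernoulli distribution of the diagonal entries, the function name `apply_fault`, and the parameter `fault_prob` are assumptions made for this example; the abstract does not specify them.

```python
import random


def apply_fault(vec, fault_prob, rng):
    """Return vec multiplied by a random diagonal 0/1 matrix.

    Assumption for illustration: each diagonal entry is independently 0
    with probability fault_prob (the component is lost) and 1 otherwise.
    """
    return [0.0 if rng.random() < fault_prob else v for v in vec]


# Hypothetical usage: a vector of ones hit by faults with probability 0.1.
rng = random.Random(0)
x = [1.0] * 10_000
faulty = apply_fault(x, 0.1, rng)
survived = sum(faulty) / len(x)
print(f"fraction of entries that survived: {survived:.3f}")
```

Under this assumed model, the expected result is the original vector scaled by `1 - fault_prob`. This gives one simple way to think about how faults could enter a convergence analysis.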
Ainsworth, M., & Glusa, C. (2016). Multigrid at scale? In Lecture Notes in Computational Science and Engineering (Vol. 112, pp. 237–253). Springer Verlag. https://doi.org/10.1007/978-3-319-39929-4_24