Performance modeling of gyrokinetic toroidal simulations for a many-tasking runtime system

2Citations
Citations of this article
3Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Conventional programming practices on multicore processors in high performance computing architectures are not universally effective in terms of efficiency and scalability for many algorithms in scientific computing. One possible solution for improving efficiency and scalability in applications on this class of machines is the use of a many-tasking runtime system employing many lightweight, concurrent threads. Yet a priori estimation of the potential performance and scalability impact of such runtime systems on existing applications developed around the bulk synchronous parallel (BSP) model is not well understood. In this work, we present a case study of a BSP particle-in-cell benchmark code which has been ported to a many-tasking runtime system. The 3-D Gyrokinetic Toroidal code (GTC) is examined in its original MPI form and compared with a port to the High Performance ParalleX 3 (HPX-3) runtime system. Phase overlap, oversubscription behavior, and work rebalancing in the implementation are explored. Results for GTC using the SST/macro simulator complement the implementation results. Finally, an analytic performance model for GTC is presented in order to guide future implementation efforts.

Cite

CITATION STYLE

APA

Anderson, M., Brodowicz, M., Kulkarni, A., & Sterling, T. (2014). Performance modeling of gyrokinetic toroidal simulations for a many-tasking runtime system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8551, pp. 136–157). Springer Verlag. https://doi.org/10.1007/978-3-319-10214-6_7

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free