Asynchronous many-task runtime systems and MPI+X hybrid parallelism approaches have shown promise for helping manage the increasing complexity of nodes in current and emerging high performance computing (HPC) systems, including those for exascale. The increasing architectural diversity of these systems, however, poses challenges for runtimes designed to support more homogeneous HPC systems. Performance portability layers (PPLs) have shown promise for helping manage this diversity. This paper describes a heterogeneous MPI+PPL task scheduling approach that combines these solutions, with additional consideration for parallel third-party libraries facing similar challenges, to help prepare such a runtime for the diverse heterogeneous systems accompanying exascale computing. The approach is demonstrated using a heterogeneous MPI+Kokkos task scheduler and the accompanying portable abstractions [16] implemented in the Uintah Computational Framework, an asynchronous many-task runtime system, with additional consideration for hypre, a parallel third-party library. Results are shown for two challenging problems executing workloads representative of typical Uintah applications. These results show performance improvements of up to 4.4x when the scheduler and the accompanying portable abstractions [16] are used to port a previously MPI-only problem to Kokkos::OpenMP and Kokkos::CUDA, improving the use of complex heterogeneous nodes. Good strong scaling to 1,024 NVIDIA V100 GPUs and 512 IBM POWER9 processors is also shown using MPI+Kokkos::OpenMP+Kokkos::CUDA at scale.
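Two metrics underlie the claims above. One is speedup over a baseline run of the same problem. The other is strong-scaling efficiency, where the total problem size stays fixed while resources grow. The sketch below uses the standard definitions and assumes that ideal strong scaling means runtime falls in proportion to the added resources. It does not reflect how the paper itself measures performance. The function names and the timings are hypothetical placeholders, not results from this work.

```python
def speedup(t_baseline: float, t_new: float) -> float:
    """Speedup of a new configuration over a baseline run of the same problem."""
    return t_baseline / t_new


def strong_scaling_efficiency(t_base: float, n_base: int, t_n: float, n: int) -> float:
    """Strong-scaling efficiency for a fixed total problem size.

    Ideal strong scaling halves runtime when resources double, so
    efficiency = (t_base * n_base) / (t_n * n); 1.0 means perfect scaling.
    """
    return (t_base * n_base) / (t_n * n)


# Hypothetical timings (seconds per timestep), NOT measurements from the paper.
t_64_gpus, t_1024_gpus = 8.0, 0.6
print(f"speedup    = {speedup(t_64_gpus, t_1024_gpus):.2f}")
print(f"efficiency = {strong_scaling_efficiency(t_64_gpus, 64, t_1024_gpus, 1024):.3f}")
```

An efficiency close to 1.0 across a 16x increase in GPUs is what "good strong scaling" usually denotes. The threshold for "good" is a judgment call rather than a fixed standard.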
Holmen, J., Sahasrabudhe, D., & Berzins, M. (2021). A Heterogeneous MPI+PPL Task Scheduling Approach for Asynchronous Many-Task Runtime Systems. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3437359.3465581