Abstract
The ever-growing diversity in the architecture of modern supercomputers has created challenges for developing scientific software. Utilizing heterogeneous and disruptive architectures (e.g., off-chip and, in the near future, on-chip accelerators) has increased software complexity and worsened maintainability. To address this, we need a productive software ecosystem that improves the usability and portability of applications for such systems while allowing every parallelism opportunity to be exploited. In this paper, we outline several challenges that we encountered in implementing Gecko, a hierarchical model for distributed shared memory architectures, using a directive-based programming model, and we discuss our solutions. These challenges include: 1) inferred kernel execution with respect to data placement, 2) workload distribution, 3) hierarchy maintenance, and 4) memory management. We evaluated our implementation experimentally using the Stream and Rodinia benchmarks, which represent several major scientific software applications commonly used by domain scientists. Our results show that the Stream benchmark reaches a sustainable bandwidth of 80 GB/s on a single Intel Xeon processor and 1.8 TB/s on four NVIDIA V100 GPUs. Additionally, srad-v2 from the Rodinia benchmark reaches 88% speedup efficiency when using four GPUs.
Cite
Ghane, M., Chandrasekaran, S., & Cheung, M. S. (2020). Towards a portable hierarchical view of distributed shared memory systems: Challenges and solutions. In Proceedings of the 11th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2020. Association for Computing Machinery, Inc. https://doi.org/10.1145/3380536.3380542