Exploiting communication concurrency on high performance computing systems


Abstract

Although communication concurrency is logically available in applications, they may not exploit enough of it at any given instant to maximize hardware utilization on HPC systems. The problem is worse in hybrid programming models such as SPMD+OpenMP. We present the design of a "multi-threaded" runtime that transparently increases instantaneous network concurrency and provides near-saturation bandwidth, independent of the application's configuration and dynamic behavior. The runtime forwards communication requests from application-level tasks to multiple communication servers. Our techniques remove the need for application-level optimizations of spatial and temporal message concurrency. Experimental results show message throughput and bandwidth improvements of as much as 150% for 4 KB messages on InfiniBand and as much as 120% for 4 KB messages on Cray Aries. For more complex operations such as all-to-all collectives, we observe speedups of as much as 30%. This translates into a 23% speedup on 12,288 cores for a NAS FT implementation that uses FFTW. We also observe speedups of as much as 76% on 1,500 cores for an already optimized UPC+OpenMP geometric multigrid application using hybrid parallelism.
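The abstract describes the runtime only at a high level: application tasks hand communication requests to multiple communication servers, which raises the number of transfers in flight. The Python sketch below illustrates that forwarding pattern under explicit assumptions. Threads stand in for communication servers, and a caller-supplied `transport` callback stands in for the real network injection layer. Round-robin assignment is our own choice of policy. The class name `CommServerPool` and its methods are hypothetical. This is not the authors' runtime. Because Python threads share one interpreter lock, the sketch shows the structure of request forwarding but not real network concurrency.

```python
import threading
import queue
from typing import Callable, List, Tuple

# Hypothetical request type: (destination rank, payload bytes).
Request = Tuple[int, bytes]


class CommServerPool:
    """Hypothetical pool that forwards send requests to several server threads.

    Each server owns a private queue, so up to `num_servers` requests can be
    handed to the transport concurrently even when one task issues them all.
    `transport(server_id, dest, payload)` is a stand-in for a real network call.
    """

    def __init__(self, num_servers: int,
                 transport: Callable[[int, int, bytes], None]) -> None:
        self._queues: List[queue.Queue] = [queue.Queue() for _ in range(num_servers)]
        self._transport = transport
        self._next = 0
        self._lock = threading.Lock()
        self._threads = [
            threading.Thread(target=self._serve, args=(i,), daemon=True)
            for i in range(num_servers)
        ]
        for t in self._threads:
            t.start()

    def _serve(self, server_id: int) -> None:
        # Server loop: drain this server's queue until the shutdown sentinel arrives.
        q = self._queues[server_id]
        while True:
            req = q.get()
            if req is None:
                q.task_done()
                return
            dest, payload = req
            self._transport(server_id, dest, payload)
            q.task_done()

    def submit(self, dest: int, payload: bytes) -> None:
        # Round-robin assignment spreads consecutive requests across servers.
        with self._lock:
            server_id = self._next
            self._next = (self._next + 1) % len(self._queues)
        self._queues[server_id].put((dest, payload))

    def wait_all(self) -> None:
        # Block until every forwarded request has been passed to the transport.
        for q in self._queues:
            q.join()

    def shutdown(self) -> None:
        # Send one sentinel per server, then wait for the threads to exit.
        for q in self._queues:
            q.put(None)
        for t in self._threads:
            t.join()


# Usage: a fake transport that records which server handled each message.
sent: List[Tuple[int, int, int]] = []
sent_lock = threading.Lock()


def fake_transport(server_id: int, dest: int, payload: bytes) -> None:
    with sent_lock:
        sent.append((server_id, dest, len(payload)))


pool = CommServerPool(num_servers=4, transport=fake_transport)
for i in range(16):
    pool.submit(dest=i % 8, payload=b"x" * 4096)  # 4 KB messages, as in the abstract
pool.wait_all()
pool.shutdown()
```

A single task issues all 16 sends, yet round-robin forwarding gives each of the four servers four requests. The sketch is meant to show this spreading of requests across servers, independent of how the application itself is structured.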


Chaimov, N., Ibrahim, K. Z., Williams, S., & Iancu, C. (2015). Exploiting communication concurrency on high performance computing systems. In Proceedings of the 6th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2015 (pp. 132–143). Association for Computing Machinery. https://doi.org/10.1145/2712386.2712394
