MPI+Threads: Runtime contention and remedies

Abdelhalim Amer; Huiwei Lu; Yanjie Wei; Pavan Balaji; Satoshi Matsuoka

Conference Proceedings

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP (2015), Vol. 2015-January, pp. 239-248

DOI: 10.1145/2688500.2688522

40 Citations

31 Readers

Abstract

Hybrid MPI+Threads programming has emerged as an alternative model to the "MPI everywhere" model to better handle the increasing core density in cluster nodes. While the MPI standard allows multithreaded concurrent communication, such flexibility comes with the cost of maintaining thread safety within the MPI implementation, typically implemented using critical sections. In contrast to previous works that studied the importance of critical-section granularity in MPI implementations, in this paper we investigate the implication of critical-section arbitration on communication performance. We first analyze the MPI runtime when multithreaded concurrent communication takes place on hierarchical memory systems. Our results indicate that the mutex-based approach that most MPI implementations use today can incur performance penalties due to unfair arbitration. We then present methods to mitigate these penalties with a first-come, first-served arbitration and a priority locking scheme that favors threads doing useful work. Through evaluations using several benchmarks and applications, we demonstrate up to 5-fold improvement in performance.
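The abstract does not say how the first-come, first-served arbitration is implemented. One well-known lock with that property is a ticket lock, in which each thread takes a number on arrival and enters the critical section only when that number is called. The sketch below illustrates the idea under that assumption; it is not the authors' code, and every name in it (`ticket_lock_t`, `ticket_lock_acquire`, `run_demo`, and so on) is hypothetical. It uses C11 atomics and POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical ticket lock: threads are served strictly in arrival order. */
typedef struct {
    atomic_uint next_ticket;  /* next ticket number to hand out */
    atomic_uint now_serving;  /* ticket currently allowed to enter */
} ticket_lock_t;

static void ticket_lock_init(ticket_lock_t *l) {
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

static void ticket_lock_acquire(ticket_lock_t *l) {
    /* Taking a ticket fixes this thread's place in the queue. */
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                            memory_order_relaxed);
    /* Wait until our number is called; yield instead of burning the core. */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        sched_yield();
}

static void ticket_lock_release(ticket_lock_t *l) {
    /* Hand the lock to the next ticket holder. */
    atomic_fetch_add_explicit(&l->now_serving, 1, memory_order_release);
}

typedef struct {
    ticket_lock_t *lock;
    long *counter;
    int iters;
} worker_arg_t;

/* Each worker increments a shared counter inside the critical section. */
static void *worker(void *p) {
    worker_arg_t *a = p;
    for (int i = 0; i < a->iters; i++) {
        ticket_lock_acquire(a->lock);
        (*a->counter)++;
        ticket_lock_release(a->lock);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter value. */
static long run_demo(int nthreads, int iters) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    ticket_lock_t lock;
    ticket_lock_init(&lock);
    long counter = 0;
    worker_arg_t arg = { &lock, &counter, iters };
    pthread_t tids[MAX_THREADS];
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &arg);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Because the lock provides mutual exclusion, `run_demo(4, 1000)` returns 4000. The contrast with a plain mutex is in who gets the lock next: a mutex leaves that choice to the implementation and the memory hierarchy, whereas the ticket counter makes the order explicit. The priority scheme mentioned in the abstract is not shown here.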

Citation (APA)

Amer, A., Lu, H., Wei, Y., Balaji, P., & Matsuoka, S. (2015). MPI+Threads: Runtime contention and remedies. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP (Vol. 2015-January, pp. 239–248). Association for Computing Machinery. https://doi.org/10.1145/2688500.2688522
