Adaptive OpenMP for large NUMA nodes

Abstract

The advent of multicore processors favors hybrid programming models such as MPI+OpenMP. OpenMP runtimes therefore need to perform well across a wide range of thread counts: from a small number of threads (one MPI task per socket, with OpenMP inside each socket) to a large number of threads (one MPI task per node, with OpenMP inside each node). To tackle this issue, we propose a mechanism that improves thread-synchronization performance across this whole range of thread counts. It relies on a hierarchical tree that is traversed differently depending on the number of threads inside the parallel region. Our approach delivers high performance for both thread activation (the parallel construct) and thread synchronization (the barrier construct). Several papers study hierarchical structures to launch and synchronize OpenMP threads [1, 2]. They evaluate tree-based approaches to distribute and synchronize threads, but they do not explore mixed hierarchical solutions. © 2012 Springer-Verlag.
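The abstract only names the mechanism, so the following is a rough illustration of the general idea of a hierarchical barrier whose shape depends on team size. It is not the authors' runtime code: it builds a combining tree in C11 with POSIX threads, rebuilding the tree for each team size rather than reproducing the paper's traversal scheme. With `fan_in` set to a hypothetical cores-per-socket value, a team that fits in one socket collapses to a single flat counter, while a node-wide team gets extra tree levels. All names (`tb_init`, `tb_wait`, `tb_destroy`, `run_trial`, `fan_in`) are illustrative assumptions, and the global sense-reversing release flag is a common textbook choice that the paper does not specify. Compile with something like `cc -std=c11 -pthread`.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* One node of the combining tree; it counts arrivals of its children. */
typedef struct {
    atomic_int remaining; /* arrivals still expected in this episode */
    int expected;         /* number of children (threads or sub-nodes) */
    int parent;           /* index of the parent node, -1 for the root */
} tree_node;
typedef struct {
    tree_node *nodes;
    int *leaf_of;         /* leaf node index for each thread id */
    atomic_int sense;     /* global release flag, flipped at the root */
    int levels;           /* tree depth; 1 means a single flat counter */
} tree_barrier;

/* Group threads fan_in at a time, then group those nodes, up to one root.
   If nthreads <= fan_in the whole team shares one flat counter. */
void tb_init(tree_barrier *b, int nthreads, int fan_in) {
    assert(nthreads >= 1);
    if (fan_in < 2) fan_in = 2;
    b->nodes = calloc(2 * (size_t)nthreads + 1, sizeof *b->nodes);
    b->leaf_of = malloc((size_t)nthreads * sizeof *b->leaf_of);
    assert(b->nodes != NULL && b->leaf_of != NULL);
    atomic_init(&b->sense, 0); b->levels = 0;
    int width = nthreads, first_child = -1, used = 0; /* first_child -1: items are threads */
    for (;;) {
        int groups = (width + fan_in - 1) / fan_in, first = used;
        for (int g = 0; g < groups; g++) {
            int lo = g * fan_in, hi = lo + fan_in < width ? lo + fan_in : width;
            tree_node *nd = &b->nodes[used++];
            nd->expected = hi - lo; nd->parent = -1;
            atomic_init(&nd->remaining, hi - lo);
            for (int i = lo; i < hi; i++) {
                if (first_child < 0) b->leaf_of[i] = first + g;
                else b->nodes[first_child + i].parent = first + g;
            }
        }
        b->levels++;
        if (groups == 1) break;
        first_child = first; width = groups; /* next level groups these nodes */
    }
}

/* Each thread passes its own local_sense, initialised to 0 and kept across calls. */
void tb_wait(tree_barrier *b, int tid, int *local_sense) {
    *local_sense = !*local_sense;
    int n = b->leaf_of[tid];
    for (;;) {
        tree_node *nd = &b->nodes[n];
        if (atomic_fetch_sub(&nd->remaining, 1) != 1) { /* not last at this node */
            while (atomic_load(&b->sense) != *local_sense) sched_yield();
            return;
        }
        atomic_store(&nd->remaining, nd->expected); /* last arrival re-arms node */
        if (nd->parent < 0) {
            atomic_store(&b->sense, *local_sense);  /* root: release everyone */
            return;
        }
        n = nd->parent;                             /* climb one level */
    }
}

void tb_destroy(tree_barrier *b) { free(b->nodes); free(b->leaf_of); }
typedef struct { tree_barrier *bar; atomic_int *counter, *errors; int tid, n, rounds; } worker_arg;

/* Each round: increment, barrier, check all n increments are visible, barrier. */
static void *worker(void *p) {
    worker_arg *a = p;
    int sense = 0;
    for (int r = 0; r < a->rounds; r++) {
        atomic_fetch_add(a->counter, 1);
        tb_wait(a->bar, a->tid, &sense);
        if (atomic_load(a->counter) != (r + 1) * a->n) atomic_fetch_add(a->errors, 1);
        tb_wait(a->bar, a->tid, &sense);
    }
    return NULL;
}
/* Returns how many checks saw a stale counter (0 for a correct barrier). */
int run_trial(int nthreads, int fan_in, int rounds) {
    tree_barrier bar; atomic_int counter, errors;
    atomic_init(&counter, 0); atomic_init(&errors, 0);
    tb_init(&bar, nthreads, fan_in);
    pthread_t *th = malloc((size_t)nthreads * sizeof *th);
    worker_arg *args = malloc((size_t)nthreads * sizeof *args);
    for (int i = 0; i < nthreads; i++) {
        args[i] = (worker_arg){&bar, &counter, &errors, i, nthreads, rounds};
        if (pthread_create(&th[i], NULL, worker, &args[i]) != 0) abort();
    }
    for (int i = 0; i < nthreads; i++) pthread_join(th[i], NULL);
    free(th); free(args); tb_destroy(&bar);
    return atomic_load(&errors);
}
```

For example, `run_trial(8, 2, 200)` drives 8 threads through 200 rounds on a three-level binary tree and should return 0. `tb_init(&b, 32, 8)` yields two levels (four leaf nodes of eight threads under one root), while a team of 8 threads with the same fan-in collapses to `levels == 1`, i.e. a flat counter. A production OpenMP runtime would also allocate each node on the NUMA node of the threads that use it and pad nodes to avoid false sharing; this sketch does neither.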

Citation (APA)

Mahéo, A., Koliaï, S., Carribault, P., Pérache, M., & Jalby, W. (2012). Adaptive OpenMP for large NUMA nodes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7312 LNCS, pp. 254–257). https://doi.org/10.1007/978-3-642-30961-8_20