The current generation of high-performance computing platforms features a large number of cores, so applications must expose fine-grained parallelism to run efficiently. Since version 3.0, the OpenMP standard has provided a way to express such parallelism through tasks. Because the task scheduling strategy is implementation-defined, each runtime can differ in behavior and efficiency. However, the hierarchical structure of current parallel computing systems is rarely taken into account, which can lead to a loss of performance on large multicore NUMA systems. This paper studies multiple task scheduling algorithms using a configurable scheduler. The scheduler relies on a topology-aware, tree-based representation of the computing platform to orchestrate the execution and load balancing of OpenMP tasks. Advanced users can select the granularity of the task lists according to the tree structure and choose the most suitable work-stealing strategy. One of these strategies takes data locality into account with the help of this hierarchical view. On unbalanced codes from the BOTS benchmark suite, it performs well compared to the Intel and GNU OpenMP runtimes on 16-core and 128-core systems. © 2014 Springer International Publishing Switzerland.
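To make the idea of a tree-based scheduler with configurable task-list granularity concrete, here is a minimal single-threaded sketch. It is not the paper's runtime: the names `Node`, `build_tree`, `attach_task_lists`, `steal_order` and `get_task` are hypothetical, and the stealing policy shown (search the closest subtree first, then climb toward the root) is one plausible locality-aware strategy assumed for illustration. Task lists are attached at one chosen tree depth (per core, per NUMA node, or a single global list).

```python
from collections import deque


class Node:
    """A vertex of a hypothetical topology tree (machine, NUMA node or core)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.tasks = None  # a deque only on nodes at the chosen list level

    def add(self, name):
        child = Node(name, self)
        self.children.append(child)
        return child


def build_tree(numa_nodes, cores_per_numa):
    """Assumed three-level topology: machine -> NUMA nodes -> cores."""
    root = Node("machine")
    for n in range(numa_nodes):
        numa = root.add(f"numa{n}")
        for c in range(cores_per_numa):
            numa.add(f"core{n * cores_per_numa + c}")
    return root


def depth_nodes(root, depth):
    """All nodes at a given depth (0 = machine, 1 = NUMA, 2 = core)."""
    level = [root]
    for _ in range(depth):
        level = [c for n in level for c in n.children]
    return level


def attach_task_lists(root, depth):
    """Granularity knob: give every node at `depth` its own task list."""
    for node in depth_nodes(root, depth):
        node.tasks = deque()


def home_list(core):
    """The nearest ancestor (or the core itself) that owns a task list."""
    node = core
    while node.tasks is None:
        node = node.parent
    return node


def lists_in(subtree):
    """Every node owning a task list inside `subtree`."""
    if subtree.tasks is not None:
        return [subtree]
    return [n for c in subtree.children for n in lists_in(c)]


def steal_order(core):
    """Victims ordered by topological distance: closest subtree first."""
    home = home_list(core)
    order, seen = [], {id(home)}
    node = home
    while node.parent is not None:
        node = node.parent
        for victim in lists_in(node):
            if id(victim) not in seen:
                seen.add(id(victim))
                order.append(victim)
    return order


def get_task(core):
    """Pop from the local list (LIFO), else steal from the oldest end (FIFO)."""
    home = home_list(core)
    if home.tasks:
        return home.tasks.pop()
    for victim in steal_order(core):
        if victim.tasks:
            return victim.tasks.popleft()
    return None
```

With one list per core on two NUMA nodes of two cores each, `core0` first tries `core1` (same NUMA node) before the remote `core2` and `core3`. Owners taking their newest task while thieves take the oldest one is a common work-stealing convention; a real runtime would also need per-list synchronization, which this sketch omits.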
Clet-Ortega, J., Carribault, P., & Pérache, M. (2014). Evaluation of OpenMP task scheduling algorithms for large NUMA architectures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8632 LNCS, pp. 596–607). Springer Verlag. https://doi.org/10.1007/978-3-319-09873-9_50