Abstract
Shared-memory HPC platforms now expose large numbers of cores organized hierarchically. Parallel application programmers struggle to express increasingly fine-grain parallelism and to ensure locality on such NUMA platforms. Independent loops stand as a natural source of parallelism. Parallel environments like OpenMP provide ways of parallelizing them efficiently, but the achieved performance is closely tied to the choice of parameters such as the granularity of work or the loop scheduler. Because both can depend on the target computer, the input data, and the loop workload, application programmers often fail to design implementations that are both portable and efficient. We propose in this paper a new OpenMP loop scheduler, called adaptive, that dynamically adapts the granularity of work to the underlying system state. Our scheduler performs dynamic load balancing while taking memory affinity into account on NUMA architectures. Results show that adaptive outperforms state-of-the-art OpenMP loop schedulers on memory-bound irregular applications, while obtaining performance comparable to static on parallel loops with a regular workload. © 2013 Springer-Verlag.
Citation
Durand, M., Broquedis, F., Gautier, T., & Raffin, B. (2013). An efficient OpenMP loop scheduler for irregular applications on large-scale NUMA machines. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8122 LNCS, pp. 141–155). https://doi.org/10.1007/978-3-642-40698-0_11