Maximizing the scope of a parallel region, which avoids the costs of barriers and of launching additional parallel regions, is among the first recommendations in any optimization guide for OpenMP. While clearly beneficial and easily accomplished for code where regions are visibly contiguous, regions often become contiguous only after compiler optimization or resolution of abstraction layers. This paper explores changes to the OpenMP specification that would allow implementations to merge adjacent parallel regions automatically, including the removal of issues that make the transformation non-conforming and the addition of hints that facilitate the optimization. Beyond simple merging, we explore hints to fuse workshared loops that occur in syntactically distinct parallel regions or to apply nowait to such loops. Our evaluation shows these changes can provide an overall speedup of 2–8× for a microbenchmark, or 6% for a representative physics application.
Scogland, T. R. W., Gyllenhaal, J., Keasler, J., Hornung, R., & de Supinski, B. R. (2015). Enabling region merging optimizations in OpenMP. In Lecture Notes in Computer Science (Vol. 9342, pp. 177–188). Springer Verlag. https://doi.org/10.1007/978-3-319-24595-9_13
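To make the transformation concrete, the following is a minimal hand-written sketch of what merging two adjacent parallel regions and applying nowait could look like in standard OpenMP C. It is not taken from the paper and does not use the hints the paper proposes; the function names, the array size `N`, and the scale-then-shift kernels are hypothetical choices for illustration. The nowait on the first loop is assumed safe here only because both loops use `schedule(static)` with identical bounds, so each thread touches the same indices in both loops.

```c
#include <stddef.h>

#define N 1000  /* hypothetical problem size, for illustration only */

/* Unmerged form: two adjacent parallel regions. Each one pays for its own
   region launch and ends with its own implicit barrier. */
static void scale_then_shift_separate(double *a, double s, double d) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++) a[i] *= s;

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++) a[i] += d;
}

/* Merged form: one parallel region containing both workshared loops.
   Assumption: with schedule(static) and identical loop bounds, iteration i
   is assigned to the same thread in both loops, so the barrier after the
   first loop can be dropped with nowait. */
static void scale_then_shift_merged(double *a, double s, double d) {
    #pragma omp parallel
    {
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < N; i++) a[i] *= s;

        #pragma omp for schedule(static)
        for (int i = 0; i < N; i++) a[i] += d;
    }
}

/* Runs both forms on identical input and returns the number of elements
   where the results differ. */
static int count_mismatches(void) {
    static double x[N], y[N];
    for (int i = 0; i < N; i++) {
        x[i] = (double)i;
        y[i] = (double)i;
    }
    scale_then_shift_separate(x, 2.0, 1.0);
    scale_then_shift_merged(y, 2.0, 1.0);

    int mismatches = 0;
    for (int i = 0; i < N; i++) {
        if (x[i] != y[i]) mismatches++;
    }
    return mismatches;
}
```

Calling `count_mismatches()` from a `main` function should return 0, since both forms perform the same per-element arithmetic. The code compiles with or without OpenMP enabled (for example, with or without `-fopenmp` in GCC), because unrecognized pragmas are ignored and the loops then run serially. The abstract's point is that a compiler often cannot perform this rewrite itself when the two regions only become adjacent after inlining or other optimization.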