OpenMP implementations must exploit current and upcoming hardware for performance. Overhead must be controlled and kept to a minimum to avoid low performance at scale. Previous work has shown that overheads do not scale favourably in commonly used OpenMP implementations. Focusing on synchronization overhead, this work analyses the overhead of core OpenMP runtime library components for GNU and LLVM compilers, reflecting on the implementation’s source code and algorithms. In addition, this work investigates the implementation’s capability to handle current CPU-internal NUMA structure observed in recent Intel CPUs. Using a custom benchmark designed to expose synchronization overhead of OpenMP regardless of user code, substantial differences between both implementations are observed. In summary, the LLVM implementation can be considered more scalable than the GNU implementation, but the GNU implementation yields lower overhead for lower threadcounts in some occasions. Neither implementation reacts to the system architecture, although the effects of the internal NUMA structure on the overhead can be observed.
CITATION STYLE
Jammer, T., Iwainsky, C., & Bischof, C. (2020). A comparison of the scalability of OpenMP implementations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12247 LNCS, pp. 83–97). Springer. https://doi.org/10.1007/978-3-030-57675-2_6
Mendeley helps you to discover research relevant for your work.