OpenMP is one of the most widely used standards for enabling threadlevel parallelism in high performance computing codes. The recently released version 4.0 of the specification introduces directives that enable application developers to offload portions of the computation to massively-parallel target devices. However, to efficiently utilize these devices, sophisticated performance analysis tools are required. The emerging OpenMP Tools Interface (OMPT) aids the development of portable tools, but currently lacks the support for OpenMP 4.0 target directives. This paper presents a novel approach to measure the performance of applications utilizing OpenMP offloading. It introduces libmpti, an OMPT-based measurement library for Intel MIC target devices. For host-side analysis we extended the OPARI2 instrumenter and prototypically integrated the complete approach into the state-of-the-art tool infrastructure Score-P. We demonstrate the effectiveness of the presented method and implementation with a Conjugate-Gradient (CG) kernel on an Intel Xeon Phi coprocessor. Finally, we visualize the obtained performance data with Vampir.
CITATION STYLE
Dietrich, R., Schmitt, F., Grund, A., & Schmidl, D. (2014). Performance measurement for the openmp 4.0 offloading model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8806, pp. 291–301). Springer Verlag. https://doi.org/10.1007/978-3-319-14313-2_25
Mendeley helps you to discover research relevant for your work.