OpenMP 4.0 represents a major upgrade in the language specifications of the standard. It adds important constructs for exploiting SIMD parallelism, support for dependencies among tasks, and the ability to cancel the operations of a team of threads. Arguably the most important addition, however, is the introduction of the device model. A variety of computational units, such as GPUs, DSPs, and general- or special-purpose accelerators, are viewed as attached devices to which portions of a unified application code can be offloaded for execution. In this work we present the infrastructure for device support in the OMPi research compiler, one of the few compilers that currently implement the new device directives. We discuss the necessary compiler transformations and the general runtime organization. For the first time, special emphasis is placed on the important problem of data environment handling. In addition, we present a prototype implementation on the popular Parallella board, which exploits the dual-core ARM host processor and the 16-core Epiphany accelerator of the system.
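As an illustration of the device model described above, the following is a minimal sketch of offloading a loop with the standard OpenMP 4.0 `target` construct and a `map` clause. It is not taken from the paper: the function name `scale_on_device` and its parameters are hypothetical, and the code makes no assumption about how OMPi translates the region. If the compiler has no OpenMP or device support, the pragma is ignored and the loop simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): multiplies every element of v
 * by factor inside an OpenMP 4.0 target region. */
void scale_on_device(float *v, int n, float factor)
{
    assert(v != NULL && n >= 0);

    /* map(tofrom: ...) asks the runtime to copy the n-element array section
     * into the device data environment on entry and back to the host on
     * exit. Without OpenMP support the pragma is ignored and the loop runs
     * on the host. */
    #pragma omp target map(tofrom: v[0:n])
    {
        for (int i = 0; i < n; i++)
            v[i] *= factor;
    }
}
```

Calling `scale_on_device(v, 4, 2.0f)` on an array holding 0, 1, 2, 3 leaves it holding 0, 2, 4, 6, whether the region executes on an attached device or falls back to the host.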
Papadogiannakis, A., Agathos, S. N., & Dimakopoulos, V. V. (2015). OpenMP 4.0 device support in the OMPi compiler. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9342, pp. 202–216). Springer Verlag. https://doi.org/10.1007/978-3-319-24595-9_15