Impact of compiler phase ordering when targeting GPUs

2Citations
Citations of this article
3Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Research in compiler pass phase ordering (i.e., selection of compiler analysis/transformation passes and their order of execution) has been mostly performed in the context of CPUs and, in a small number of cases, FPGAs. In this paper we present experiments regarding compiler pass phase ordering specialization of OpenCL kernels targeting NVIDIA GPUs using Clang/LLVM 3.9 and the libclc OpenCL library. More specifically, we analyze the impact of using specialized compiler phase orders on the performance of 15 PolyBench/GPU OpenCL benchmarks. In addition, we analyze the final NVIDIA PTX assembly code generated by the different compilation flows in order to identify the main reasons for the cases with significant performance improvements. Using specialized compiler phase orders, we were able to achieve performance improvements over the CUDA version and OpenCL compiled with the NVIDIA driver. Compared to CUDA, we were able to achieve geometric mean improvements of 1.54× (up to 5.48×). Compared to the OpenCL driver version, we were able to achieve geometric mean improvements of 1.65× (up to 5.70×).

Author supplied keywords

Cite

CITATION STYLE

APA

Nobre, R., Reis, L., & Cardoso, J. M. P. (2018). Impact of compiler phase ordering when targeting GPUs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10659 LNCS, pp. 427–438). Springer Verlag. https://doi.org/10.1007/978-3-319-75178-8_35

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free