Automated transformation of GPU-specific OpenCL kernels targeting performance portability on multi-core/many-core CPUs

5Citations
Citations of this article
7Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may give side-effects on the CPU performance. When executing GPU-specific kernels on CPUs, local-memory arrays no longer match well with the hardware and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns by using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by removing all the unwanted local-memory arrays together with the obsolete barrier statements. Experiments show that the automated transformation can satisfactorily improve OpenCL kernel performances on Sandy Bridge CPU and Intel's Many-Integrated-Core coprocessor. © 2014 Springer International Publishing Switzerland.

Cite

CITATION STYLE

APA

Huang, D., Wen, M., Xun, C., Chen, D., Cai, X., Qiao, Y., … Zhang, C. (2014). Automated transformation of GPU-specific OpenCL kernels targeting performance portability on multi-core/many-core CPUs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8632 LNCS, pp. 210–221). Springer Verlag. https://doi.org/10.1007/978-3-319-09873-9_18

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free