Optimizing GPU code for CPU execution using OpenCL and vectorization: A case study on image coding

Abstract

Although OpenCL aims to achieve portability at the code level, different hardware platforms require different approaches to extract the best performance from OpenCL-based code. In this work, we take an image encoder originally tuned for OpenCL on GPU (OpenCL-GPU) and optimize it for multi-CPU-based platforms. We produce two OpenCL-based versions: (i) a regular one (OpenCL-CPU) and (ii) a CPU vector-based one (OpenCL-CPU-Vect). CPU vectorization exploits OpenCL's own support, which makes it much simpler than coding directly with SIMD instructions such as SSE and AVX. Overall, the OpenCL-GPU version is the fastest when run on a high-end GPU, requiring around 580 s to encode the Lenna image, but its performance drops by roughly 65% when run unchanged on a multicore CPU machine. Of the CPU-tuned versions, OpenCL-CPU encodes the Lenna image in 805 s, while the vectorization-based approach performs the same operation in 672 s. The results show that meaningful performance gains can be achieved by tailoring the OpenCL code to the CPU, and that using CPU vectorization instructions through OpenCL is both fairly simple and rewarding in terms of performance.
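
To put the reported timings side by side, the short sketch below derives the relative gains from the three figures quoted in the abstract. It is only illustrative arithmetic: the dictionary and the helper names (`speedup`, `time_reduction_pct`) are hypothetical and not part of the paper, and the "roughly 65%" drop is left out because the abstract does not give the corresponding run time.

```python
# Encoding times for the Lenna image, in seconds, as quoted in the abstract.
times_s = {
    "OpenCL-GPU": 580.0,       # GPU-tuned code on a high-end GPU
    "OpenCL-CPU": 805.0,       # CPU-tuned code, no explicit vectorization
    "OpenCL-CPU-Vect": 672.0,  # CPU-tuned code using OpenCL vectorization
}


def speedup(baseline_s, improved_s):
    """Return how many times faster the improved run is than the baseline."""
    return baseline_s / improved_s


def time_reduction_pct(baseline_s, improved_s):
    """Return the percentage of run time saved relative to the baseline."""
    return 100.0 * (baseline_s - improved_s) / baseline_s


# Gain from vectorization on the CPU (OpenCL-CPU-Vect vs. OpenCL-CPU).
vect_speedup = speedup(times_s["OpenCL-CPU"], times_s["OpenCL-CPU-Vect"])
vect_saving = time_reduction_pct(times_s["OpenCL-CPU"], times_s["OpenCL-CPU-Vect"])

# Remaining gap between the vectorized CPU version and the GPU version.
gap_vs_gpu = times_s["OpenCL-CPU-Vect"] / times_s["OpenCL-GPU"]

print(f"Vectorization speedup on CPU: {vect_speedup:.3f}x")
print(f"Run time saved by vectorization: {vect_saving:.1f}%")
print(f"Vectorized CPU time relative to GPU time: {gap_vs_gpu:.3f}x")
```

Under these figures, vectorization makes the CPU version about 1.2 times faster (about 16.5% less run time), and the vectorized CPU version takes about 1.16 times as long as the GPU version.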

Cite

Pereira, P. M. M., Domingues, P., Rodrigues, N. M. M., Falcao, G., & de Faria, S. M. M. (2016). Optimizing GPU code for CPU execution using OpenCL and vectorization: A case study on image coding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10048 LNCS, pp. 537–545). Springer Verlag. https://doi.org/10.1007/978-3-319-49583-5_42