Optimizing GPU code for CPU execution using openCL and vectorization: A case study on image coding

Pedro M.M. Pereira; Patricio Domingues; Nuno M.M. Rodrigues; Gabriel Falcao; Sergio M.M. de Faria

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 10048 LNCS 537-545

DOI: 10.1007/978-3-319-49583-5_42

Abstract

Although OpenCL aims to achieve portability at the code level, different hardware platforms require different approaches to extract the best performance from OpenCL-based code. In this work, we take an image encoder originally tuned for OpenCL on GPU (OpenCL-GPU) and optimize it for multi-CPU-based platforms. We produce two OpenCL-based versions: (i) a regular one (OpenCL-CPU) and (ii) a CPU vector-based one (OpenCL-CPU-Vect). CPU vectorization is expressed through OpenCL's own vector support, which makes it much simpler than coding directly with SIMD instructions such as SSE and AVX. Overall, the OpenCL-GPU version is the fastest when run on a high-end GPU, requiring around 580 s to encode the Lenna image, but its performance drops roughly 65% when run unchanged on a multicore CPU machine. Of the CPU-tuned versions, OpenCL-CPU encodes the Lenna image in 805 s, while the vectorization-based approach performs the same operation in 672 s. The results show that meaningful performance gains can be achieved by tailoring the OpenCL code to the CPU, and that using CPU vectorization instructions through OpenCL is both rather simple and rewarding in terms of performance.
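As a reading aid, the timings quoted in the abstract can be compared directly. The sketch below uses only the three encoding times stated above; the dictionary keys, function names, and variable names are illustrative and do not come from the paper, and no attempt is made to reconstruct the unchanged-GPU-code-on-CPU timing, which the abstract gives only as a rough percentage.

```python
# Encoding times for the Lenna image, in seconds, as quoted in the abstract.
# Keys are illustrative labels, not identifiers from the paper.
times_s = {
    "OpenCL-GPU": 580.0,       # GPU-tuned code on a high-end GPU
    "OpenCL-CPU": 805.0,       # CPU-tuned, no explicit vectorization
    "OpenCL-CPU-Vect": 672.0,  # CPU-tuned, OpenCL vector types
}


def speedup(baseline_s: float, improved_s: float) -> float:
    """How many times faster the improved run is than the baseline."""
    return baseline_s / improved_s


def time_reduction_pct(baseline_s: float, improved_s: float) -> float:
    """Percentage of the baseline run time that the improved run saves."""
    return 100.0 * (baseline_s - improved_s) / baseline_s


# Gain from vectorization over the plain CPU-tuned version.
vect_speedup = speedup(times_s["OpenCL-CPU"], times_s["OpenCL-CPU-Vect"])
vect_saving = time_reduction_pct(times_s["OpenCL-CPU"], times_s["OpenCL-CPU-Vect"])

# Remaining gap between the vectorized CPU version and the GPU version.
gpu_vs_vect = speedup(times_s["OpenCL-CPU-Vect"], times_s["OpenCL-GPU"])

print(f"Vectorized vs. plain CPU speedup: {vect_speedup:.2f}x")
print(f"Run time saved by vectorization:  {vect_saving:.1f}%")
print(f"GPU vs. vectorized CPU speedup:   {gpu_vs_vect:.2f}x")
```

On these figures, the vectorized version is about 1.2 times faster than OpenCL-CPU (a run-time saving of roughly 16.5%), while the GPU version remains about 1.16 times faster than the vectorized CPU version.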

Cite

Pereira, P. M. M., Domingues, P., Rodrigues, N. M. M., Falcao, G., & de Faria, S. M. M. (2016). Optimizing GPU code for CPU execution using openCL and vectorization: A case study on image coding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10048 LNCS, pp. 537–545). Springer Verlag. https://doi.org/10.1007/978-3-319-49583-5_42
