When optimizing or porting applications to new architectures, a preliminary characterization is necessary to exploit the maximum computing power of the target devices. Profiling tools are available for numerous architectures and programming models, making it easier to spot possible bottlenecks. However, to support a better interpretation of the collected results, current profilers rely on insightful performance models. In this paper, we describe the Cache-Aware Roofline Model (CARM) and tools for its generation, which enable the performance characterization of GPU architectures and workloads. We use CARM to characterize two kernels that are part of a 3D iterative reconstruction application for Computed Tomography (CT). These two kernels account for most of the execution time of the whole method and are therefore suitable for a deeper analysis. By exploring the proposed model and methodology, the overall performance of the kernels has been improved by up to a factor of two compared with the previous implementations.
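As a rough illustration of what a roofline-style model expresses, the sketch below computes the attainable performance of a kernel as the minimum of the compute peak and the memory-bound ceiling (arithmetic intensity times bandwidth), with one ceiling per memory level as CARM draws them. The function names, the peak value, and the per-level bandwidths are hypothetical placeholders chosen for this example; they are not measurements or code from the paper.

```python
# Minimal roofline sketch. All numbers below are hypothetical placeholders,
# not values measured on any real GPU or reported in the paper.

PEAK_GFLOPS = 1000.0  # assumed peak compute throughput (GFLOP/s)

# Assumed sustained bandwidth per memory level (GB/s), as seen by the core.
BANDWIDTH_GBS = {"L1": 2000.0, "L2": 1000.0, "DRAM": 250.0}


def attainable_gflops(ai, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(compute peak, arithmetic intensity * bandwidth).

    ai is the arithmetic intensity in FLOP per byte.
    """
    return min(peak_gflops, ai * bandwidth_gbs)


def carm_roofs(ai):
    """Return the attainable performance under each memory-level ceiling."""
    return {
        level: attainable_gflops(ai, PEAK_GFLOPS, bw)
        for level, bw in BANDWIDTH_GBS.items()
    }


if __name__ == "__main__":
    for ai in (0.25, 1.0, 8.0):
        print(ai, carm_roofs(ai))
```

A kernel whose measured performance falls well below the ceiling of the memory level it mostly uses is a candidate for optimization; one that sits on the sloped part of a ceiling is limited by that level's bandwidth rather than by compute.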
Serrano, E., Ilic, A., Sousa, L., Garcia-Blas, J., & Carretero, J. (2018). Cache-Aware Roofline Model and Medical Image Processing Optimizations in GPUs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11203 LNCS, pp. 509–526). Springer Verlag. https://doi.org/10.1007/978-3-030-02465-9_36