We introduce a systematic analysis for fusing the CUDA kernels that arise in efficient iterative methods for solving sparse linear systems. Our procedure characterizes the input and output vectors of these methods and combines this information with a dependency analysis to decide which kernels to merge. Experiments on a recent NVIDIA “Kepler” GPU show significant gains, especially in energy consumption, for the fused implementations obtained by applying the methodology to three of the most popular Krylov subspace solvers, with and without preconditioning.
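To make the idea of combining a vector characterization with a dependency analysis more concrete, the following is a minimal sketch of one possible fusion test. It is not taken from the paper. The `Kernel` record, the `can_fuse` function, the split of accesses into "mapped" (thread i touches only element i) and "unmapped" (any other pattern, including scalars produced by a reduction), and the vector names `p`, `q`, `r`, `x`, `sigma` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """Hypothetical description of a GPU kernel by the vectors it touches."""
    name: str
    mapped_in: set = field(default_factory=set)     # read element-wise by the owning thread
    unmapped_in: set = field(default_factory=set)   # read at arbitrary positions, or a shared scalar
    mapped_out: set = field(default_factory=set)    # written element-wise by the owning thread
    unmapped_out: set = field(default_factory=set)  # needs cross-thread cooperation (e.g. a reduction)


def can_fuse(first: Kernel, second: Kernel) -> bool:
    """Return True if, under this simplified model, `second` can be merged
    into `first` without a global synchronization between them."""
    first_out = first.mapped_out | first.unmapped_out
    second_in = second.mapped_in | second.unmapped_in
    second_out = second.mapped_out | second.unmapped_out

    # Read-after-write: every value handed from `first` to `second` must be
    # produced and consumed by the same thread.
    raw = first_out & second_in
    raw_ok = raw <= (first.mapped_out & second.mapped_in)

    # Write-after-read: `second` must not overwrite a vector that other
    # threads of `first` may still be reading at arbitrary positions.
    war = second_out & first.unmapped_in

    return raw_ok and not war


# Illustrative kernels, loosely modeled on one conjugate gradient iteration.
spmv = Kernel("spmv", unmapped_in={"p"}, mapped_out={"q"})            # q = A p
dot = Kernel("dot", mapped_in={"p", "q"}, unmapped_out={"sigma"})     # sigma = p . q
axpy = Kernel("axpy", mapped_in={"x", "p"}, unmapped_in={"sigma"},
              mapped_out={"x"})                                       # x = x + sigma p
update_p = Kernel("update_p", mapped_in={"r", "p"}, mapped_out={"p"})  # p = r + beta p
```

Under these assumptions, `can_fuse(spmv, dot)` holds because each thread consumes only the element of `q` it just produced, whereas `can_fuse(dot, axpy)` fails because `sigma` is only available after all threads have contributed to the reduction, and `can_fuse(spmv, update_p)` fails because overwriting `p` would race with the irregular reads in the sparse matrix-vector product.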
CITATION STYLE
Aliaga, J. I., Pérez, J., & Quintana-Ortí, E. S. (2015). Systematic fusion of CUDA kernels for iterative sparse linear system solvers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9233, pp. 675–686). Springer Verlag. https://doi.org/10.1007/978-3-662-48096-0_52