NVIDIA GPUs scalability to solve multiple (batch) tridiagonal systems implementation of cuThomasBatch

Pedro Valero-Lara; Ivan Martínez-Pérez; Raül Sirvent; Xavier Martorell; Antonio J. Peña

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 10777 LNCS 243-253

DOI: 10.1007/978-3-319-78024-5_22

Abstract

Solving tridiagonal systems is one of the most computationally expensive parts of many applications, and multiple studies have therefore explored the use of NVIDIA GPUs to accelerate this computation. However, these studies have mainly focused on parallel algorithms for solving such systems. These algorithms can efficiently exploit shared memory and saturate the GPU's capacity with a low number of systems, but they scale poorly when dealing with a relatively high number of systems. We propose a new implementation (cuThomasBatch) based on the Thomas algorithm. To achieve good scalability with this approach, it is necessary to transform the way the inputs are stored in memory so as to exploit coalescence (contiguous threads accessing contiguous memory locations). The results of this study show that our implementation beats the reference code when dealing with a relatively large number of tridiagonal systems (2,000–256,000), being close to 3× (double precision) and 4× (single precision) faster on a single NVIDIA Kepler GPU.
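The combination of the Thomas algorithm with a coalescing-friendly memory layout can be illustrated with a minimal sketch. Assumptions: this is plain sequential Python, not the authors' CUDA code, and the helper names `to_interleaved`, `from_interleaved` and `thomas_batch_interleaved` are hypothetical. Each system is solved with the standard Thomas algorithm (forward sweep, then back substitution, without pivoting, so the systems are assumed to be well conditioned, e.g. diagonally dominant). All systems are stored interleaved: element `i` of system `s` sits at flat index `i * batch + s`. On a GPU, the outer loop over `s` would become one thread per system, so at every step `i` neighbouring threads read and write neighbouring addresses.

```python
def to_interleaved(systems):
    """Pack a list of equal-length vectors (one per system) into one flat list
    where element i of system s is stored at index i * batch + s."""
    batch, n = len(systems), len(systems[0])
    flat = [0.0] * (n * batch)
    for s, vec in enumerate(systems):
        for i, value in enumerate(vec):
            flat[i * batch + s] = value
    return flat


def from_interleaved(flat, batch):
    """Unpack an interleaved flat list back into one list per system."""
    return [flat[s::batch] for s in range(batch)]


def thomas_batch_interleaved(a, b, c, d, n, batch):
    """Solve `batch` tridiagonal systems of size n stored in interleaved layout.

    a: sub-diagonal (row 0 entry unused), b: main diagonal,
    c: super-diagonal (row n-1 entry unused), d: right-hand side.
    No pivoting is done, so each system should be well conditioned
    (for example diagonally dominant). Returns x in the same layout.
    """
    size = n * batch
    cp, dp, x = [0.0] * size, [0.0] * size, [0.0] * size
    # Hypothetical GPU mapping: this loop would be replaced by one thread per s.
    for s in range(batch):
        # Forward sweep: eliminate the sub-diagonal of system s.
        cp[s] = c[s] / b[s]
        dp[s] = d[s] / b[s]
        for i in range(1, n):
            k = i * batch + s   # row i of system s
            km = k - batch      # row i-1 of system s
            m = b[k] - a[k] * cp[km]
            cp[k] = c[k] / m
            dp[k] = (d[k] - a[k] * dp[km]) / m
        # Back substitution, from the last row upwards.
        k = (n - 1) * batch + s
        x[k] = dp[k]
        for i in range(n - 2, -1, -1):
            k = i * batch + s
            x[k] = dp[k] - cp[k] * x[k + batch]
    return x


if __name__ == "__main__":
    batch, n = 2, 3
    a = to_interleaved([[0, 1, 1], [0, 1, 1]])
    b = to_interleaved([[2, 2, 2], [4, 4, 4]])
    c = to_interleaved([[1, 1, 0], [1, 1, 0]])
    d = to_interleaved([[4, 8, 8], [5, 6, 5]])
    x = thomas_batch_interleaved(a, b, c, d, n, batch)
    print(from_interleaved(x, batch))
```

Running the example should print the two solutions, approximately [1, 2, 3] and [1, 1, 1], up to floating-point rounding. The index `i * batch + s` is the point of the layout: with a per-system (contiguous) layout the index would be `s * n + i`, and neighbouring threads would then be `n` elements apart at every step instead of adjacent.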

Citation (APA)

Valero-Lara, P., Martínez-Pérez, I., Sirvent, R., Martorell, X., & Peña, A. J. (2018). NVIDIA GPUs scalability to solve multiple (batch) tridiagonal systems implementation of cuThomasBatch. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10777 LNCS, pp. 243–253). Springer Verlag. https://doi.org/10.1007/978-3-319-78024-5_22