Efficient batch LU and QR decomposition on GPU

William J. Brouwer; Pierre Yves Taunay

Book Chapter

Springer International Publishing (2014), pp. 69-86

DOI: 10.1007/978-3-319-06548-9_4

Abstract

Accelerating applications with Graphics Processing Units (GPUs) is stimulating ongoing interest in diverse fields within science and engineering. The Compute Unified Device Architecture (CUDA) paradigm introduced by Nvidia in 2007 is arguably the most significant driver in the uptake of GPUs for general-purpose computation. However, a large array of entry points to GPU acceleration also exists in the form of libraries, explicit applications, and compiler directives. An important computational task for many applications is matrix decomposition; this chapter presents new CUDA implementations of key algorithms, specifically LU and QR matrix decomposition for batches of small, dense matrices.

Citation (APA)

Brouwer, W. J., & Taunay, P. Y. (2014). Efficient batch LU and QR decomposition on GPU. In Numerical Computations with GPUs (pp. 69–86). Springer International Publishing. https://doi.org/10.1007/978-3-319-06548-9_4
