Accelerating applications with graphics processing units (GPUs) continues to attract interest across diverse fields of science and engineering. The Compute Unified Device Architecture (CUDA) paradigm, introduced by Nvidia in 2007, is arguably the most significant driver of the uptake of GPUs for general-purpose computation. However, many other entry points to GPU acceleration also exist, in the form of libraries, GPU-accelerated applications, and compiler directives. Matrix decomposition is an important computational task in many applications; this chapter presents new CUDA implementations of two key algorithms, LU and QR decomposition, for batches of small, dense matrices.
Brouwer, W. J., & Taunay, P. Y. (2014). Efficient batch LU and QR decomposition on GPU. In Numerical Computations with GPUs (pp. 69–86). Springer International Publishing. https://doi.org/10.1007/978-3-319-06548-9_4