Efficient batch LU and QR decomposition on GPU

Abstract

Accelerating applications with Graphics Processing Units (GPUs) continues to attract interest across diverse fields of science and engineering. The Compute Unified Device Architecture (CUDA) programming model, introduced by Nvidia in 2007, is arguably the most significant driver of GPU adoption for general-purpose computation. Beyond programming in CUDA directly, however, many other entry points to GPU acceleration exist, including libraries, stand-alone applications, and compiler directives. Matrix decomposition is an important computational task in many applications; this chapter presents new CUDA implementations of LU and QR decomposition for batches of small, dense matrices.
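
To make the batched setting concrete, here is a minimal pure-Python sketch. It is not the chapter's CUDA implementation. It factors every matrix in a batch independently: LU uses partial pivoting, with L and U stored in place in the LAPACK `getrf` style. QR uses modified Gram-Schmidt, a technique swapped in for brevity that may differ from the chapter's method. The helper names (`lu_inplace`, `qr_mgs`, `batched_lu`, `batched_qr`, `unpack_lu`, `matmul`) are hypothetical and introduced here only for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def lu_inplace(a: Matrix) -> List[int]:
    """Doolittle LU with partial pivoting, overwriting `a` (illustrative sketch).

    On return, the strict lower triangle of `a` holds L (unit diagonal implied)
    and the upper triangle holds U, so P*A = L*U, where row i of P*A is row
    perm[i] of the original A.
    """
    n = len(a)
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the row i >= k with the largest |a[i][k]|.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]  # multiplier l_ik, stored in place
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return perm


def qr_mgs(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Modified Gram-Schmidt QR of an m x n matrix (m >= n, full column rank)."""
    m, n = len(a), len(a[0])
    q = [row[:] for row in a]  # columns are orthonormalized in place
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        norm = math.sqrt(sum(q[i][k] ** 2 for i in range(m)))
        if norm == 0.0:
            raise ValueError("columns are linearly dependent")
        r[k][k] = norm
        for i in range(m):
            q[i][k] /= norm
        # Remove the new direction q_k from every remaining column.
        for j in range(k + 1, n):
            r[k][j] = sum(q[i][k] * q[i][j] for i in range(m))
            for i in range(m):
                q[i][j] -= r[k][j] * q[i][k]
    return q, r


def batched_lu(batch: List[Matrix]) -> List[Tuple[Matrix, List[int]]]:
    # Each matrix is factored independently; inputs are copied, not modified.
    out = []
    for a in batch:
        work = [row[:] for row in a]
        perm = lu_inplace(work)
        out.append((work, perm))
    return out


def batched_qr(batch: List[Matrix]) -> List[Tuple[Matrix, Matrix]]:
    return [qr_mgs(a) for a in batch]


def unpack_lu(lu: Matrix) -> Tuple[Matrix, Matrix]:
    # Split the packed in-place result into explicit L (unit diagonal) and U.
    n = len(lu)
    lower = [[lu[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    upper = [[lu[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]
```

No matrix in the batch depends on any other, so each loop iteration in `batched_lu` or `batched_qr` could in principle be assigned to its own GPU thread or thread block. The Python loop only mimics that independence. To check a result, compare `matmul(*unpack_lu(lu))` against the rows of the input reordered by `perm`.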

Citation

Brouwer, W. J., & Taunay, P. Y. (2014). Efficient batch LU and QR decomposition on GPU. In Numerical Computations with GPUs (pp. 69–86). Springer International Publishing. https://doi.org/10.1007/978-3-319-06548-9_4
