LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs. © 2013 Springer-Verlag.
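For readers unfamiliar with the procedure named in the abstract, the following is a minimal, sequential sketch of LU factorization with partial pivoting in plain Python. It is not the authors' hybrid CPU/GPU implementation and makes no attempt at tiling, parallelism, or performance; the function names `lu_partial_pivot` and `lu_solve` are hypothetical and chosen only for this illustration.

```python
def lu_partial_pivot(a):
    """Factor a square matrix so that P*A = L*U.

    Returns (lu, perm): lu packs the unit lower triangle L (below the
    diagonal) and U (on and above the diagonal); perm[i] is the index of
    the row of A that ended up as row i of P*A.
    Illustrative textbook version, not the paper's implementation.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k
        # among rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate below the pivot and update the trailing submatrix.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored as L[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A*x = b using the packed factors from lu_partial_pivot."""
    n = len(lu)
    x = [float(b[perm[i]]) for i in range(n)]  # apply the row permutation
    for i in range(n):  # forward substitution with unit-diagonal L
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


if __name__ == "__main__":
    A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
    b = [7, 19, 49]
    lu, perm = lu_partial_pivot(A)
    print(lu_solve(lu, perm, b))
```

The innermost trailing-submatrix update accounts for most of the arithmetic in this algorithm, which is why it is the natural candidate for offloading to accelerators in a hybrid setting.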
Kurzak, J., Luszczek, P., Faverge, M., & Dongarra, J. (2013). Programming the LU factorization for a multicore system with accelerators. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7851 LNCS, pp. 28–35). https://doi.org/10.1007/978-3-642-38718-0_6