A 3D parallel algorithm for QR decomposition

Abstract

Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different communication costs.
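To make the tradeoff concrete, here is a minimal sketch under the common alpha-beta communication model, in which time is approximated as (per-message latency) × (number of messages) + (per-word transfer time) × (words moved). The abstract does not state this model, and the candidate settings and their costs below are invented for illustration; they are not the paper's cost expressions. The names `Candidate`, `comm_time`, and `best_candidate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical setting of the tuning parameter and its costs."""
    param: float     # illustrative tuning-parameter value (assumption)
    messages: float  # latency cost: number of messages
    words: float     # bandwidth cost: number of words moved


def comm_time(c: Candidate, alpha: float, beta: float) -> float:
    """Alpha-beta model: alpha per message plus beta per word (assumed model)."""
    return alpha * c.messages + beta * c.words


def best_candidate(candidates, alpha: float, beta: float) -> Candidate:
    """Pick the setting with the lowest modeled communication time."""
    return min(candidates, key=lambda c: comm_time(c, alpha, beta))


# Made-up numbers: moving along the list trades fewer words for more messages.
CANDIDATES = [
    Candidate(param=0.0, messages=10, words=1000),
    Candidate(param=0.5, messages=100, words=300),
    Candidate(param=1.0, messages=1000, words=100),
]

# A machine where messages are expensive prefers the low-message setting.
latency_bound = best_candidate(CANDIDATES, alpha=1e-3, beta=1e-9)

# A machine where moving words is expensive prefers the low-volume setting.
bandwidth_bound = best_candidate(CANDIDATES, alpha=1e-9, beta=1e-3)

# With equal per-message and per-word costs, the middle setting wins here.
balanced = best_candidate(CANDIDATES, alpha=1.0, beta=1.0)
```

In this toy setup the chosen setting shifts with the machine's ratio of latency to bandwidth cost, which is the kind of tuning the abstract describes; real cost curves would come from the paper's analysis, not from this table.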

Ballard, G., Demmel, J., Grigori, L., Jacquelin, M., & Knight, N. (2018). A 3D parallel algorithm for QR decomposition. In Annual ACM Symposium on Parallelism in Algorithms and Architectures (pp. 55–65). Association for Computing Machinery. https://doi.org/10.1145/3210377.3210415
