Abstract
Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the expense of increasing its latency cost (number of messages). By varying a parameter to navigate this bandwidth/latency tradeoff, we can tune the algorithm for machines with different communication costs.
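To make the tuning idea concrete, the sketch below uses the standard alpha-beta communication model, where time = alpha × (messages) + beta × (words), and picks the parameter value that minimizes modeled time. The cost functions `toy_words` and `toy_messages` are hypothetical placeholders chosen only to exhibit a tradeoff (each parameter step halves the volume and doubles the message count). They are not the bounds derived for this QR algorithm, and `Machine` and `tune` are illustrative names, not part of any published code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Machine:
    alpha: float  # assumed seconds per message (latency)
    beta: float   # assumed seconds per word (inverse bandwidth)


def comm_time(words: float, messages: float, m: Machine) -> float:
    """Alpha-beta model: latency term plus bandwidth term."""
    return m.alpha * messages + m.beta * words


def tune(params: Iterable[int],
         words: Callable[[int], float],
         messages: Callable[[int], float],
         m: Machine) -> int:
    """Return the parameter value with the smallest modeled communication time."""
    return min(params, key=lambda p: comm_time(words(p), messages(p), m))


# Hypothetical cost functions (NOT the paper's bounds): raising p
# halves the communication volume but doubles the number of messages.
def toy_words(p: int) -> float:
    return 1e8 / 2**p


def toy_messages(p: int) -> float:
    return 1e3 * 2**p


if __name__ == "__main__":
    low_latency = Machine(alpha=1e-6, beta=1e-9)
    high_latency = Machine(alpha=1e-4, beta=1e-9)
    print(tune(range(11), toy_words, toy_messages, low_latency))
    print(tune(range(11), toy_words, toy_messages, high_latency))
```

Under these toy costs, the machine with cheaper messages selects a larger parameter (trading more messages for less volume), while the high-latency machine falls back to the smallest one, which is the kind of per-machine adjustment the abstract describes.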
Ballard, G., Demmel, J., Grigori, L., Jacquelin, M., & Knight, N. (2018). A 3D parallel algorithm for QR decomposition. In Annual ACM Symposium on Parallelism in Algorithms and Architectures (pp. 55–65). Association for Computing Machinery. https://doi.org/10.1145/3210377.3210415