Minimal data copy for dense linear algebra factorization

Fred G. Gustavson; John A. Gunnels; James C. Sexton

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007), Vol. 4699 LNCS, pp. 540-549

DOI: 10.1007/978-3-540-75755-9_66

17 Citations

2 Readers

Abstract

The full format data structures of Dense Linear Algebra hurt the performance of its factorization algorithms. Full format rectangular matrices are the input and output of the Level 3 BLAS. It follows that the LAPACK and Level 3 BLAS approach has a basic performance flaw. We describe a new result showing that representing a matrix A as a collection of square blocks reduces the amount of data reformatting required by dense linear algebra factorization algorithms from O(n^3) to O(n^2). On an IBM Power3 processor, our implementation of Cholesky factorization achieves 92% of peak performance, whereas conventional full format LAPACK DPOTRF achieves 77% of peak performance. All programming for our new data structures may be accomplished in standard Fortran through the use of higher dimensional full format arrays. Thus, new compiler support may not be necessary. We also discuss the role of concatenating submatrices to facilitate hardware streaming. Finally, we discuss a new concept which we call the L1 / L0 cache interface. © Springer-Verlag Berlin Heidelberg 2007.
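
The square block idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the paper targets standard Fortran with higher dimensional arrays, while this sketch uses nested Python lists as a stand-in, and every function name here (`to_square_blocks`, `from_square_blocks`, `sub_abt`, `potrf`, `trsm`, `cholesky_blocked`) is hypothetical. It assumes the block size `nb` divides the matrix order `n`. The matrix is copied into contiguous `nb` x `nb` blocks once, which touches each element once and so costs O(n^2), and a blocked Cholesky factorization then runs directly on those blocks without any further copying.

```python
import math

def to_square_blocks(a, nb):
    """Copy an n x n full-format matrix (list of rows) into a grid of
    contiguous nb x nb blocks. Each element is copied exactly once."""
    n = len(a)
    assert n % nb == 0, "sketch assumes nb divides n"
    m = n // nb
    return [[[row[bj * nb:(bj + 1) * nb] for row in a[bi * nb:(bi + 1) * nb]]
             for bj in range(m)] for bi in range(m)]

def from_square_blocks(blocks):
    """Inverse of to_square_blocks: rebuild the full-format matrix."""
    nb = len(blocks[0][0])
    n = len(blocks) * nb
    return [[blocks[i // nb][j // nb][i % nb][j % nb] for j in range(n)]
            for i in range(n)]

def sub_abt(c, a, b):
    """Block kernel: c -= a * b^T, all operands nb x nb."""
    nb = len(c)
    for i in range(nb):
        for j in range(nb):
            c[i][j] -= sum(a[i][k] * b[j][k] for k in range(nb))

def potrf(a):
    """Unblocked lower Cholesky of one diagonal block, in place."""
    nb = len(a)
    for j in range(nb):
        a[j][j] = math.sqrt(a[j][j] - sum(a[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, nb):
            a[i][j] = (a[i][j] - sum(a[i][k] * a[j][k] for k in range(j))) / a[j][j]
        for i in range(j):
            a[i][j] = 0.0  # clear the strictly upper part of column j

def trsm(b, l):
    """Solve X * l^T = b for X, overwriting b; l is lower triangular."""
    nb = len(b)
    for row in b:
        for j in range(nb):
            row[j] = (row[j] - sum(row[k] * l[j][k] for k in range(j))) / l[j][j]

def cholesky_blocked(blocks):
    """Blocked lower Cholesky operating on the square blocks in place."""
    m = len(blocks)
    nb = len(blocks[0][0])
    for k in range(m):
        for j in range(k):
            sub_abt(blocks[k][k], blocks[k][j], blocks[k][j])
        potrf(blocks[k][k])
        for i in range(k + 1, m):
            for j in range(k):
                sub_abt(blocks[i][k], blocks[i][j], blocks[k][j])
            trsm(blocks[i][k], blocks[k][k])
        for i in range(k):
            # zero the blocks above the diagonal so the result is exactly L
            blocks[i][k] = [[0.0] * nb for _ in range(nb)]
    return blocks

# Usage: a small symmetric positive definite matrix, block size 2.
A = [[4, 2, 2, 0],
     [2, 5, 3, 2],
     [2, 3, 6, 3],
     [0, 2, 3, 7]]
L = from_square_blocks(cholesky_blocked(to_square_blocks(A, 2)))
```

Each kernel call receives whole blocks that are already contiguous, so no per-call repacking is needed. That is the property the abstract credits for reducing data reformatting from O(n^3) to O(n^2); the nested lists here only mimic that layout and say nothing about actual cache behavior.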

APA

Gustavson, F. G., Gunnels, J. A., & Sexton, J. C. (2007). Minimal data copy for dense linear algebra factorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4699 LNCS, pp. 540–549). Springer Verlag. https://doi.org/10.1007/978-3-540-75755-9_66
