This paper presents the design and implementation of a low-level library for computing general sums and products over multi-dimensional arrays (tensors). Using only three low-level functions, the API generalizes core BLAS levels 1-3 while eliminating the need for most tensor transpositions. Despite their relatively low operation count, we show that these transposition steps can become performance-limiting in typical use cases for BLAS on tensors. The present API achieves peak performance on the same order of magnitude as vendor-optimized GEMM by using a code generator to emit CUDA source code for all computational kernels. These kernels are structured as a multi-dimensional generalization of the MAGMA BLAS matrix multiplication for GPUs. Separate transposition steps can be skipped because every kernel accepts arbitrary multi-dimensional transpositions of its arguments. The library, including its methodology and programming techniques, is made available in SLACK. Future improvements include a high-level interface that translates directly from a LaTeX-like equation syntax into a data-parallel computation.
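To make the central idea concrete — that a single generalized contraction primitive can subsume GEMM while folding operand transpositions into the index mapping — here is a minimal, hypothetical sketch in pure Python. It is not the paper's CUDA API or the SLACK library; the `contract` helper and its einsum-like index-string notation are illustrative assumptions only. The point is that `"ki,jk->ij"` multiplies a transposed A by a transposed B in one call, with no separate transposition pass over memory.

```python
from itertools import product

def contract(spec, A, B):
    """Tiny einsum-like contraction over nested lists (two operands).
    spec is e.g. "ki,jk->ij"; indices absent from the output are
    summed over. Hypothetical helper for illustration only."""
    ins, out = spec.split("->")
    ia, ib = ins.split(",")

    # Read off the extent of each index letter from the operands.
    dims = {}
    def record(idx, arr):
        for ax, ch in enumerate(idx):
            sub = arr
            for _ in range(ax):
                sub = sub[0]
            dims[ch] = len(sub)
    record(ia, A)
    record(ib, B)
    sums = [c for c in dims if c not in out]  # contracted indices

    def get(arr, idx, env):
        # Index an operand using its own index order; this is where an
        # arbitrary transposition is absorbed at zero extra cost.
        for ch in idx:
            arr = arr[env[ch]]
        return arr

    def build(free):
        if len(free) == len(out):
            env = dict(free)
            total = 0
            for ks in product(*(range(dims[c]) for c in sums)):
                env.update(zip(sums, ks))
                total += get(A, ia, env) * get(B, ib, env)
            return total
        ch = out[len(free)]
        return [build(free + [(ch, i)]) for i in range(dims[ch])]

    return build([])
```

With this one primitive, `contract("ik,kj->ij", A, B)` is an ordinary matrix product, while `contract("ki,jk->ij", A, B)` computes Aᵀ·Bᵀ without ever materializing a transposed copy — the same property the paper's kernels provide for higher-rank tensors.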
Rogers, D. M. (2016). Efficient primitives for standard tensor linear algebra. In ACM International Conference Proceeding Series (Vol. 17-21-July-2016). Association for Computing Machinery. https://doi.org/10.1145/2949550.2949580