Benchmarking the NVIDIA V100 GPU and tensor cores

Matt Martineau; Patrick Atkinson; Simon McIntosh-Smith

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11339 LNCS 444-455

DOI: 10.1007/978-3-030-10549-5_35

16 Citations

22 Readers

Abstract

The V100 GPU is the newest server-grade GPU produced by NVIDIA and introduces a number of new hardware and API features. This paper details the results of benchmarking the V100 GPU and demonstrates that it is a significant generational improvement, with higher memory bandwidth, higher cache bandwidth, and lower latency. A major new addition is the Tensor core units, which have been marketed as deep learning acceleration features and which compute a 4 × 4 × 4 half-precision matrix-multiply-accumulate operation in a single clock cycle. This paper confirms that the Tensor cores offer considerable performance gains for half-precision general matrix multiplication; however, programming them requires fine control of the memory hierarchy that is typically unnecessary for other applications.
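To make the 4 × 4 × 4 matrix-multiply-accumulate in the abstract concrete, the following is a minimal CPU-side sketch of the arithmetic D = A × B + C on 4 × 4 tiles. It is an illustration only, not the paper's benchmark code or NVIDIA's API: the function names `to_half` and `mma_4x4x4` are hypothetical, the inputs A and B are rounded to half precision using Python's standard `struct` module, and the accumulation in a Python float is an assumption made for simplicity.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mma_4x4x4(a, b, c):
    """Return (D, fma_count) for D = A @ B + C on 4x4 tiles.

    A and B are rounded to half precision first. The accumulator is a
    Python float, which is an assumption of this sketch, not a statement
    about the hardware.
    """
    n = 4
    a16 = [[to_half(v) for v in row] for row in a]
    b16 = [[to_half(v) for v in row] for row in b]
    d = [[0.0] * n for _ in range(n)]
    fma_count = 0
    for i in range(n):
        for j in range(n):
            acc = c[i][j]
            for k in range(n):
                acc += a16[i][k] * b16[k][j]  # one multiply-add
                fma_count += 1
            d[i][j] = acc
    return d, fma_count


# Hypothetical example tiles: A is the identity, so D should equal B + C.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[float(4 * i + j) for j in range(4)] for i in range(4)]
c = [[0.5] * 4 for _ in range(4)]
d, fmas = mma_4x4x4(identity, b, c)
```

The triple loop performs 4 × 4 × 4 = 64 multiply-adds, which is the amount of work the abstract attributes to a single clock cycle of one Tensor core. The loop order and data layout here say nothing about how the hardware schedules that work.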

Cite (APA)

Martineau, M., Atkinson, P., & McIntosh-Smith, S. (2019). Benchmarking the NVIDIA V100 GPU and tensor cores. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11339 LNCS, pp. 444–455). Springer Verlag. https://doi.org/10.1007/978-3-030-10549-5_35
