Benchmarking the NVIDIA V100 GPU and tensor cores

16Citations
Citations of this article
22Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

The V100 GPU is the newest server-grade GPU produced by NVIDIA and introduces a number of new hardware and API features. This paper details the results of benchmarking the V100 GPU and demonstrates that it is a significant generational improvement, increasing memory bandwidth, cache bandwidth, and reducing latency. A major new addition is the Tensor core units, which have been marketed as deep learning acceleration features that enable the computation of a 4 × 4 × 4 half precision matrix-multiply-accumulate operation in a single clock cycle. This paper confirms that the Tensor cores offer considerable performance gains for half precision general matrix multiplication; however, programming them requires fine control of the memory hierarchy that is typically unnecessary for other applications.

Cite

CITATION STYLE

APA

Martineau, M., Atkinson, P., & McIntosh-Smith, S. (2019). Benchmarking the NVIDIA V100 GPU and tensor cores. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11339 LNCS, pp. 444–455). Springer Verlag. https://doi.org/10.1007/978-3-030-10549-5_35

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free