The V100 is the newest server-grade GPU produced by NVIDIA and introduces a number of new hardware and API features. This paper details the results of benchmarking the V100 and demonstrates that it is a significant generational improvement, with higher memory bandwidth, higher cache bandwidth, and lower latency. A major new addition is the Tensor Core units, which have been marketed as a deep learning acceleration feature and which compute a 4 × 4 × 4 half-precision matrix-multiply-accumulate operation in a single clock cycle. This paper confirms that the Tensor Cores offer considerable performance gains for half-precision general matrix multiplication; however, programming them requires fine control of the memory hierarchy that is typically unnecessary for other applications.
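To make the 4 × 4 × 4 matrix-multiply-accumulate operation concrete, the following is a minimal sketch of its arithmetic, D = A × B + C, in plain Python. It is an illustration only and not code from the paper: the names `to_half` and `mma_4x4x4` are hypothetical, the inputs are rounded to IEEE 754 half precision with the standard `struct` module, and the accumulator is kept in Python's double-precision float as a simplifying assumption rather than a model of the hardware's accumulation precision.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def mma_4x4x4(a, b, c):
    """Return (D, fma_count) for D = A x B + C on 4x4 matrices (lists of rows).

    A and B are rounded to half precision first. The accumulator uses
    Python's native float (an assumption made for simplicity).
    """
    n = 4
    a = [[to_half(v) for v in row] for row in a]
    b = [[to_half(v) for v in row] for row in b]
    d = [list(row) for row in c]  # start from the accumulator matrix C
    fma_count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                d[i][j] += a[i][k] * b[k][j]  # one multiply-add
                fma_count += 1
    return d, fma_count


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    b = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    c = [[1.0] * 4 for _ in range(4)]
    d, fmas = mma_4x4x4(identity, b, c)
    print(fmas)
    for row in d:
        print(row)
```

The triple loop performs 4 × 4 × 4 = 64 multiply-adds, which is the work the abstract describes as completing in a single clock cycle. The sketch shows only the arithmetic; it says nothing about the memory-hierarchy control the paper identifies as the difficult part of programming the units.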
CITATION STYLE
Martineau, M., Atkinson, P., & McIntosh-Smith, S. (2019). Benchmarking the NVIDIA V100 GPU and tensor cores. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11339 LNCS, pp. 444–455). Springer Verlag. https://doi.org/10.1007/978-3-030-10549-5_35