Scaling Analysis of Specialized Tensor Processing Architectures for Deep Learning Models


Abstract

Specialized tensor processing architectures (TPAs) targeted at neural network workloads have attracted considerable attention in recent years. The computational complexity of the algorithmically different components of several deep neural networks (DNNs) was analyzed with regard to their use on such TPAs. To demonstrate the crucial difference between TPU and GPU computing architectures, the effective computational complexity of several algorithmically different DNNs was estimated by the proposed scaling analysis of training and inference times, and of the corresponding speedups, as functions of batch and image sizes. Emphasis was placed on widely used and algorithmically different DNNs (VGG16, ResNet50, and CapsNet) running on a cloud-based implementation of a TPA (Google Cloud TPUv2). The results of the performance study demonstrate how the proposed scaling method can be used to estimate the efficiency of these DNNs on this infrastructure. The most important and intriguing result is the scale-invariant behavior of the time and speedup dependencies, which allows the scaling method to predict training and inference times on new TPAs even without detailed information about their internals. The scaling dependencies and scaling exponents differ between algorithmically different DNNs (VGG16, ResNet50, CapsNet) and between architecturally different computing hardware (GPU and TPU). These results precisely quantify the higher throughput of TPAs such as Google TPUv2 relative to GPUs for large workloads, under conditions of low computational overhead and high utilization of the TPU units achieved with large image and batch sizes.
In general, the use of TPAs like Google TPUv2 is quantitatively shown to be a promising means of increasing the performance of both inference and training, especially given the availability of similar tensor processing units such as the TCUs in the Tesla V100 and Titan V provided by NVIDIA, and others.
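The scaling analysis described above can be illustrated with a minimal sketch: fit a power law T(b) ≈ a·b^k to measured per-step times via linear regression in log-log space, and compare the fitted exponents across devices. The timing numbers and exponents below are synthetic placeholders, not measurements from the paper; they only show the shape of the method.

```python
import numpy as np

# Hypothetical per-step training times (seconds) for increasing batch sizes,
# following the power-law form T(b) = a * b^k that the scaling analysis fits.
# These values are synthetic, chosen only to illustrate the fitting procedure.
batch_sizes = np.array([8, 16, 32, 64, 128, 256])
gpu_times = 0.05 * batch_sizes ** 0.95   # assumed near-linear GPU scaling
tpu_times = 0.04 * batch_sizes ** 0.70   # assumed sub-linear TPU scaling
                                         # (better utilization at large batches)

def scaling_power(sizes, times):
    """Estimate the scaling exponent k from a log-log linear fit."""
    k, _log_a = np.polyfit(np.log(sizes), np.log(times), 1)
    return k

k_gpu = scaling_power(batch_sizes, gpu_times)
k_tpu = scaling_power(batch_sizes, tpu_times)

# When k_tpu < k_gpu, the TPU/GPU speedup grows with batch size,
# matching the paper's qualitative finding for large workloads.
speedup = gpu_times / tpu_times
print(f"GPU exponent ~ {k_gpu:.2f}, TPU exponent ~ {k_tpu:.2f}")
print(f"speedup at b={batch_sizes[0]}: {speedup[0]:.2f}, "
      f"at b={batch_sizes[-1]}: {speedup[-1]:.2f}")
```

Because the fitted behavior is scale invariant, the same fit extrapolated to larger batch or image sizes gives a rough running-time prediction for a device without knowing its internals, which is the practical use the abstract claims for the method.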

Citation (APA)

Gordienko, Y., Kochura, Y., Taran, V., Gordienko, N., Rokovyi, A., Alienin, O., & Stirenko, S. (2020). Scaling Analysis of Specialized Tensor Processing Architectures for Deep Learning Models. In Studies in Computational Intelligence (Vol. 866, pp. 65–99). Springer Verlag. https://doi.org/10.1007/978-3-030-31756-0_3
