Numerous algorithmic optimization techniques have been proposed to alleviate the computational complexity of convolutional neural networks. Given the broad selection of AI accelerators, it is not obvious which approach benefits from which optimization most. The design space includes a large number of deployment settings (batch sizes, power modes, etc.) and unclear measurement methods. This research provides clarity into this design space, leveraging a novel benchmarking approach. We provide a theoretical evaluation of different CNNs and hardware platforms, focusing on understanding the impact of pruning and quantization as primary optimization techniques. We benchmark across a spectrum of FPGA, GPU, TPU, and VLIW processors for systematically pruned and quantized neural networks (ResNet50, GoogLeNetv1, MobileNetv1, a VGG derivative, a multilayer perceptron) over many deployment options, considering power, latency, and throughput at a specific accuracy. Our findings show that channel pruning is most effective and works across most hardware platforms, with speedups directly correlated to the reduction in compute load, while FPGAs benefit the most from quantization. Pruning and quantization are orthogonal, and yield optimal design points when combined. Further in-depth results can be found at our web portal, where we share all experimental data, provide data analytics, and invite the community to contribute.
CITATION STYLE
Blott, M., Fraser, N. J., Gambardella, G., Halder, L., Kath, J., Neveu, Z., … Doyle, L. (2021). Evaluation of Optimized CNNs on Heterogeneous Accelerators Using a Novel Benchmarking Approach. IEEE Transactions on Computers, 70(10), 1654–1669. https://doi.org/10.1109/TC.2020.3022318
Mendeley helps you to discover research relevant for your work.