Auto-tuning neural network quantization framework for collaborative inference between the cloud and edge

Abstract

Recently, deep neural networks (DNNs) have been widely applied in mobile intelligent applications. Inference for these DNNs is usually performed in the cloud, but this incurs a large overhead from transmitting data over the wireless network. In this paper, we demonstrate the advantages of cloud-edge collaborative inference with quantization. By analyzing the characteristics of the layers in DNNs, we propose an auto-tuning neural network quantization framework for collaborative inference. We study the effectiveness of mixed-precision collaborative inference of state-of-the-art DNNs on the ImageNet dataset. The experimental results show that our framework generates reasonable network partitions and reduces the storage footprint on mobile devices with negligible loss of accuracy.
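
The paper itself carries the details, but as a rough sketch of the mechanic the abstract describes: collaborative inference splits a DNN at a partition point, runs the front layers on the edge device, quantizes the intermediate activations to shrink the wireless payload, and finishes inference in the cloud. The sketch below is an illustration under assumptions, not the authors' framework: `edge_layers` and `cloud_layers` are hypothetical stand-ins for the two halves of the split network, and a simple uniform affine quantizer stands in for whatever scheme the paper uses.

```python
import numpy as np

def quantize(x, bits=8):
    """Uniform affine quantization of a float tensor to `bits` bits."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2 ** bits - 1) if hi > lo else 1.0
    dtype = np.uint8 if bits <= 8 else np.uint16
    q = np.round((x - lo) / scale).astype(dtype)
    return q, lo, scale

def dequantize(q, lo, scale):
    """Recover an approximate float tensor on the cloud side."""
    return q.astype(np.float32) * scale + lo

def collaborative_inference(x, edge_layers, cloud_layers, bits=8):
    """Run the front of the network on the edge, quantize the
    intermediate activations for the wireless link, and finish
    inference in the cloud. Returns the output and the payload size."""
    for layer in edge_layers:          # executes on the mobile device
        x = layer(x)
    q, lo, scale = quantize(x, bits)   # compress before transmission
    payload_bytes = q.nbytes + 16      # activations + (lo, scale) as two float64s
    x = dequantize(q, lo, scale)       # executes in the cloud
    for layer in cloud_layers:
        x = layer(x)
    return x, payload_bytes

# Toy usage with ReLU "layers" on each side of the partition.
relu = lambda z: np.maximum(z, 0.0)
out, nbytes = collaborative_inference(
    np.random.randn(1, 64, 28, 28).astype(np.float32),
    edge_layers=[relu, relu], cloud_layers=[relu], bits=8)
```

An auto-tuning loop in this spirit would enumerate candidate partition points and per-layer bit-widths, keeping the configuration with the smallest payload and storage cost whose accuracy loss on a validation set stays within tolerance.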

Citation (APA)

Li, G., Liu, L., Wang, X., Dong, X., Zhao, P., & Feng, X. (2018). Auto-tuning neural network quantization framework for collaborative inference between the cloud and edge. In Lecture Notes in Computer Science (Vol. 11139, pp. 402–411). Springer. https://doi.org/10.1007/978-3-030-01418-6_40
