Dynamic Neural Network to Enable Run-Time Trade-off between Accuracy and Latency

Li Yang; Deliang Fan

Conference Proceedings, Open Access

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC (2021) 587-592

DOI: 10.1145/3394885.3431628

Abstract

To deploy powerful deep neural networks (DNNs) on smart but resource-limited IoT devices, many prior works compress the DNN to reduce network size and computational complexity with negligible accuracy degradation, using techniques such as weight quantization, network pruning, and convolution decomposition. However, conventional DNN compression methods generate a smaller but fixed network from a relatively large background model to fit resource-limited hardware acceleration. Such an optimization cannot adjust its structure in real time to adapt to dynamic allocation of computing hardware resources and changing workloads. In this paper, we mainly review our two prior works [13, 15] that tackle this challenge, discussing how to construct a dynamic DNN through either uniform or non-uniform sub-net generation methods. To generate multiple non-uniform sub-nets, [15] must fully retrain the background model for each sub-net individually, an approach we refer to as the multi-path method. To reduce the training cost, in this work we further propose a single-path sub-net generation method that samples multiple sub-nets in different epochs within one training round. The constructed dynamic DNN, consisting of multiple sub-nets, can trade off inference accuracy against latency at run time according to hardware resources and environment requirements. Finally, we study dynamic DNNs built with different sub-net generation methods on both the CIFAR-10 and ImageNet datasets. We also present run-time tuning of accuracy and latency on both GPU and CPU.
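The run-time trade-off described in the abstract can be pictured as choosing, per request, among sub-nets that have already been trained. The following is a minimal sketch only, not the authors' implementation: the `SubNet` record, the `select_subnet` helper, the selection rule (the most accurate sub-net that fits a latency budget), and all accuracy and latency numbers are hypothetical placeholders introduced for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SubNet:
    """Hypothetical profile of one sub-net inside a dynamic DNN."""
    name: str
    accuracy: float    # placeholder value, not a result from the paper
    latency_ms: float  # placeholder value, not a result from the paper


def select_subnet(subnets: Sequence[SubNet], latency_budget_ms: float) -> SubNet:
    """Return the most accurate sub-net whose latency fits the budget.

    Assumed policy: if no sub-net fits, fall back to the fastest one.
    """
    feasible = [s for s in subnets if s.latency_ms <= latency_budget_ms]
    if not feasible:
        return min(subnets, key=lambda s: s.latency_ms)
    return max(feasible, key=lambda s: s.accuracy)


# Illustrative profiles only; real values would come from measuring
# each sub-net on the target GPU or CPU.
SUBNETS = [
    SubNet("full", accuracy=0.93, latency_ms=10.0),
    SubNet("medium", accuracy=0.91, latency_ms=6.0),
    SubNet("small", accuracy=0.88, latency_ms=3.0),
]

if __name__ == "__main__":
    for budget in (100.0, 7.0, 1.0):
        print(budget, select_subnet(SUBNETS, budget).name)
```

Under these assumed profiles, a tighter latency budget selects a smaller sub-net, and a looser budget selects a larger, more accurate one. How the sub-nets themselves are generated (uniform, non-uniform multi-path, or single-path) is outside the scope of this sketch.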

Author supplied keywords

dynamic neural networks

Cite

Yang, L., & Fan, D. (2021). Dynamic Neural Network to Enable Run-Time Trade-off between Accuracy and Latency. In Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC (pp. 587–592). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3394885.3431628
