Lightweight Run-Time Working Memory Compression for Deployment of Deep Neural Networks on Resource-Constrained MCUs


Abstract

This work aims to achieve intelligence on embedded devices by deploying deep neural networks (DNNs) onto resource-constrained microcontroller units (MCUs). Apart from the low clock frequency (e.g., 1-16 MHz) and limited storage (e.g., 16 KB to 256 KB of ROM), one of the largest challenges is the limited RAM (e.g., 2 KB to 64 KB), which must hold the intermediate feature maps of a DNN. Most existing neural network compression algorithms aim to reduce the model size of DNNs so that the weights fit into limited storage. However, they do not significantly reduce the size of the intermediate feature maps, referred to as working memory, which may exceed the capacity of RAM. As a result, a DNN may still be unable to run on an MCU even after compression. To address this problem, this work proposes a technique to dynamically prune the activation values of the intermediate output feature maps at run time so that they fit into limited RAM. Our experimental results show that this method can significantly reduce the working memory of DNNs to satisfy the hard constraint on RAM size, while maintaining satisfactory accuracy with relatively low overhead in memory and run-time latency.
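The abstract does not spell out the exact pruning criterion or storage format, but a minimal sketch of the idea, pruning a layer's output activations at run time and storing the survivors sparsely so they respect a fixed RAM budget, might look like the following. The magnitude-based top-k rule, the buffer sizes, and the (index, value) encoding are illustrative assumptions, not the paper's method.

```c
/*
 * Sketch: run-time activation pruning for one layer's output feature map.
 * Assumption: keep the KEEP_COUNT largest-magnitude activations and store
 * them as (index, value) pairs; the paper's actual criterion and encoding
 * may differ.
 */
#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define FEATURE_MAP_SIZE 64   /* hypothetical dense layer output size        */
#define KEEP_COUNT       16   /* budget chosen so the sparse map fits in RAM */

typedef struct {
    uint16_t index;           /* position in the dense feature map */
    float    value;           /* surviving activation              */
} sparse_entry_t;

/* Keep the `keep` largest-magnitude activations, drop the rest.
 * A simple selection pass is used for clarity; an MCU kernel would more
 * likely use a fixed-size heap or a threshold computed on the fly. */
static int prune_activations(const float *dense, int n,
                             sparse_entry_t *sparse, int keep)
{
    float work[FEATURE_MAP_SIZE];
    int kept = 0;
    for (int i = 0; i < n; i++) work[i] = dense[i];

    for (int k = 0; k < keep; k++) {
        int best = -1;
        float best_mag = 0.0f;
        for (int i = 0; i < n; i++) {
            float mag = fabsf(work[i]);
            if (mag > best_mag) { best_mag = mag; best = i; }
        }
        if (best < 0) break;          /* remaining activations are zero */
        sparse[kept].index = (uint16_t)best;
        sparse[kept].value = dense[best];
        kept++;
        work[best] = 0.0f;            /* exclude from later rounds */
    }
    return kept;
}

int main(void)
{
    float feature_map[FEATURE_MAP_SIZE];
    sparse_entry_t compressed[KEEP_COUNT];

    /* dummy activations for demonstration only */
    for (int i = 0; i < FEATURE_MAP_SIZE; i++)
        feature_map[i] = sinf((float)i) * (float)(i % 7);

    int kept = prune_activations(feature_map, FEATURE_MAP_SIZE,
                                 compressed, KEEP_COUNT);

    printf("dense bytes:  %zu\n", sizeof(feature_map));
    printf("sparse bytes: %zu (%d entries kept)\n",
           (size_t)kept * sizeof(sparse_entry_t), kept);
    return 0;
}
```

In this sketch, the dense 64-float buffer (256 bytes) shrinks to at most 16 entries of 6 bytes each, trading a controlled amount of activation information for a working-memory footprint that fits a hard RAM budget.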

Citation (APA)

Wang, Z., Wu, Y., Jia, Z., Shi, Y., & Hu, J. (2021). Lightweight Run-Time Working Memory Compression for Deployment of Deep Neural Networks on Resource-Constrained MCUs. In Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC (pp. 607–614). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3394885.3439194
