Lightweight Run-Time Working Memory Compression for Deployment of Deep Neural Networks on Resource-Constrained MCUs


Abstract

This work aims to achieve intelligence on embedded devices by deploying deep neural networks (DNNs) onto resource-constrained microcontroller units (MCUs). Apart from the low clock frequency (e.g., 1-16 MHz) and limited storage (e.g., 16 KB to 256 KB of ROM), one of the largest challenges is the limited RAM (e.g., 2 KB to 64 KB), which must hold the intermediate feature maps of a DNN. Most existing neural network compression algorithms aim to reduce the model size of DNNs so that the weights fit into limited storage. However, they do not significantly reduce the size of the intermediate feature maps, referred to as working memory, which may exceed the capacity of RAM. As a result, a DNN may still be unable to run on an MCU even after compression. To address this problem, this work proposes a technique to dynamically prune the activation values of the intermediate output feature maps at run time so that they fit into limited RAM. Our experimental results show that this method can significantly reduce the working memory of DNNs to satisfy the hard constraint on RAM size, while maintaining satisfactory accuracy with relatively low overhead in memory and run-time latency.
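The abstract does not spell out the exact pruning criterion or storage format, but a minimal sketch of the idea, pruning a layer's output activations at run time and storing the survivors sparsely so they respect a fixed RAM budget, might look like the following. The magnitude-based top-k rule, the buffer sizes, and the (index, value) encoding are illustrative assumptions, not the paper's method.

```c
/*
 * Sketch: run-time activation pruning for one layer's output feature map.
 * Assumption: keep the KEEP_COUNT largest-magnitude activations and store
 * them as (index, value) pairs; the paper's actual criterion and encoding
 * may differ.
 */
#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define FEATURE_MAP_SIZE 64   /* hypothetical dense layer output size        */
#define KEEP_COUNT       16   /* budget chosen so the sparse map fits in RAM */

typedef struct {
    uint16_t index;           /* position in the dense feature map */
    float    value;           /* surviving activation              */
} sparse_entry_t;

/* Keep the `keep` largest-magnitude activations, drop the rest.
 * A simple selection pass is used for clarity; an MCU kernel would more
 * likely use a fixed-size heap or a threshold computed on the fly. */
static int prune_activations(const float *dense, int n,
                             sparse_entry_t *sparse, int keep)
{
    float work[FEATURE_MAP_SIZE];
    int kept = 0;
    for (int i = 0; i < n; i++) work[i] = dense[i];

    for (int k = 0; k < keep; k++) {
        int best = -1;
        float best_mag = 0.0f;
        for (int i = 0; i < n; i++) {
            float mag = fabsf(work[i]);
            if (mag > best_mag) { best_mag = mag; best = i; }
        }
        if (best < 0) break;          /* remaining activations are zero */
        sparse[kept].index = (uint16_t)best;
        sparse[kept].value = dense[best];
        kept++;
        work[best] = 0.0f;            /* exclude from later rounds */
    }
    return kept;
}

int main(void)
{
    float feature_map[FEATURE_MAP_SIZE];
    sparse_entry_t compressed[KEEP_COUNT];

    /* dummy activations for demonstration only */
    for (int i = 0; i < FEATURE_MAP_SIZE; i++)
        feature_map[i] = sinf((float)i) * (float)(i % 7);

    int kept = prune_activations(feature_map, FEATURE_MAP_SIZE,
                                 compressed, KEEP_COUNT);

    printf("dense bytes:  %zu\n", sizeof(feature_map));
    printf("sparse bytes: %zu (%d entries kept)\n",
           (size_t)kept * sizeof(sparse_entry_t), kept);
    return 0;
}
```

In this sketch, the dense 64-float buffer (256 bytes) shrinks to at most 16 entries of 6 bytes each, trading a controlled amount of activation information for a working-memory footprint that fits a hard RAM budget.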

Citation (APA)

Wang, Z., Wu, Y., Jia, Z., Shi, Y., & Hu, J. (2021). Lightweight Run-Time Working Memory Compression for Deployment of Deep Neural Networks on Resource-Constrained MCUs. In Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC (pp. 607–614). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3394885.3439194
