This work aims to bring intelligence to embedded devices by deploying deep neural networks (DNNs) on resource-constrained microcontroller units (MCUs). Apart from the low clock frequency (e.g., 1-16 MHz) and limited storage (e.g., 16 KB to 256 KB of ROM), one of the largest challenges is the limited RAM (e.g., 2 KB to 64 KB), which must hold the intermediate feature maps of a DNN. Most existing neural network compression algorithms aim to reduce the model size of DNNs so that they fit into the limited storage. However, they do not significantly reduce the size of the intermediate feature maps, referred to as working memory, which may still exceed the RAM capacity. As a result, DNNs may be unable to run on MCUs even after compression. To address this problem, this work proposes a technique that dynamically prunes the activation values of the intermediate output feature maps at run time so that they fit into the limited RAM. Experimental results show that this method significantly reduces the working memory of DNNs to satisfy the hard constraint of RAM size, while maintaining satisfactory accuracy with relatively low overhead in memory and run-time latency.
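To make the idea of run-time activation pruning concrete, the following C sketch keeps only the K largest-magnitude activations of an intermediate feature map in a fixed-size sparse (index, value) buffer, and re-expands them before the next layer. This is a minimal illustration under assumed choices (a top-K magnitude criterion, the hypothetical helpers prune_activations and expand_activations, and the sparse layout), not the exact algorithm described in the paper.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sparse entry: a retained activation and its flattened position. */
typedef struct {
    uint16_t index;
    float    value;
} sparse_entry_t;

/* Keep at most k largest-magnitude activations from act[0..n-1] in out.
 * Simple O(n*k) selection, which is acceptable for small k on an MCU.
 * Returns the number of retained entries. (Illustrative sketch only.) */
static size_t prune_activations(const float *act, size_t n,
                                sparse_entry_t *out, size_t k)
{
    if (k > n) k = n;
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i) {
        float mag = act[i] < 0 ? -act[i] : act[i];
        if (kept < k) {
            out[kept].index = (uint16_t)i;
            out[kept].value = act[i];
            ++kept;
        } else {
            /* Replace the smallest-magnitude retained entry if this one is larger. */
            size_t min_j = 0;
            float min_mag = out[0].value < 0 ? -out[0].value : out[0].value;
            for (size_t j = 1; j < k; ++j) {
                float m = out[j].value < 0 ? -out[j].value : out[j].value;
                if (m < min_mag) { min_mag = m; min_j = j; }
            }
            if (mag > min_mag) {
                out[min_j].index = (uint16_t)i;
                out[min_j].value = act[i];
            }
        }
    }
    return kept;
}

/* Rebuild a dense feature map before the next layer; pruned positions become zero. */
static void expand_activations(const sparse_entry_t *in, size_t kept,
                               float *act, size_t n)
{
    memset(act, 0, n * sizeof(float));
    for (size_t i = 0; i < kept; ++i)
        act[in[i].index] = in[i].value;
}

int main(void)
{
    float fmap[8] = {0.1f, -2.0f, 0.0f, 3.5f, -0.2f, 1.1f, 0.05f, -4.2f};
    sparse_entry_t buf[3];  /* assumed RAM budget: 3 sparse entries */
    size_t kept = prune_activations(fmap, 8, buf, 3);
    expand_activations(buf, kept, fmap, 8);
    for (size_t i = 0; i < 8; ++i) printf("%.2f ", fmap[i]);
    printf("\n");  /* only the 3 largest-magnitude activations survive */
    return 0;
}
```

The design point this sketch highlights is that the peak working memory is bounded by the size of the sparse buffer rather than by the full feature map, at the cost of some accuracy from discarded activations and a small amount of extra computation at run time.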
CITATION STYLE
Wang, Z., Wu, Y., Jia, Z., Shi, Y., & Hu, J. (2021). Lightweight Run-Time Working Memory Compression for Deployment of Deep Neural Networks on Resource-Constrained MCUs. In Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC (pp. 607–614). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3394885.3439194