Automatic speech recognition (ASR) systems are becoming increasingly irreplaceable in smart speech interaction applications. Nonetheless, these applications confront the memory wall when embedded in energy- and memory-constrained Internet of Things devices. It is therefore challenging but imperative to design a memory-saving and energy-saving ASR system. This paper proposes a joint-optimized scheme of network compression with approximate memory for an economical ASR system. At the algorithm level, this work presents block-based pruning and quantization with error model (BPQE), an optimized compression framework comprising a novel pruning technique coordinated with low-precision quantization and the approximate memory scheme. The BPQE-compressed recurrent neural network (RNN) model comes with an ultra-high compression rate and a fine-grained structured sparsity pattern that greatly reduce the amount of memory access. At the hardware level, this work presents an ASR-adapted incremental retraining method to obtain further power savings. This retraining method maximizes the benefit of the approximate memory scheme while maintaining considerable accuracy. Experimental results show that the proposed joint-optimized scheme achieves 58.6% power saving and 40x memory saving with a phone error rate of 20%.
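The abstract does not detail the BPQE algorithm, but the core idea of block-based pruning coordinated with low-precision quantization can be illustrated with a minimal sketch. The block size, keep ratio, and bit width below are hypothetical choices, not the paper's actual hyperparameters: a weight matrix is partitioned into fixed-size blocks, the lowest-magnitude blocks are zeroed (producing the fine-grained structured pattern that reduces memory access), and the surviving weights are uniformly quantized.

```python
import numpy as np

def block_prune_quantize(w, block=(4, 4), keep_ratio=0.25, bits=8):
    """Illustrative sketch: zero out the weight blocks with the smallest
    L2 norm, then symmetrically quantize the survivors to `bits` bits.
    Returns the dequantized matrix and the boolean block-keep mask."""
    rows, cols = w.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "matrix must tile into blocks"
    # Reshape into a (row-blocks, col-blocks, br, bc) grid of blocks.
    grid = w.reshape(rows // br, br, cols // bc, bc).transpose(0, 2, 1, 3)
    # Score each block by its L2 norm and keep only the top fraction.
    scores = np.linalg.norm(grid, axis=(2, 3))
    k = max(1, int(keep_ratio * scores.size))
    thresh = np.sort(scores, axis=None)[-k]
    mask = scores >= thresh
    pruned = grid * mask[:, :, None, None]
    # Symmetric uniform quantization of the surviving weights.
    scale = np.abs(pruned).max() / (2 ** (bits - 1) - 1)
    q = np.round(pruned / scale).astype(np.int8)
    deq = (q * scale).transpose(0, 2, 1, 3).reshape(rows, cols)
    return deq, mask
```

Because pruning removes whole blocks rather than scattered weights, the nonzero entries stay contiguous, which is what allows coalesced, reduced memory access on the hardware side.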
Li, Q., Dong, P., Yu, Z., Liu, C., Qiao, F., Wang, Y., & Yang, H. (2021). Puncturing the memory wall: Joint optimization of network compression with approximate memory for ASR application. In Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC (pp. 505–511). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3394885.3431512