Weight compression MAC accelerator for effective inference of deep learning

Abstract

Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods that further improve energy efficiency while maintaining recognition accuracy, developed through the co-design of a filter-wise quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of the weights, so inference execution time and energy consumption are reduced in proportion to the total number of computations multiplied by that average bit precision. Hardware utilization is also improved by a bit-parallel architecture suited to such finely grained weight bit precision. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles for ResNet-50 on ImageNet are reduced to 1/5.3 compared with a conventional method, while maintaining recognition accuracy.
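
The abstract does not specify how bit widths are chosen per filter; as a rough illustration only, the sketch below assigns each convolution filter its own bit width using a simple mean-squared-error threshold (an assumption, not the paper's criterion) and reports the resulting average bit precision, which is the quantity the abstract ties to execution time and energy.

```python
import numpy as np

def quantize_filter(w, bits):
    # Uniform symmetric quantization of one filter's weights to `bits` bits.
    qmax = 2 ** (bits - 1) - 1
    peak = np.max(np.abs(w))
    scale = peak / qmax if peak > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def filterwise_quantize(weights, max_bits=8, tol=1e-4):
    # Give each output-channel filter the smallest bit width whose
    # mean-squared quantization error stays below `tol` (illustrative rule,
    # not the criterion used in the paper).
    quantized = np.empty_like(weights)
    bit_widths = []
    for i, w in enumerate(weights):            # weights: (out_channels, ...)
        for bits in range(2, max_bits + 1):
            q = quantize_filter(w, bits)
            if np.mean((w - q) ** 2) <= tol or bits == max_bits:
                break
        quantized[i] = q
        bit_widths.append(bits)
    return quantized, bit_widths

# The average bit width over filters is what scales the expected cycle count.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(64, 3, 3, 3))  # hypothetical conv layer
_, bits = filterwise_quantize(w)
print("average bit precision:", np.mean(bits))
```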

Citation (APA)

Maki, A., Miyashita, D., Sasaki, S., Nakata, K., Tachibana, F., Suzuki, T., … Fujimoto, R. (2020). Weight compression MAC accelerator for effective inference of deep learning. IEICE Transactions on Electronics, E103C(10), 514–523. https://doi.org/10.1587/transele.2019CTP0007
