An Energy-Efficient Implementation of Group Pruned CNNs on FPGA


Abstract

In recent years, convolutional neural network (CNN)-based artificial intelligence algorithms have been widely applied to object recognition and image classification tasks. However, this high performance comes at the cost of intensive computation and an enormous number of parameters, posing substantial challenges for deployment on terminal devices. An end-to-end FPGA-based accelerator is proposed in this work that efficiently processes fine-grained pruned CNNs. A group pruning algorithm with group sparse regularization (GSR) is introduced to solve the internal buffer misalignments and load imbalances that the accelerator suffers after fine-grained pruning. A mathematical model of accelerator access and transmission is established to explore the optimal design scale and calculation mode. The accelerator is further optimized by designing sparse processing elements and by scheduling the on- and off-chip buffers. The proposed approach reduces the computation of a state-of-the-art large-scale CNN, VGG16, by 86.9% with an accuracy loss on CIFAR-10 of only 0.48%. The accelerator achieves 188.41 GOPS at 100 MHz and consumes 8.15 W when implemented on a Xilinx VC707, making it more energy-efficient than previous approaches.
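To make the abstract's key idea concrete, the sketch below illustrates group sparse regularization as a group-lasso penalty (the sum of L2 norms over fixed-size weight groups), followed by threshold-based group pruning. The function names, the consecutive-group layout, and the threshold rule are illustrative assumptions, not the paper's exact formulation; the point is that penalizing and zeroing whole groups keeps the resulting sparsity pattern aligned to the accelerator's buffer and processing-element granularity, avoiding the misalignment and load imbalance of element-wise pruning.

```python
import numpy as np

def group_sparse_penalty(weights, group_size):
    """Group-lasso penalty: sum of L2 norms over consecutive weight groups.

    Adding this term to the training loss drives entire groups of weights
    toward zero together, so pruning removes hardware-aligned blocks.
    (Illustrative layout: groups are consecutive elements in memory order.)
    """
    flat = weights.reshape(-1, group_size)       # one row per group
    return float(np.sum(np.linalg.norm(flat, axis=1)))

def prune_groups(weights, group_size, threshold):
    """Zero out every group whose L2 norm falls below the threshold."""
    flat = weights.reshape(-1, group_size).copy()
    norms = np.linalg.norm(flat, axis=1)
    flat[norms < threshold] = 0.0                # prune whole groups at once
    return flat.reshape(weights.shape)

# Example: with group_size = 2, the small first group is pruned entirely
# while the large second group survives intact.
w = np.array([0.01, 0.02, 1.0, 2.0])
pruned = prune_groups(w, group_size=2, threshold=0.5)
```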

Citation (APA)

Pang, W., Wu, C., & Lu, S. (2020). An Energy-Efficient Implementation of Group Pruned CNNs on FPGA. IEEE Access, 8, 217033–217044. https://doi.org/10.1109/ACCESS.2020.3041464
