An Energy-Efficient Implementation of Group Pruned CNNs on FPGA


Abstract

In recent years, convolutional neural network (CNN)-based artificial intelligence algorithms have been widely applied to object recognition and image classification tasks. However, this high performance comes at the cost of intensive computation and an enormous number of parameters, posing substantial challenges for deployment on terminal devices. An end-to-end FPGA-based accelerator is proposed in this work that efficiently processes fine-grained pruned CNNs. A group pruning algorithm with group sparse regularization (GSR) is introduced to solve the internal buffer misalignments and load imbalances that the accelerator suffers after fine-grained pruning. A mathematical model of accelerator access and transmission is established to explore the optimal design scale and calculation mode. The accelerator is further optimized by designing sparse processing elements and by scheduling the on- and off-chip buffers. The proposed approach reduces the computation of a state-of-the-art large-scale CNN, VGG16, by 86.9% with an accuracy loss on CIFAR-10 of only 0.48%. The accelerator achieves 188.41 GOPS at 100 MHz and consumes 8.15 W when implemented on a Xilinx VC707, making it more energy-efficient than previous approaches.
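To make the abstract's key idea concrete, the sketch below illustrates group sparse regularization as a group-lasso penalty (the sum of L2 norms over fixed-size weight groups), followed by threshold-based group pruning. The function names, the consecutive-group layout, and the threshold rule are illustrative assumptions, not the paper's exact formulation; the point is that penalizing and zeroing whole groups keeps the resulting sparsity pattern aligned to the accelerator's buffer and processing-element granularity, avoiding the misalignment and load imbalance of element-wise pruning.

```python
import numpy as np

def group_sparse_penalty(weights, group_size):
    """Group-lasso penalty: sum of L2 norms over consecutive weight groups.

    Adding this term to the training loss drives entire groups of weights
    toward zero together, so pruning removes hardware-aligned blocks.
    (Illustrative layout: groups are consecutive elements in memory order.)
    """
    flat = weights.reshape(-1, group_size)       # one row per group
    return float(np.sum(np.linalg.norm(flat, axis=1)))

def prune_groups(weights, group_size, threshold):
    """Zero out every group whose L2 norm falls below the threshold."""
    flat = weights.reshape(-1, group_size).copy()
    norms = np.linalg.norm(flat, axis=1)
    flat[norms < threshold] = 0.0                # prune whole groups at once
    return flat.reshape(weights.shape)

# Example: with group_size = 2, the small first group is pruned entirely
# while the large second group survives intact.
w = np.array([0.01, 0.02, 1.0, 2.0])
pruned = prune_groups(w, group_size=2, threshold=0.5)
```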

Citation (APA)

Pang, W., Wu, C., & Lu, S. (2020). An Energy-Efficient Implementation of Group Pruned CNNs on FPGA. IEEE Access, 8, 217033–217044. https://doi.org/10.1109/ACCESS.2020.3041464
