Using Data Compression for Optimizing FPGA-Based Convolutional Neural Network Accelerators

Abstract

Convolutional Neural Networks (CNNs) have been extensively employed in fields such as multimedia recognition and computer vision, and various FPGA-based accelerators for deep CNNs have been proposed to achieve high energy efficiency. For FPGA-based CNN accelerators in embedded systems such as UAVs, IoT devices, and wearables, overall performance is often bounded by the limited data bandwidth to the on-board DRAM. In this paper, we argue that this bandwidth bottleneck can be overcome with data compression techniques. We propose an effective roofline model to explore the design trade-off between computation logic and data bandwidth after data compression is applied to CNN parameters. As case studies, we implement a decompression module and a CNN accelerator on a single Xilinx VC707 FPGA board with two different compression/decompression algorithms. Under limited data bandwidth, our implementation outperforms designs using previous methods by 3.2× in overall performance.
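
To make the roofline idea concrete, the sketch below is a minimal Python illustration of a compression-aware roofline, assuming one simplification the abstract suggests but does not spell out: compressing CNN parameters by a given ratio effectively multiplies the off-chip bandwidth available for parameter traffic. The function attainable_gops and every number in it are hypothetical placeholders, not the paper's formulation or measured results.

    # Minimal compression-aware roofline sketch (hypothetical numbers).
    # Assumption: compressing the weights by ratio r scales the effective
    # DRAM bandwidth for parameter traffic by r, raising the bandwidth roof.

    def attainable_gops(compute_roof_gops: float,
                        ctc_ratio: float,
                        bandwidth_gb_s: float,
                        compression_ratio: float = 1.0) -> float:
        """Roofline: min(compute roof, CTC ratio * effective bandwidth)."""
        effective_bw = bandwidth_gb_s * compression_ratio
        return min(compute_roof_gops, ctc_ratio * effective_bw)

    # A bandwidth-bound design point: without compression, the memory
    # system rather than the compute logic limits throughput.
    baseline = attainable_gops(100.0, ctc_ratio=5.0, bandwidth_gb_s=4.0)
    compressed = attainable_gops(100.0, ctc_ratio=5.0, bandwidth_gb_s=4.0,
                                 compression_ratio=3.0)
    print(baseline, compressed)  # 20.0 60.0

In the bandwidth-bound regime, attainable performance scales with the compression ratio until the computational roof is reached; that crossover is the trade-off the paper's roofline model is designed to explore.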

Citation (APA)

Guan, Y., Xu, N., Zhang, C., Yuan, Z., & Cong, J. (2017). Using data compression for optimizing FPGA-based convolutional neural network accelerators. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10561 LNCS, pp. 14–26). Springer Verlag. https://doi.org/10.1007/978-3-319-67952-5_2
