PRBN: A pipelined implementation of RBN for CNN Training

Abstract

Training CNNs (Convolutional Neural Networks) on-chip has recently attracted much attention. As CNNs have developed, the proportion of execution time spent in BN (Batch Normalization) layers has grown and can even exceed that of the convolutional layers. The BN layer accelerates the convergence of training, yet little work has focused on efficient hardware implementation of BN computation during training. In this work, we propose PRBN, an accelerator that supports both BN and convolution computation in training. In our design, a systolic array accelerates the convolution and matrix multiplication in training, and an RBN (Range Batch Normalization) array based on the hardware-friendly RBN algorithm handles the computation of the BN layers. We implement PRBN on the PYNQ-Z1 FPGA; it runs at 50 MHz and consumes 0.346 W. Experimental results show that, compared with a CPU (Intel Core i5-7500), PRBN achieves a 3.3× speedup in performance and an 8.9× improvement in energy efficiency.
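
The RBN layer mentioned in the abstract replaces the variance computation of standard BN with a per-channel range (max minus min), which removes the square and square-root operations that are costly in hardware. The sketch below is a minimal NumPy illustration of that idea, not the authors' hardware design; the function name, the (N, C) activation layout, and the 1/sqrt(2·ln n) scaling constant are assumptions drawn from the commonly cited RBN formulation.

```python
import numpy as np

def range_batch_norm(x, gamma, beta, eps=1e-5):
    """Range Batch Normalization over a mini-batch.

    Instead of dividing by the standard deviation (as in standard BN),
    activations are scaled by the per-channel range (max - min), which
    avoids square/sqrt operations and is cheaper to realize in hardware.
    x: (N, C) activations; gamma, beta: (C,) learnable scale and shift.
    """
    n = x.shape[0]
    mu = x.mean(axis=0)                      # per-channel mean
    rng = x.max(axis=0) - x.min(axis=0)      # per-channel range
    # Scaling constant relating the expected range of n Gaussian samples
    # to their standard deviation (assumption: the 1/sqrt(2*ln(n)) factor
    # used in the RBN literature).
    c_n = 1.0 / np.sqrt(2.0 * np.log(n))
    x_hat = (x - mu) / (c_n * rng + eps)
    return gamma * x_hat + beta
```

Normalizing by a scaled range keeps only comparisons, subtractions, and one division per channel, which is what makes the layer amenable to a pipelined hardware array.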

Cite

Yang, Z., Wang, L., Zhang, X., Ding, D., Xie, C., & Luo, L. (2020). PRBN: A pipelined implementation of RBN for CNN Training. In Communications in Computer and Information Science (Vol. 1256 CCIS, pp. 117–131). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-15-8135-9_9
