An Intermediate-Centric Dataflow for Transposed Convolution Acceleration on FPGA

Abstract

Transposed convolution is prevalent in convolutional neural networks (CNNs) and plays an important role in scenarios such as image segmentation and the back-propagation phase of CNN training. Its usefulness stems mainly from its ability to up-sample input feature maps by interpolating new information from the input feature pixels. However, its backward-stencil computation constrains performance and has hindered its wide application on diverse platforms. Moreover, in contrast to the extensive efforts on accelerating convolution, the acceleration of transposed convolution, which is just as compute-intensive, has rarely been investigated. To accelerate transposed convolution, we propose an intermediate-centric dataflow scheme that decouples the generation of the intermediate patch from its further processing, aiming to perform the backward-stencil computation efficiently. The intermediate-centric dataflow breaks the transposed convolution into several stages, so that feeding the input feature maps and performing the backward-stencil computation proceed in a pipelined manner. It also provides four degrees of computation parallelism and efficient reuse of input feature maps and weights. Furthermore, we theoretically analyze, using the polyhedral model, the irregular data dependence that constrains the parallel computation of transposed convolution. Additionally, we formulate an optimization problem to explore the design space and automatically generate the optimal design configurations for different transposed convolutional layers and hardware platforms. Using representative transposed convolutional layers from DCGAN, FSRCNN, and FCN, we generate the corresponding intermediate-centric dataflow accelerator arrays on the Xilinx Alveo U200 platform and reach performance of 3.92 TOPS, 2.72 TOPS, and 4.76 TOPS, respectively.
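
As a purely illustrative software sketch (not the authors' FPGA dataflow), the backward-stencil pattern mentioned in the abstract can be pictured as two decoupled steps: each input pixel first scales the kernel into an intermediate patch, and that patch is then accumulated into the output window it overlaps. The function names `intermediate_patch`, `accumulate_patch`, and `transposed_conv2d` are hypothetical, and the sketch assumes a single channel, a square kernel, and no padding.

```python
def intermediate_patch(pixel, weights):
    """Step 1 (assumed naming): scale the K x K kernel by one input pixel."""
    return [[pixel * w for w in row] for row in weights]


def accumulate_patch(output, patch, top, left):
    """Step 2: add the patch into the output window whose corner is (top, left)."""
    for a, row in enumerate(patch):
        for b, value in enumerate(row):
            output[top + a][left + b] += value


def transposed_conv2d(inputs, weights, stride):
    """Single-channel transposed convolution, square kernel, no padding."""
    height, width = len(inputs), len(inputs[0])
    k = len(weights)
    # Standard output size for a transposed convolution without padding.
    out_h = (height - 1) * stride + k
    out_w = (width - 1) * stride + k
    output = [[0] * out_w for _ in range(out_h)]
    for i in range(height):
        for j in range(width):
            patch = intermediate_patch(inputs[i][j], weights)
            # Neighbouring patches overlap whenever stride < k, so several
            # input pixels write to the same output element.
            accumulate_patch(output, patch, i * stride, j * stride)
    return output


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 1], [1, 1]]
    print(transposed_conv2d(x, w, stride=1))
    print(transposed_conv2d(x, w, stride=2))
```

With a stride smaller than the kernel size, patches from adjacent input pixels overlap in the output. That overlap is the kind of write dependence that makes a naive parallelization of the scatter step difficult.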

Cite

Ma, Z., Dai, T., Wei, X., & Luo, G. (2023). An Intermediate-Centric Dataflow for Transposed Convolution Acceleration on FPGA. ACM Transactions on Embedded Computing Systems, 22(6). https://doi.org/10.1145/3561053
