In recent years, Convolutional Neural Networks (CNNs) have been widely applied in artificial intelligence (AI) domains such as computer vision. Among existing hardware accelerators, the FPGA is regarded as a suitable platform for implementing CNNs because of its high energy efficiency and flexible reconfigurability. In this paper, a parameterized design approach is proposed to explore the maximum parallelism that can be implemented when mapping a CNN algorithm onto the targeted FPGA resources. Four types of parallelism are employed in the parameterized design to fully exploit the processing resources available on the FPGA. In addition, a hardware library consisting of a set of modules is established to accommodate various CNN models. Furthermore, an algorithm is proposed to find the optimal levels of parallelism under a given resource constraint. As a case study, the classic LeNet-5 is implemented on a Xilinx Zynq-7020. Compared with existing works that use the high-level synthesis design flow, the proposed design achieves higher frames per second (FPS) and lower latency while using fewer LUTs and FFs.
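As a rough illustration of the kind of resource-constrained parallelism search the abstract describes, the sketch below enumerates four hypothetical unroll factors (output-channel, input-channel, output-pixel, and kernel-window parallelism) for a LeNet-5-style convolution layer and picks the combination with the lowest estimated cycle count that fits a DSP budget. The factor names, the candidate ranges, and the cost and latency models are assumptions made for illustration only; they are not the paper's actual algorithm or resource model.

```python
import math
from itertools import product

# Hypothetical layer shape, roughly LeNet-5's second convolution layer
# (16 output maps, 6 input maps, 10x10 output pixels, 5x5 kernel).
N_OUT, N_IN, OUT_H, OUT_W, K = 16, 6, 10, 10, 5

DSP_BUDGET = 220  # the Zynq-7020 provides 220 DSP48E1 slices

# Candidate unroll factors for the four assumed parallelism dimensions.
P_OUT = [1, 2, 4, 8, 16]   # output-channel parallelism
P_IN  = [1, 2, 3, 6]       # input-channel parallelism
P_PIX = [1, 2, 5, 10]      # output-pixel parallelism
P_KER = [1, 5, 25]         # kernel-window parallelism

def dsp_cost(po, pi, px, pk):
    # Assumed cost model: one DSP per concurrent multiply-accumulate.
    return po * pi * px * pk

def layer_cycles(po, pi, px, pk):
    # Assumed latency model: each loop dimension is ceil-divided by its
    # unroll factor, so partially filled last iterations waste lanes.
    return (math.ceil(N_OUT / po) * math.ceil(N_IN / pi)
            * math.ceil((OUT_H * OUT_W) / px) * math.ceil((K * K) / pk))

best, best_cycles = None, float("inf")
for po, pi, px, pk in product(P_OUT, P_IN, P_PIX, P_KER):
    if dsp_cost(po, pi, px, pk) > DSP_BUDGET:
        continue  # combination exceeds the resource budget
    c = layer_cycles(po, pi, px, pk)
    if c < best_cycles:
        best, best_cycles = (po, pi, px, pk), c

print("chosen factors (Pout, Pin, Ppix, Pker):", best)
print("estimated cycles for the layer:", best_cycles)
```

An exhaustive search is tractable here because the factor space is tiny; a real design-space exploration for a full CNN would also have to account for BRAM, bandwidth, and inter-layer effects.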
Citation:
Mao, N., Yang, H., & Huang, Z. (2023). A Parameterized Parallel Design Approach to Efficient Mapping of CNNs onto FPGA. Electronics (Switzerland), 12(5). https://doi.org/10.3390/electronics12051106