Abstract
Overlay architectures are a good way to enable fast development and debug on FPGAs at the expense of potentially limited performance compared to fully customized FPGA designs. When used in concert with hand-tuned FPGA solutions, performant overlay architectures can improve time-to-solution and thus overall productivity of FPGA solutions. This work tunes and specializes FGPU, an open source OpenCL-programmable GPU overlay for FPGAs. We demonstrate that our persistent deep learning (PDL)-FGPU architecture maintains the ease-of-programming and generality of GPU programming while achieving high performance from specialization for the persistent deep learning domain. We also propose an easy method to specialize for other domains. PDL-FGPU includes new instructions, along with micro-architecture and compiler enhancements. We evaluate both the FGPU baseline and the proposed PDL-FGPU on a modern high-end Intel Stratix 10 2800 FPGA in simulation running persistent DL applications (RNN, GRU, LSTM), and non-DL applications to demonstrate generality. PDL-FGPU requires 1.4-3× more ALMs, 4.4-6.4× more M20ks, and 1-9.5× more DSPs than baseline, but improves performance by 56-693× for PDL applications with an average 23.1% degradation on non-PDL applications. We integrated the PDL-FGPU overlay into Intel OPAE to measure real-world performance/power and demonstrate that PDL-FGPU is only 4.0-10.4× slower than the Nvidia V100.
Ma, R., Hsu, J. C., Tan, T., Nurvitadhi, E., Sheffield, D., Pelt, R., … Chiou, D. (2021). Specializing FGPU for Persistent Deep Learning. ACM Transactions on Reconfigurable Technology and Systems, 14(2). https://doi.org/10.1145/3457886