An FPGA-based RNN-T inference accelerator with PIM-HBM

7Citations
Citations of this article
7Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In this paper, we implemented a world-first RNN-T inference accelerator using FPGA with PIM-HBM that can multiply the internal bandwidth of the memory. The accelerator offloads matrix-vector multiplication (GEMV) operations of LSTM layers in RNN-T into PIM-HBM, and PIM-HBM reduces the execution time of GEMV significantly by exploiting HBM internal bandwidth. To ensure that the memory commands are issued in a pre-defined order, which is one of the most important constraints in exploiting PIM-HBM, we implement a direct memory access (DMA) module and change configuration of the on-chip memory controller by utilizing the flexibility and reconfigurability of the FPGA. In addition, we design the other hardware modules for acceleration such as non-linear functions (i.e., sigmoid and hyperbolic tangent), element-wise operation, and ReLU module, to operate these compute-bound RNN-T operations on FPGA. For this, we prepare FP16 quantized weight and MLPerf input datasets, and modify the PCIe device driver and C++ based control codes. On our evaluation, our accelerator with PIM-HBM reduces the execution time of RNN-T by 2.5 × on average with 11.09% reduced LUT size and improves energy efficiency up to 2.6 × compared to the baseline.

Cite

CITATION STYLE

APA

Kang, S. H., Lee, S., Kim, B., Kim, H., Sohn, K., Kim, N. S., & Lee, E. (2022). An FPGA-based RNN-T inference accelerator with PIM-HBM. In FPGA 2022 - Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 146–152). Association for Computing Machinery, Inc. https://doi.org/10.1145/3490422.3502355

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free