The paper presents an object detection accelerator featuring a processing-in-memory (PIM) architecture on FPGAs. PIM architectures are well known for their energy efficiency and for avoiding the memory wall. In the accelerator, each PIM unit is built from BRAM and LUT-based counters, which also helps to improve the performance density per DSP. The overall architecture consists of 64 PIM units and three memory buffers that store inter-layer results. A shrunk and quantized Tiny-YOLO network is mapped to the PIM accelerator, and DRAM access is fully eliminated during inference. The design achieves a throughput of 201.6 GOPS at a 100 MHz clock rate and, correspondingly, a performance density of 0.57 GOPS/DSP.
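The headline figures can be related to one another with some simple arithmetic. The sketch below is only a back-of-the-envelope check, not something taken from the paper: the four constants come from the abstract, while the derived quantities (operations per cycle, operations per cycle per PIM unit, and the implied DSP count) are assumptions that follow from dividing those numbers, and the variable and function names are hypothetical.

```python
# Figures quoted in the abstract.
THROUGHPUT_GOPS = 201.6      # overall throughput
CLOCK_MHZ = 100.0            # clock rate
PIM_UNITS = 64               # number of PIM units
DENSITY_GOPS_PER_DSP = 0.57  # performance density


def ops_per_cycle(throughput_gops: float, clock_mhz: float) -> float:
    """Operations completed per clock cycle across the whole design."""
    return (throughput_gops * 1e9) / (clock_mhz * 1e6)


def implied_dsp_count(throughput_gops: float, density: float) -> float:
    """DSP count implied by throughput / density (derived, not quoted)."""
    return throughput_gops / density


total_ops = ops_per_cycle(THROUGHPUT_GOPS, CLOCK_MHZ)
per_unit_ops = total_ops / PIM_UNITS
dsp_estimate = implied_dsp_count(THROUGHPUT_GOPS, DENSITY_GOPS_PER_DSP)

print(f"ops per cycle (all units): {total_ops:.1f}")     # 2016.0
print(f"ops per cycle per PIM unit: {per_unit_ops:.1f}")  # 31.5
print(f"implied DSP count: {dsp_estimate:.1f}")           # 353.7
```

Because 0.57 is a rounded figure, the implied DSP count is approximate; the abstract itself does not state how many DSPs the design uses.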
CITATION STYLE
Jiao, B., Zhang, J., Xie, Y., Wang, S., Zhu, H., Kang, X., … Chen, C. (2021). A 0.57-GOPS/DSP Object Detection PIM Accelerator on FPGA. In Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC (pp. 13–14). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3394885.3431659