Sparse matrix-vector multiplication (SpMV) is widely used in many fields and often dominates a task's execution time. With large off-chip memory bandwidth, customizable on-chip resources, and high-performance floating-point operations, FPGAs are a promising platform for accelerating SpMV. However, SpMV is memory-intensive, and the compressed data formats it relies on usually introduce irregular memory access, so implementing an FPGA SpMV accelerator that achieves high bandwidth utilization (BU) is challenging. Existing works either eliminate irregular memory access at the cost of increased data redundancy or only locally reduce the port conflicts that irregular memory access introduces, which limits the BU improvement. To this end, this paper proposes a high-bandwidth-utilization SpMV accelerator on FPGAs using partial vector duplication, featuring a read-conflict-free vector buffer, a write-conflict-free adder tree, and ping-pong-like accumulator registers. The FPGA implementation results show that the proposed design achieves an average 1.10x speedup over the state-of-the-art work.
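To illustrate where the irregular memory access comes from, here is a minimal software sketch of SpMV over the compressed sparse row (CSR) format. CSR is assumed purely for illustration, since the abstract does not name the format the accelerator uses, and the names `spmv_csr`, `row_ptr`, `col_idx`, and `values` are hypothetical. This is not the proposed hardware design; it only shows that reads of the vector `x` are indexed by the stored column indices, so their order depends on the matrix's sparsity pattern.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative only)."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for i in range(num_rows):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # values and col_idx are read sequentially, but x is read at
            # col_idx[k], a data-dependent (irregular) address.
            y[i] += values[k] * x[col_idx[k]]
    return y


# Example matrix (assumed for illustration):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

In a parallel datapath that processes several nonzeros per cycle, these data-dependent reads of `x` can target the same on-chip memory bank at the same time, which is the kind of port conflict the abstract refers to.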
Liu, B., & Liu, D. (2023). Towards High-Bandwidth-Utilization SpMV on FPGAs via Partial Vector Duplication. In Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC (pp. 33–38). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3566097.3567839