Attention-based neural networks have attracted great interest due to their excellent accuracy. However, the attention mechanism involves substantial computation, much of it spent on unnecessary calculations, which significantly limits system performance. To reduce these unnecessary calculations, researchers have proposed sparse attention, which converts some dense-dense matrix multiplication (DDMM) operations into sampled dense-dense matrix multiplication (SDDMM) and sparse matrix multiplication (SpMM) operations. However, current sparse attention solutions introduce massive off-chip random memory accesses because the sparse attention matrix is generally unstructured. We propose CPSAA, a novel crossbar-based processing-in-memory (PIM)-featured sparse attention accelerator that eliminates off-chip data transmissions. 1) We present a novel attention calculation mode to balance crossbar writing and crossbar processing latency. 2) We design a novel PIM-based sparsity pruning architecture to eliminate the pruning phase's off-chip data transfers. 3) Finally, we present novel crossbar-based SDDMM and SpMM methods that process unstructured sparse attention matrices by coupling two types of crossbar arrays. Experimental results show that CPSAA achieves average performance improvements of 89.6×, 32.2×, 17.8×, 3.39×, and 3.84×, and energy savings of 755.6×, 55.3×, 21.3×, 5.7×, and 4.9×, compared with a GPU, a field-programmable gate array (FPGA), SANGER, ReBERT, and ReTransformer, respectively.
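To make the DDMM-to-SDDMM/SpMM conversion concrete, the sketch below shows the generic sparse-attention dataflow in NumPy: an SDDMM samples Q·K^T only at positions kept by a pruning mask, and an SpMM multiplies the resulting sparse attention matrix by V. This is a minimal software illustration of the general technique, not CPSAA's crossbar hardware or pruning method; the function name `sparse_attention`, the boolean mask `M`, and the assumption that every row retains at least one position (e.g., the diagonal) are illustrative choices, not details from the paper.

```python
import numpy as np

def sparse_attention(Q, K, V, M):
    """Q, K, V: (n, d) arrays; M: (n, n) boolean mask of retained positions.

    Assumes every row of M keeps at least one position, otherwise the
    row-wise softmax below is undefined. A hardware kernel would keep the
    attention matrix in a sparse format; the dense softmax here is only
    for clarity.
    """
    n, d = Q.shape
    # SDDMM: compute dot products of Q @ K.T only where M is True.
    rows, cols = np.nonzero(M)
    scores = np.einsum("id,id->i", Q[rows], K[cols]) / np.sqrt(d)
    # Row-wise softmax over the sampled (unstructured) sparse scores;
    # masked-out positions stay at -inf and contribute exp(-inf) = 0.
    S = np.full((n, n), -np.inf)
    S[rows, cols] = scores
    S = np.exp(S - S.max(axis=1, keepdims=True))
    S /= S.sum(axis=1, keepdims=True)
    # SpMM: multiply the sparse attention matrix by the dense V.
    return S @ V

# Small usage example with a random unstructured mask.
n, d = 8, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
M = rng.random((n, n)) < 0.2
M |= np.eye(n, dtype=bool)  # keep the diagonal so no row is empty
out = sparse_attention(Q, K, V, M)  # shape (8, 16)
```

Because the mask is unstructured, the `rows`/`cols` index pairs land at irregular memory locations; this is the random-access pattern that the abstract identifies as the bottleneck CPSAA moves on-chip.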
Li, H., Jin, H., Zheng, L., Liao, X., Huang, Y., Liu, C., … Gui, C. (2024). CPSAA: Accelerating Sparse Attention Using Crossbar-Based Processing-In-Memory Architecture. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(6), 1741–1754. https://doi.org/10.1109/TCAD.2023.3344524