Abstract
Spatial accelerators provide massive parallelism through an array of homogeneous processing elements (PEs) and enable efficient data reuse through the PE array dataflow and on-chip memory. Many previous works have studied the dataflow architecture of spatial accelerators, including performance analysis and automatic generation. However, existing accelerator generators fail to fully exploit memory-level reuse opportunities and generate suboptimal designs with data duplication and inefficient interconnection. In this paper, we propose EMS, an efficient memory subsystem synthesis and optimization framework for spatial accelerators. We first use space-time transformation (STT) to analyze both PE-level and memory-level data reuse. Based on the reuse analysis, we develop an algorithm that automatically generates the data layout of the multi-banked scratchpad memory, the data mapping, and the memory access controller. Our generated memory subsystem supports multiple PE-memory interconnection topologies, including direct, multicast, and rotated connections. This memory and interconnection generation approach exploits memory-level reuse to avoid duplicated data storage at low hardware cost. EMS automatically synthesizes tensor algebra into hardware designs written in Chisel. Experiments show that our memory generator reduces on-chip memory size by an average of 28% compared with the state of the art while achieving comparable hardware performance.
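As a rough illustration of the space-time transformation idea mentioned in the abstract, the sketch below maps the reuse direction of each tensor in a matrix multiplication from loop-iteration space to (x, y, t) coordinates and labels the resulting PE-level dataflow. It is a minimal sketch under stated assumptions, not the EMS algorithm: the function names, the three dataflow labels, and the two example matrices are hypothetical, and the memory-level analysis described in the paper is not modeled here.

```python
# Hypothetical illustration of STT-based reuse classification; not the EMS code.
# Kernel assumed: C[i, j] += A[i, k] * B[k, j], with loop order (i, j, k).

def matvec(T, v):
    """Multiply a 3x3 matrix (list of rows) by a length-3 vector."""
    return tuple(sum(T[r][c] * v[c] for c in range(3)) for r in range(3))

def classify(T, reuse_vec):
    """Map a reuse direction to (dx, dy, dt) and name the dataflow it implies."""
    dx, dy, dt = matvec(T, reuse_vec)
    if (dx, dy) == (0, 0) and dt != 0:
        return "stationary"  # same PE reuses the data in later cycles
    if dt == 0:
        return "multicast"   # several PEs need the same data in one cycle
    return "systolic"        # data moves to another PE after a delay

# Each tensor is reused along the loop index that its access does not use.
REUSE = {"A": (0, 1, 0), "B": (1, 0, 0), "C": (0, 0, 1)}

def dataflow(T):
    """Classify every tensor of the assumed GEMM kernel under transformation T."""
    return {name: classify(T, vec) for name, vec in REUSE.items()}

# Two assumed space-time matrices; rows give x, y, and t as functions of (i, j, k).
T_MULTICAST = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # x = i, y = j, t = k
T_SYSTOLIC = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]   # x = i, y = j, t = i + j + k

if __name__ == "__main__":
    print(dataflow(T_MULTICAST))
    print(dataflow(T_SYSTOLIC))
```

Both matrices place the output tile on the PE grid, so C stays in place in each case; only the time row differs, and that alone decides whether A and B are broadcast in a single cycle or passed between neighboring PEs.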
Cite
Jia, L., Wang, Y., Leng, J., & Liang, Y. (2022). EMS: Efficient Memory Subsystem Synthesis for Spatial Accelerators. In Proceedings - Design Automation Conference (pp. 67–72). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3489517.3530411