Abstract
Coarse-Grained Reconfigurable Architecture (CGRA) is a competitive accelerator architecture for computation-intensive loop kernels. A spatial CGRA executes all operations spatially and therefore demands high data parallelism. Because single-bank memory limits performance, the original data is commonly partitioned across multiple memory banks within the spatial CGRA. However, we observe that the mapping result can cause inter-iteration conflicts, thereby invalidating the memory partition scheme.
In this paper, we develop a constraint satisfaction problem (CSP)-based conflict detection approach that detects both intra- and inter-iteration conflicts within a partition scheme. In addition, we formulate access scheduling as a graph coloring problem to minimize conflicts and improve performance. Overall, we develop a comprehensive end-to-end framework with architectural and compiler support for efficient data parallelism on the spatial CGRA.
Experimental results show that our architecture achieves average performance improvements of 13.79×, 2.35×, and 1.16× over an in-order RISC-V CPU, a mainstream FPGA, and a state-of-the-art CGRA SoC (FDRA), respectively. It also achieves average energy efficiency gains of 7.72×, 2.44×, and 1.14× over these three architectures. Finally, at the CGRA level, our CGRA achieves a 1.53× energy efficiency gain over the CASCADE CGRA.
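To make the graph-coloring view of access scheduling concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes cyclic partitioning (bank = address mod number of banks), affine accesses of the form stride * i + offset, and a greedy coloring heuristic; it checks only accesses issued in the same iteration and uses brute-force enumeration in place of a CSP solver. All function names and parameters are hypothetical.

```python
from itertools import combinations


def bank(addr, num_banks):
    # Assumed cyclic partitioning: consecutive addresses go to consecutive banks.
    return addr % num_banks


def conflict_graph(accesses, num_banks, iterations):
    """Build a conflict graph over memory accesses.

    accesses: list of (stride, offset); the address at iteration i is
    stride * i + offset. Two accesses conflict if, in some iteration,
    they touch different addresses that fall in the same bank.
    """
    edges = {k: set() for k in range(len(accesses))}
    for (u, (su, ou)), (v, (sv, ov)) in combinations(enumerate(accesses), 2):
        for i in range(iterations):
            au, av = su * i + ou, sv * i + ov
            if au != av and bank(au, num_banks) == bank(av, num_banks):
                edges[u].add(v)
                edges[v].add(u)
                break
    return edges


def greedy_coloring(edges):
    """Assign each access a time slot (color) so that conflicting
    accesses never share a slot. Greedy, so not guaranteed optimal."""
    colors = {}
    # Visit high-degree nodes first; a common greedy heuristic.
    for node in sorted(edges, key=lambda n: -len(edges[n])):
        used = {colors[m] for m in edges[node] if m in colors}
        colors[node] = next(c for c in range(len(edges)) if c not in used)
    return colors


# Three reads per iteration: A[i], A[i+1], A[i+4], with 4 banks.
accesses = [(1, 0), (1, 1), (1, 4)]
edges = conflict_graph(accesses, num_banks=4, iterations=8)
colors = greedy_coloring(edges)
slots = max(colors.values()) + 1
```

In this example A[i] and A[i+4] always map to the same bank, so they are joined by an edge and placed in different slots, while A[i+1] can share a slot with either of them.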
Citation
Dai, Y., Gao, X., Shen, C., Peng, B., Yin, W., Luk, W. S., & Wang, L. (2025). Towards Efficient Data Parallelism on Spatial CGRA via Constraint Satisfaction and Graph Coloring. In Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC (pp. 1023–1030). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3658617.3697544