During the last years Field Programmable Gate Arrays and Graphics Processing Units have become increasingly important for high-performance computing. In particular, a number of industrial solutions and academic projects are proposing design frameworks based on FPGA-implemented GPU-like compute units. Existing GPU-like core projects provide limited hardware support for shared scratchpad memory and particularly for the problem of bank conflicts, a major source of performance loss with many parallel kernels. In this paper, we present a configurable, GPU-like oriented scratchpad memory with built-in support for bank remapping. The core is fully synthetizable on FPGA with a contained hardware cost. We also validated the presented architecture with a cycle-accurate event-driven emulator written in C++ as well as an RTL simulator tool. Last, we demonstrated the impact of bank remapping and other parameters available with the proposed configurable shared scratchpad memory by evaluating the performance of two real-world parallelized kernels.
CITATION STYLE
Cilardo, A., Gagliardi, M., & Donnarumma, C. (2017). A configurable shared scratchpad memory for GPU-like processors. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 1, pp. 3–14). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-49109-7_1
Mendeley helps you to discover research relevant for your work.