In this paper, we focus on system-level optimizations for the automatic parallelization of nested loops on Reconfigurable Accelerators (RAs). Because off-chip bandwidth plays a major role in the overall performance of such implementations, we propose partitioning techniques based on loop tiling that take advantage of the hierarchically structured memory systems of RAs.
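For readers unfamiliar with loop tiling, the general idea can be sketched on a plain matrix product. This is a generic illustration, not the partitioning scheme proposed in the paper: the function names `matmul_naive` and `matmul_tiled`, the square row-major layout, and the single `tile` parameter are all assumptions made for the example. The tiled version visits the iteration space block by block, so each block of operands can be reused from a small, fast memory before the next block is fetched from a larger, slower one.

```c
#include <assert.h>
#include <stddef.h>

/* Reference: untiled n x n product C = A * B, row-major storage. */
void matmul_naive(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t k = 0; k < n; k++)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
    }
}

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Tiled version (hypothetical helper): the three outer loops walk over
 * tile x tile blocks, the three inner loops work inside one block.
 * The min_sz bounds handle n not being a multiple of tile. */
void matmul_tiled(size_t n, size_t tile,
                  const double *a, const double *b, double *c) {
    assert(tile > 0);
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t jj = 0; jj < n; jj += tile)
            for (size_t kk = 0; kk < n; kk += tile)
                for (size_t i = ii; i < min_sz(ii + tile, n); i++)
                    for (size_t j = jj; j < min_sz(jj + tile, n); j++) {
                        /* Accumulate this block's partial sum into C. */
                        double acc = c[i * n + j];
                        for (size_t k = kk; k < min_sz(kk + tile, n); k++)
                            acc += a[i * n + k] * b[k * n + j];
                        c[i * n + j] = acc;
                    }
}
```

In this sketch the tile size would be chosen so that the blocks touched by the inner loops fit in the faster memory level. How tile shapes and sizes should be chosen for an RA memory hierarchy is the subject of the paper itself and is not captured here.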
CITATION STYLE
Derrien, S., & Rajopadhye, S. (2001). Loop tiling for reconfigurable accelerators. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2147, pp. 398–408). Springer Verlag. https://doi.org/10.1007/3-540-44687-7_41