Polyhedral optimization has recently attracted attention as a parallelization method for nested loop kernels. However, when polyhedral optimization is applied to high-level synthesis, access conflicts on the off-chip RAM become the performance bottleneck. In this paper, we propose a method to accelerate synthesized circuits by buffering off-chip RAM accesses. The buffers are built from on-chip RAM blocks placed in each processing element (PE) and can be accessed in fewer cycles than the off-chip RAM. Our method differs from related work in that it supports non-uniform data dependencies, which cannot be represented by constant vectors. Experimental results on practical kernels show that the buffered circuits with 8 PEs are on average 5.21 times faster than the original circuits. © Springer International Publishing Switzerland 2013.
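To illustrate what "non-uniform" means here, the following minimal sketch enumerates the flow-dependence distances of two one-dimensional loops. It is not the buffering method of the paper. The helper `dependence_distances` and both example loops are hypothetical, chosen only to contrast a dependence with a constant distance against one whose distance changes with the iteration.

```python
def dependence_distances(write_index, read_index, n):
    """Collect flow-dependence distances for a loop of the form
    `for i in 1..n: a[write_index(i)] = f(a[read_index(i)])`.

    Hypothetical helper for illustration only: it brute-forces the
    iteration space instead of using polyhedral analysis.
    """
    last_writer = {}   # array element -> iteration that last wrote it
    distances = set()
    for i in range(1, n + 1):
        r = read_index(i)
        if r in last_writer:
            # The value read at iteration i was produced at last_writer[r].
            distances.add(i - last_writer[r])
        last_writer[write_index(i)] = i
    return distances


# Uniform: a[i] = f(a[i - 2]); every dependence has the same distance.
uniform = dependence_distances(lambda i: i, lambda i: i - 2, 16)

# Non-uniform: a[2 * i] = f(a[i]); the distance grows with the iteration.
non_uniform = dependence_distances(lambda i: 2 * i, lambda i: i, 16)

print(sorted(uniform))
print(sorted(non_uniform))
```

In the first loop a single constant describes every dependence, whereas in the second loop no single constant does. That second case is the kind of dependence the abstract says cannot be represented by constant vectors.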
CITATION STYLE
Suda, A., Takase, H., Takagi, K., & Takagi, N. (2013). A buffering method for parallelized loop with non-uniform dependencies in high-level synthesis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8285 LNCS, pp. 390–401). https://doi.org/10.1007/978-3-319-03859-9_34