A buffering method for parallelized loop with non-uniform dependencies in high-level synthesis

Abstract

Polyhedral optimization has recently attracted attention as a parallelization method for nested loop kernels. However, when it is applied to high-level synthesis, access conflicts on the off-chip RAM have been the performance bottleneck. In this paper, we propose a method to accelerate synthesized circuits by buffering off-chip RAM accesses. The buffers are built from on-chip RAM blocks placed in each processing element (PE) and can be accessed in fewer cycles than the off-chip RAM. Unlike related work, our method supports non-uniform data dependencies, which cannot be represented by constant vectors. Experimental results with practical kernels show that the buffered circuits with 8 PEs are on average 5.21 times faster than the original ones. © Springer International Publishing Switzerland 2013.
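
To illustrate what "cannot be represented by constant vectors" means, the following minimal sketch compares the dependence distances of two one-dimensional loops. It is not taken from the paper: the helper `dependence_distances` and both example loops are hypothetical and chosen only for illustration. The loop `a[i] = a[i-1] + 1` has a single constant distance, while `a[2*i] = a[i] + 1` has a distance that grows with the iteration number.

```python
def dependence_distances(write_idx, read_idx, n):
    """Collect flow-dependence distances for iterations 1..n-1.

    Each iteration i reads a[read_idx(i)] and then writes a[write_idx(i)].
    A distance is (reader iteration) - (iteration that last wrote the element).
    Hypothetical helper for illustration only.
    """
    last_writer = {}   # array index -> iteration that last wrote it
    distances = set()
    for i in range(1, n):
        r = read_idx(i)
        if r in last_writer:
            distances.add(i - last_writer[r])
        last_writer[write_idx(i)] = i
    return distances


# Uniform dependence: a[i] = a[i-1] + 1
uniform = dependence_distances(lambda i: i, lambda i: i - 1, 9)

# Non-uniform dependence: a[2*i] = a[i] + 1
non_uniform = dependence_distances(lambda i: 2 * i, lambda i: i, 9)

print(uniform)      # a single constant distance
print(non_uniform)  # several distances that depend on i
```

In the second loop no single constant vector describes every dependence. A buffering scheme that assumes a fixed distance between producer and consumer iterations would therefore not cover this case.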

Cite

Suda, A., Takase, H., Takagi, K., & Takagi, N. (2013). A buffering method for parallelized loop with non-uniform dependencies in high-level synthesis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8285 LNCS, pp. 390–401). https://doi.org/10.1007/978-3-319-03859-9_34
