Iterative stencil computations appear in many scientific and engineering applications, and the acceleration of stencil codes on parallel architectures has been widely studied. Parallel stencil implementations on FPGA-based heterogeneous architectures have been reported using either traditional RTL design or directives in C/C++ code with high-level synthesis (HLS) tools. In both cases, FPGAs have been shown to provide better performance per watt than CPU- or GPU-based systems. However, work with HLS tools is typically limited to applying parallelization directives, without evaluating how adapting the algorithm could improve their effect. This paper proposes dividing the inner loop of the stencil code so that total latency is reduced using memory partition and pipeline directives. The case study is the two-dimensional Laplace equation, implemented on a ZedBoard and an Ultra96 board using Vivado HLS. Performance is evaluated as a function of the number of inner-loop divisions and on-chip memory partitions, in terms of latency, power consumption, FPGA resource usage, and speed-up.
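To make the idea of dividing the inner loop concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Jacobi-style four-point update for the two-dimensional Laplace equation. The grid size `N`, the division count `SPLIT`, the function names, and the cyclic partition type and factor are all hypothetical choices made for illustration. The `#pragma HLS` lines are standard Vivado HLS directives; an ordinary C compiler ignores them, so the code can also be checked in software.

```c
#include <assert.h>  /* static_assert (C11) */

#define N 16                      /* grid size: an assumption for this sketch */
#define SPLIT 2                   /* number of inner-loop divisions: an assumption */
#define CHUNK ((N - 2) / SPLIT)   /* interior columns handled by each division */

static_assert((N - 2) % SPLIT == 0, "interior columns must divide evenly");

/* Reference sweep: one Jacobi update of every interior point. */
void laplace_sweep_ref(float in[N][N], float out[N][N]) {
    for (int i = 1; i < N - 1; i++) {
        for (int j = 1; j < N - 1; j++) {
            out[i][j] = 0.25f * (in[i - 1][j] + in[i + 1][j] +
                                 in[i][j - 1] + in[i][j + 1]);
        }
    }
}

/* Same sweep with the inner (column) loop divided into SPLIT segments.
 * Each pipelined iteration updates one point per segment, so the inner
 * trip count drops from N-2 to CHUNK. Partitioning the arrays along the
 * column dimension is meant to supply the extra memory ports; the type
 * and factor below are illustrative, not tuned values. */
void laplace_sweep_split(float in[N][N], float out[N][N]) {
#pragma HLS ARRAY_PARTITION variable=in cyclic factor=2 dim=2
#pragma HLS ARRAY_PARTITION variable=out cyclic factor=2 dim=2
    for (int i = 1; i < N - 1; i++) {
        for (int j = 0; j < CHUNK; j++) {
#pragma HLS PIPELINE II=1
            for (int s = 0; s < SPLIT; s++) {
#pragma HLS UNROLL
                int c = 1 + s * CHUNK + j;  /* column owned by segment s */
                out[i][c] = 0.25f * (in[i - 1][c] + in[i + 1][c] +
                                     in[i][c - 1] + in[i][c + 1]);
            }
        }
    }
}
```

Both functions write only interior points and leave the boundary of `out` untouched, so the divided version can be validated against the reference in a plain C test bench before synthesis. Whether a given `SPLIT` and partition factor actually reduce latency depends on the target device and would need to be confirmed from the synthesis report.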
Castano-Londono, L., Alzate Anzola, C., Marquez-Viloria, D., Gallo, G., & Osorio, G. (2019). Evaluation of Stencil Based Algorithm Parallelization over System-on-Chip FPGA Using a High Level Synthesis Tool. In Communications in Computer and Information Science (Vol. 1052, pp. 52–63). Springer. https://doi.org/10.1007/978-3-030-31019-6_5