DiamondTetris, an algorithm from the LRnLA family for stencil computation, is constructed. It targets Many Integrated Core processors of the Xeon Phi family. The algorithm and its implementation are described for a simulation based on the wave equation. Its strong points are locality, efficient use of the memory hierarchy, and, most importantly, seamless vectorization. Specifically, only one vector rearrange operation is necessary per cell value update. The performance is estimated with the roofline model. The algorithm is implemented in code and tested on Xeon and Xeon Phi machines.
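The roofline estimate mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name and every numeric value below are hypothetical placeholders, not figures from the paper. The model bounds attainable performance by the smaller of the peak compute rate and the memory bandwidth multiplied by the arithmetic intensity (FLOP per byte moved).

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_cell, bytes_per_cell):
    """Attainable performance in GFLOP/s under the roofline model.

    All arguments are illustrative inputs supplied by the caller; none of
    them are measurements taken from the paper.
    """
    # Arithmetic intensity: floating-point operations per byte of memory traffic.
    intensity = flops_per_cell / bytes_per_cell
    # Performance is capped either by compute (peak) or by memory traffic.
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
# A naive stencil sweep that moves 8 bytes per cell update is memory-bound.
naive = roofline_gflops(1000.0, 100.0, flops_per_cell=10.0, bytes_per_cell=8.0)

# Better data reuse lowers the traffic per update and raises the bound
# until the compute peak becomes the limit.
reuse = roofline_gflops(1000.0, 100.0, flops_per_cell=10.0, bytes_per_cell=0.5)

print(naive, reuse)
```

Under these assumed numbers, reducing memory traffic per cell update moves the estimate from the bandwidth-limited slope to the compute-limited ceiling, which is the kind of gain that locality-oriented algorithms aim for.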
CITATION STYLE
Levchenko, V., & Perepelkina, A. (2017). The DiamondTetris algorithm for maximum performance vectorized stencil computation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10421 LNCS, pp. 124–135). Springer Verlag. https://doi.org/10.1007/978-3-319-62932-2_11