The DiamondTetris algorithm for maximum performance vectorized stencil computation

Vadim Levchenko; Anastasia Perepelkina

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10421 LNCS 124-135

DOI: 10.1007/978-3-319-62932-2_11

Abstract

DiamondTetris, a stencil computation algorithm from the LRnLA family, is constructed. It targets Many-Integrated-Core processors of the Xeon Phi family. The algorithm and its implementation are described for a simulation based on the wave equation. Its strong points are locality, efficient use of the memory hierarchy, and, most importantly, seamless vectorization. Specifically, only one vector rearrange operation is necessary per cell value update. The performance is estimated with the roofline model. The algorithm is implemented in code and tested on Xeon and Xeon Phi machines.
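
The roofline estimate mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard roofline formula, in which attainable performance is the smaller of the peak compute rate and the product of memory bandwidth and operational intensity. The function name `roofline_gflops` and the machine parameters are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable performance (GFLOP/s) under the standard roofline model."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical machine parameters (assumptions, not measurements from the paper).
PEAK_GFLOPS = 1000.0    # peak floating-point rate
BANDWIDTH_GBS = 100.0   # sustained memory bandwidth

# Sweep the operational intensity to see where the memory roof ends.
for intensity in (0.5, 2.0, 10.0, 50.0):
    perf = roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
    bound = "memory-bound" if perf < PEAK_GFLOPS else "compute-bound"
    print(f"{intensity:5.1f} FLOP/byte -> {perf:7.1f} GFLOP/s ({bound})")
```

In this model, an algorithm with better data locality moves fewer bytes per update, which raises the operational intensity and shifts the estimate from the bandwidth roof toward the compute roof.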

Cite

Levchenko, V., & Perepelkina, A. (2017). The DiamondTetris algorithm for maximum performance vectorized stencil computation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10421 LNCS, pp. 124–135). Springer Verlag. https://doi.org/10.1007/978-3-319-62932-2_11
