Stencil computations are at the core of applications in many domains, such as computational electromagnetics, image processing, and the partial differential equation solvers used throughout science and engineering. Short-vector SIMD instruction sets such as SSE and VMX provide a promising and widely available avenue for enhancing performance on modern processors. However, a fundamental memory stream alignment issue limits the performance that stencil computations achieve on modern short-vector SIMD architectures. In this paper, we propose a novel data layout transformation that avoids the stream alignment conflict, along with a static analysis technique for determining where this transformation is applicable. Significant performance increases are demonstrated for a variety of stencil codes on three modern SIMD-capable processors. © 2011 Springer-Verlag.
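To make the stream alignment conflict concrete: in a three-point stencil `b[i] = a[i-1] + a[i] + a[i+1]`, the three input streams are offset from one another by a single element, so at most one of them can start on a vector boundary. The abstract does not spell out the transformation, so the sketch below is only an illustration under assumptions: it uses one layout of the dimension-lifting kind (view the array as a `V x M` matrix and store its transpose), with plain Python lists standing in for SIMD registers. The names `V`, `lift_transpose`, `unlift`, and `stencil_transformed` are hypothetical, and the boundary handling is a simple scalar fallback rather than anything taken from the paper.

```python
V = 4  # assumed vector length, e.g. four 32-bit floats in a 128-bit register


def stencil_reference(a):
    """Original layout: b[i] = a[i-1] + a[i] + a[i+1] on interior points."""
    b = [0.0] * len(a)
    for i in range(1, len(a) - 1):
        b[i] = a[i - 1] + a[i] + a[i + 1]
    return b


def lift_transpose(a, v=V):
    """View a (length v*m) as a v x m matrix and store its transpose.

    Element a[lane*m + j] moves to position j*v + lane, so "row" j of the
    result holds one element from each of the v contiguous chunks of a.
    """
    m = len(a) // v
    assert len(a) == v * m and m >= 2
    t = [0.0] * len(a)
    for lane in range(v):
        for j in range(m):
            t[j * v + lane] = a[lane * m + j]
    return t


def unlift(t, v=V):
    """Inverse of lift_transpose."""
    m = len(t) // v
    a = [0.0] * len(t)
    for lane in range(v):
        for j in range(m):
            a[lane * m + j] = t[j * v + lane]
    return a


def stencil_transformed(t, v=V):
    """Same stencil on the transformed layout.

    For interior rows, the neighbours of row j are rows j-1 and j+1, which
    all start on a multiple of v: every "vector" access is aligned.
    """
    m = len(t) // v
    out = [0.0] * len(t)
    for j in range(1, m - 1):
        for lane in range(v):  # stands in for one aligned vector add
            out[j * v + lane] = (
                t[(j - 1) * v + lane] + t[j * v + lane] + t[(j + 1) * v + lane]
            )
    # Boundary rows need a neighbour from an adjacent lane (scalar fallback).
    for lane in range(v):
        if lane > 0:  # row 0: left neighbour is the last row, previous lane
            out[lane] = t[(m - 1) * v + lane - 1] + t[lane] + t[v + lane]
        if lane < v - 1:  # last row: right neighbour is row 0, next lane
            out[(m - 1) * v + lane] = (
                t[(m - 2) * v + lane] + t[(m - 1) * v + lane] + t[lane + 1]
            )
    return out
```

In this sketch, `unlift(stencil_transformed(lift_transpose(a)))` gives the same values as `stencil_reference(a)` for any input whose length is a multiple of `V` and at least `2 * V`. Only the first and last rows need cross-lane data; all interior work touches whole aligned rows.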
Henretty, T., Stock, K., Pouchet, L. N., Franchetti, F., Ramanujam, J., & Sadayappan, P. (2011). Data layout transformation for stencil computations on short-vector SIMD architectures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6601 LNCS, pp. 225–245). https://doi.org/10.1007/978-3-642-19861-8_13