Sign up & Download
Sign in

Optimizing the Performance of Streaming Numerical Kernels on the IBM Blue Gene / P PowerPC 450

by Tareq M Malas, Aron J Ahmadia, Jed Brown, John A Gunnels, David E Keyes
the International Journal of High Performance Computing Applications (2012)

Abstract

Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution of partial differential equations, represents a challenge despite the regularity of memory access. Sophisticated optimization techniques are required to fully utilize the Central Processing Unit (CPU). We propose a new method for constructing streaming numerical kernels using a high-level assembly synthesis and optimization framework. We describe an implementation of this method in Python targeting the IBM Blue Gene/P supercomputer's PowerPC 450 core. This paper details the high-level design, construction, simulation, verification, and analysis of these kernels utilizing a subset of the CPU's instruction set. We demonstrate the effectiveness of our approach by implementing several three-dimensional stencil kernels over a variety of cached memory scenarios and analyzing the mechanically scheduled variants, including a 27-point stencil achieving a 1.7x speedup over the best previously published results.

Cite this document (BETA)

Sign up today - FREE

Mendeley saves you time finding and organizing research. Learn more

  • All your research in one place
  • Add and import papers easily
  • Access it anywhere, anytime

Start using Mendeley in seconds!

Already have an account? Sign in

Readership Statistics

4 Readers on Mendeley
by Discipline
 
 
 
by Academic Status
 
25% Student (Bachelor)
 
25% Student (Master)
 
25% Ph.D. Student
by Country
 
75% Saudi Arabia
 
25% Brazil