Facilitating the search for compositions of program transformations

by Albert Cohen, Marc Sigler, Sylvain Girbal, Olivier Temam, David Parello, Nicolas Vasilache
Program (2005)

Abstract

Static compiler optimizations can hardly cope with the complex run-time behavior and the interplay of hardware components in modern processor architectures. Multiple architectural phenomena occur and interact simultaneously, which requires the optimizer to combine multiple program transformations. Whether these transformations are selected through static analysis and models, runtime feedback, or both, the underlying infrastructure must be able to perform long and complex compositions of program transformations in a flexible manner. Existing compilers are ill-equipped for that task because of rigid phase ordering, fragile selection rules based on pattern matching, and cumbersome expression of loop transformations on syntax trees. Moreover, iterative optimization is emerging as a pragmatic and general means to select an optimization strategy via machine learning and operations research. Searching for the composition of dozens of complex, dependent, parameterized transformations is a challenge for iterative approaches. The purpose of this article is threefold: (1) to facilitate the automatic search for compositions of program transformations, introducing a richer framework which improves on classical polyhedral representations and is suitable for iterative optimization on a simpler, structured search space; (2) to illustrate, using several examples, that syntactic code representations close to the operational semantics hamper the composition of transformations; and (3) to show that complex compositions of transformations can be necessary to achieve significant performance benefits. The proposed framework relies on a unified polyhedral representation of loops and statements. The key is to clearly separate four types of actions associated with program transformations: modifications of the iteration domain, the schedule, the data layout, and the memory access functions.
The framework is implemented within the Open64/ORC compiler, aiming for native IA64, AMD64 and IA32 code generation, along with source-to-source optimization of Fortran90, C and C++.

Available from portal.acm.org
Albert Cohen (1), Sylvain Girbal (1, 2), David Parello (1, 3),
Marc Sigler (1), Olivier Temam (1), Nicolas Vasilache (1)
(1) ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, and HiPEAC network; (2) CEA LIST, Saclay; (3) HP France
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICS’05, June 20–22, Boston, MA, USA.
Copyright © 2005, ACM 1-59593-167-8/06/2005...$5.00
1 Introduction
Both high-performance and embedded architectures include an increasing number of hardware components with complex runtime behavior, e.g., cache hierarchies (including write buffers, TLBs, miss address files, L1 and L2 prefetching...), branch predictors, trace cache, load/store queue speculation, and pipeline replays. Static compiler optimizations have a hard time coping with such hardware components and their complex interactions. The issues are (1) to properly identify the architectural phenomena, and (2) to perform the appropriate and possibly complex sequence of program transformations. For the first issue, iterative optimization [20, 26, 12] is emerging as a promising solution, as it proposes to assist static analysis with runtime information to guide program transformations. However, for the second issue, iterative optimization environments will fare no better than the existing compilers on top of which they are currently implemented. The problem is that multiple architectural phenomena often occur simultaneously and interact. As a result, multiple carefully crafted and combined program transformations can be necessary to improve performance [29, 28]. Whether these program transformations are found using static analysis or runtime information, the underlying compiler infrastructure must be able to search for, and effectively perform, the proper sequence of program transformations. Up to now, this fact has been largely overlooked.
As of today, iterative optimization usually consists of choosing a rather small set of transformations, e.g., cache tiling, unrolling or array padding, and focusing on finding the best possible transformation parameters, e.g., tile size or unroll factor [19], using parameter search-space techniques. However, complex hardware interplay cannot be addressed solely through the proper selection of transformation parameters. A recent comparative study of model-based versus empirical optimizations [36] indicates that many motivations for iterative optimization are irrelevant when the proper transformations are not available. O’Boyle et al. [19] and Cooper et al. [12] have also pointed out that the ability to perform long sequences of composed transformations is key to the emergence of iterative optimization frameworks.
Clearly, there is a need for a compiler infrastructure that can apply complex and possibly long compositions of program transformations. Unfortunately, existing compiler infrastructures are ill-equipped for that task. By imposing phase ordering constraints [35], current compilers lack the ability to perform long sequences of transformations. In addition, compilers embed a large collection of ad-hoc program transformations, but these are syntactic transformations, i.e., control structures are regenerated after each program transformation, sometimes making it harder to apply the next transformations, especially when the application of program transformations relies on pattern-matching techniques.
This article introduces a framework to easily search for and perform compositions of program transformations; this framework relies on a unified representation of loops and statements, the foundations of which were presented in [10], improving on classical polyhedral representations [13, 34, 17, 22, 1, 23]. Using this representation, a large array of useful and efficient program transformations (loop fusion, tiling, array forward substitution, statement reordering, software pipelining, array padding, etc.), as well as compositions of these transformations, can be expressed as a set of simple matrix operations. Compared to the few attempts at expressing a large array of program transformations within the polyhedral model, the distinctive asset of our representation lies in the simplicity of the formalism to compose non-unimodular transformations across long, flexible sequences. Existing formalisms are designed for black-box optimization [13, 22, 1], and applying a classical loop transformation within them (as proposed in [34, 17]) requires a syntactic form of the program to anchor the transformation to existing statements. Up to now, the easy composition of transformations was restricted to unimodular transformations [35], with some extensions to singular transformations [21].
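The unimodular case mentioned above can be sketched concretely. A transformation of a 2-deep loop nest is a 2×2 integer matrix with determinant ±1 applied to the iteration vector (i, j), and composing transformations amounts to multiplying their matrices. The C sketch below composes skewing and interchange this way. It is our own illustration: the type Mat2 and the helper names are hypothetical and not part of the framework. It covers only this classical case, not the non-unimodular compositions the representation is designed for.

```c
#include <assert.h>

/* A 2x2 integer matrix acting on the iteration vector (i, j) of a 2-deep nest. */
typedef struct { int m[2][2]; } Mat2;

/* Loop interchange: (i, j) -> (j, i). */
static const Mat2 INTERCHANGE = {{{0, 1}, {1, 0}}};
/* Skewing of the inner loop by the outer one: (i, j) -> (i, i + j). */
static const Mat2 SKEW = {{{1, 0}, {1, 1}}};

/* Composition: returns a * b, i.e., the transformation "apply b, then a". */
static Mat2 mat2_mul(Mat2 a, Mat2 b) {
    Mat2 c;
    for (int r = 0; r < 2; r++)
        for (int k = 0; k < 2; k++)
            c.m[r][k] = a.m[r][0] * b.m[0][k] + a.m[r][1] * b.m[1][k];
    return c;
}

/* Unimodular: integer entries and determinant +1 or -1, so the mapping is a
   bijection on integer points. */
static int mat2_is_unimodular(Mat2 a) {
    int det = a.m[0][0] * a.m[1][1] - a.m[0][1] * a.m[1][0];
    return det == 1 || det == -1;
}

/* Map an iteration (i, j) to its coordinates in the transformed nest. */
static void mat2_apply(Mat2 a, const int in[2], int out[2]) {
    assert(in != out); /* out must not alias in: out[0] is written before in[1] is read */
    out[0] = a.m[0][0] * in[0] + a.m[0][1] * in[1];
    out[1] = a.m[1][0] * in[0] + a.m[1][1] * in[1];
}
```

For example, mat2_mul(INTERCHANGE, SKEW) yields [[1, 1], [1, 0]]. That matrix maps iteration (2, 3) to (5, 2) and has determinant -1, so it is still unimodular.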
The key to our approach is to clearly separate the four different types of actions performed by program transformations: modification of the iteration domain (loop bounds and strides), modification of the schedule of each individual statement, modification of the access functions (array subscripts), and modification of the data layout (array declarations). This separation makes it possible to provide a matrix representation for each kind of action, enabling the easy and independent composition of the different “actions” induced by program transformations and, as a result, enabling the composition of transformations themselves. Current representations of program transformations do not clearly separate these four types of actions; as a result, the implementation of certain compositions of program transformations can be complicated or even impossible. For instance, current implementations of loop fusion must include loop bounds and array subscript modifications even though they are only byproducts of a schedule-oriented program transformation; after applying loop fusion, target loops are often peeled, increasing code size and making further optimizations more complex. Within our representation, loop fusion is only expressed as a schedule transformation, and the modifications of the iteration domain and access functions are implicitly handled, so that the code complexity is exactly the same before and after fusion. Similarly, an iteration domain-oriented transformation like unrolling should have no impact on the schedule or data layout representations, and a data layout-oriented transformation like padding should have no impact on the schedule or iteration domain representations.
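The claim that fusion need only touch the schedule can be illustrated with a toy interpreter for Figure 1. This minimal sketch assumes a 2d+1 timestamp encoding: constant position entries interleaved with loop iterators, a common convention in the polyhedral literature and not necessarily the exact matrices of this framework. Each statement keeps its iteration domain and its array subscripts, and instances run in lexicographic timestamp order. Fusing i with k and j with l changes only the constant entries of S3, even though the loop bounds differ. The sizes, MAX_INST and all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sizes; the two loop nests of Figure 1 deliberately differ. */
enum { M = 4, N = 3, P = 3, Q = 5, DIM = 5, TS = 5, MAX_INST = 64 };

static long A[DIM][DIM], B[DIM][DIM], X[DIM], Y[DIM], Z[DIM];

/* A statement instance: statement number (1..3), iterators, timestamp. */
typedef struct { int stmt, i, j; int t[TS]; } Instance;

/* A schedule maps (stmt, i, j) to (position, i, position, j, position). */
typedef void (*Schedule)(int stmt, int i, int j, int t[TS]);

static void stamp(int t[TS], int p0, int i, int p1, int j, int p2) {
    t[0] = p0; t[1] = i; t[2] = p1; t[3] = j; t[4] = p2;
}

/* Figure 1 as written: nest (S1, S2) runs entirely before nest S3. */
static void sched_original(int stmt, int i, int j, int t[TS]) {
    if (stmt == 1)      stamp(t, 0, i, 0, 0, 0);
    else if (stmt == 2) stamp(t, 0, i, 1, j, 0);
    else                stamp(t, 1, i, 0, j, 0);
}

/* Fusion of i/k and j/l: only the constant positions of S3 change. */
static void sched_fused(int stmt, int i, int j, int t[TS]) {
    if (stmt == 1)      stamp(t, 0, i, 0, 0, 0);
    else if (stmt == 2) stamp(t, 0, i, 1, j, 0);
    else                stamp(t, 0, i, 1, j, 1);
}

/* Lexicographic comparison of timestamps (they are unique per schedule). */
static int cmp_instance(const void *a, const void *b) {
    const Instance *x = a, *y = b;
    for (int d = 0; d < TS; d++)
        if (x->t[d] != y->t[d]) return x->t[d] < y->t[d] ? -1 : 1;
    return 0;
}

/* Statement bodies: access functions never change (S3 uses k = i, l = j). */
static void execute(const Instance *s) {
    int i = s->i, j = s->j;
    if (s->stmt == 1)      Z[i] = 0;
    else if (s->stmt == 2) Z[i] += (A[i][j] + B[j][i]) * X[j];
    else                   Z[i] += A[i][j] * Y[j];
}

static void add(Instance *inst, int *n, Schedule sched, int stmt, int i, int j) {
    assert(*n < MAX_INST);
    inst[*n].stmt = stmt; inst[*n].i = i; inst[*n].j = j;
    sched(stmt, i, j, inst[*n].t);
    (*n)++;
}

/* Enumerate the (unchanged) iteration domains, sort by timestamp, execute. */
static void run(Schedule sched) {
    Instance inst[MAX_INST];
    int n = 0;
    for (int i = 0; i < M; i++) {
        add(inst, &n, sched, 1, i, 0);
        for (int j = 0; j < N; j++) add(inst, &n, sched, 2, i, j);
    }
    for (int k = 0; k < P; k++)
        for (int l = 0; l < Q; l++) add(inst, &n, sched, 3, k, l);
    qsort(inst, (size_t)n, sizeof inst[0], cmp_instance);
    for (int s = 0; s < n; s++) execute(&inst[s]);
}

/* Deterministic integer inputs; Z starts at 7 so untouched cells are visible. */
static void init(void) {
    for (int r = 0; r < DIM; r++) {
        X[r] = r + 1; Y[r] = 2 * r - 1; Z[r] = 7;
        for (int c = 0; c < DIM; c++) { A[r][c] = r * DIM + c; B[r][c] = r - c; }
    }
}
```

Running run(sched_original) and run(sched_fused) from the same initial data leaves Z identical. Integer data is used on purpose: fusion interleaves the two reductions into Z[i], which is exact for integers but may change rounding for floating-point values.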
Moreover, since all program transformations correspond to a set of matrix operations within our representation, searching for compositions of transformations is often (though not always) equivalent to testing different values of the matrix parameters, which further facilitates the search for compositions. Besides, with this framework, it should also be possible to find new compositions of transformations for which no static model has yet been developed.
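Searching by testing matrix values can be illustrated in the simplest, unimodular setting. The sketch below enumerates all 2×2 integer matrices with small entries and keeps those that map every dependence distance vector to a lexicographically positive one, which is the classical legality test for unimodular transformations. This is our own illustration rather than the framework's search procedure. The dependence vectors in DEPS and the function names are hypothetical, and the evaluation of each legal candidate is left as a comment.

```c
#include <assert.h>

typedef struct { int m[2][2]; } Mat2;

/* Hypothetical dependence distance vectors (di, dj) of a 2-deep loop nest. */
enum { NDEPS = 3 };
static const int DEPS[NDEPS][2] = {{1, 0}, {0, 1}, {1, -1}};

/* (a, b) is lexicographically positive: a > 0, or a == 0 and b > 0. */
static int lex_positive(int a, int b) { return a > 0 || (a == 0 && b > 0); }

/* A candidate is legal if it is unimodular and keeps every dependence
   lexicographically positive after transformation. */
static int is_legal(Mat2 t) {
    int det = t.m[0][0] * t.m[1][1] - t.m[0][1] * t.m[1][0];
    if (det != 1 && det != -1) return 0;
    for (int d = 0; d < NDEPS; d++) {
        int a = t.m[0][0] * DEPS[d][0] + t.m[0][1] * DEPS[d][1];
        int b = t.m[1][0] * DEPS[d][0] + t.m[1][1] * DEPS[d][1];
        if (!lex_positive(a, b)) return 0;
    }
    return 1;
}

/* Enumerate every matrix with entries in [-bound, bound]; store up to cap
   legal candidates in out and return how many legal ones exist. */
static int search_legal(int bound, Mat2 *out, int cap) {
    assert(bound >= 0 && cap >= 0);
    int n = 0;
    for (int a = -bound; a <= bound; a++)
        for (int b = -bound; b <= bound; b++)
            for (int c = -bound; c <= bound; c++)
                for (int d = -bound; d <= bound; d++) {
                    Mat2 t = {{{a, b}, {c, d}}};
                    if (!is_legal(t)) continue;
                    /* An iterative optimizer would generate, run and time t here. */
                    if (n < cap) out[n] = t;
                    n++;
                }
    return n;
}
```

With these dependences, plain interchange is rejected because (1, -1) would become (-1, 1). Skewing followed by interchange, [[1, 1], [1, 0]], is accepted. One transformation enables another, which is why the useful search space is a space of compositions.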
This article is organized as follows. Section 2 illustrates, with a simple example, the limitations of syntactic representations for transformation composition, and presents our polyhedral representation and how it can circumvent these limitations. Using several SPEC benchmarks, Section 3 shows that complex compositions can be necessary to reach high performance, and shows how such compositions are easily implemented using our polyhedral representation. Section 4 briefly describes the implementation of our representation, of the associated transformation tool, and of the code generation technique (in Open64/ORC [27]). Section 5 validates these tools through the evaluation of a dedicated transformation sequence for one benchmark. Section 6 presents related work.
2 A New Polyhedral Program Representation
The purpose of Section 2.1 is to illustrate, using a simple example, the limitations of the implementation of program transformations in current compilers. In Section 2.2, we present our polyhedral representation; in Section 2.3, we show how it can alleviate the limitations of the syntactic representation; in Section 2.4, how it can further facilitate the search for compositions of transformations; and Section 2.5 presents normalization rules for the representation.
Generally speaking, the main asset of our polyhedral representation is that it is semantics-based, abstracting away many implementation artifacts of syntax-based representations, and allowing the definition of most loop transformations without reference to any syntactic form of the program.
2.1 Limitations of Syntactic Transformations
In current compilers, after applying a program transformation to a code section, a new version of the code section is generated within the syntactic intermediate representation (abstract syntax tree, three-address code, SSA graph...), hence the term syntactic (or syntax-based) transformations. Note that this behavior is also shared by all previous matrix- or polyhedra-based frameworks.
Code size and complexity. As a result, after multiple transformations the code size and complexity can dramatically increase.
for (i=0; i<M; i++) {
  Z[i] = 0;                                 /* S1 */
  for (j=0; j<N; j++)
    Z[i] += (A[i][j] + B[j][i]) * X[j];     /* S2 */
}
for (k=0; k<P; k++)
  for (l=0; l<Q; l++)
    Z[k] += A[k][l] * Y[l];                 /* S3 */
Figure 1. Introductory example
...
if ((M >= P+1) && (N == Q) && (P >= 63)) {
  for (ii=0; ii<P-63; ii+=64)
    for (jj=0; jj<Q; jj+=64)
      for (i=ii; i<ii+63; i++)
        for (j=jj; j<min(Q,jj+63); j++) {
          Z[i] += (A[i][j] + B[j][i]) * X[j];   /* S2 */
          Z[i] += A[i][j] * Y[j];               /* S3 */
        }
  for (ii=P-62; ii<P; ii+=64)
    for (jj=0; jj<Q; jj+=64) {
      for (i=ii; i<P; i++)
        for (j=jj; j<min(Q,jj+63); j++) {
          Z[i] += (A[i][j] + B[j][i]) * X[j];   /* S2 */
          Z[i] += A[i][j] * Y[j];               /* S3 */
        }
      for (i=P+1; i<min(ii+63,M); i++)
        for (j=jj; j<min(N,jj+63); j++)
          Z[i] += (A[i][j] + B[j][i]) * X[j];   /* S2 only */
    }
  for (ii=P+1; ii<M; ii+=64)
    for (jj=0; jj<N; jj+=64)
      for (i=ii; i<min(ii+63,M); i++)
        for (j=jj; j<min(N,jj+63); j++)
          Z[i] += (A[i][j] + B[j][i]) * X[j];   /* S2 only */
}
...
Figure 2. Versioning after outer loop fusion
Consider the simple synthetic example of Figure 1, where it is profitable to merge loops i and k (the new loop is named i), and then loops j and l (the new loop is named j), to reduce the locality distance of array A, and then to tile loops i and j to exploit the spatial and TLB locality of array B, which is accessed column-wise. In order to perform all these transformations, the following actions are necessary: merge loops i and k, then merge loops j and l, then split statement Z[i]=0 out of the i loop to enable tiling, then strip-mine loop j, then strip-mine loop i, and then interchange i and jj (the loop generated from the strip-mining of j).
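Under the simplifying, hypothetical assumption that M = P and N = Q, no guards or versions are needed, so the sequence of actions listed above can be written out by hand and checked against Figure 1. In this sketch the common size SZ, the tile-size parameter tile, and the function names are ours. Integer data keeps the reordered reductions exact.

```c
#include <assert.h>

enum { SZ = 10 };  /* hypothetical: M = P = N = Q = SZ */

static long A[SZ][SZ], B[SZ][SZ], X[SZ], Y[SZ];

static int min_int(int a, int b) { return a < b ? a : b; }

/* Figure 1 with M = P and N = Q. */
static void original(long Z[SZ]) {
    for (int i = 0; i < SZ; i++) {
        Z[i] = 0;                                        /* S1 */
        for (int j = 0; j < SZ; j++)
            Z[i] += (A[i][j] + B[j][i]) * X[j];          /* S2 */
    }
    for (int k = 0; k < SZ; k++)
        for (int l = 0; l < SZ; l++)
            Z[k] += A[k][l] * Y[l];                      /* S3 */
}

/* After fusing i/k and j/l, splitting S1 out, strip-mining j and i by tile,
   and interchanging i with jj: loop order ii, jj, i, j. */
static void transformed(long Z[SZ], int tile) {
    assert(tile > 0);
    for (int i = 0; i < SZ; i++)
        Z[i] = 0;                                        /* S1, split out */
    for (int ii = 0; ii < SZ; ii += tile)
        for (int jj = 0; jj < SZ; jj += tile)
            for (int i = ii; i < min_int(SZ, ii + tile); i++)
                for (int j = jj; j < min_int(SZ, jj + tile); j++) {
                    Z[i] += (A[i][j] + B[j][i]) * X[j];  /* S2: B read within a tile */
                    Z[i] += A[i][j] * Y[j];              /* S3, fused with S2 */
                }
}

/* Deterministic integer inputs. */
static void init(void) {
    for (int r = 0; r < SZ; r++) {
        X[r] = r % 3 + 1; Y[r] = 5 - r;
        for (int c = 0; c < SZ; c++) { A[r][c] = (r + 2 * c) % 7; B[r][c] = r * c % 5; }
    }
}
```

Keeping the tile size as a parameter also shows where an iterative optimizer would search. With the unequal bounds of Figure 1, the same steps instead produce versioned code such as that of Figure 2.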
Because the loops being merged (i with k, and j with l) have different bounds, the merging and strip-mining steps will progressively multiply the number of loop nest versions, each with a different guard. After all these transformations, the program contains multiple instances of the code section shown in Figure 2. The number of program statements after each step is indicated in Figure 3.
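A simpler, if slower, alternative to versioning is to fuse the loops over the union of their domains and guard each statement with its own original domain. The sketch below does this for Figure 1 with hypothetical, unequal sizes (M = 6, N = 3, P = 4, Q = 5) and checks the result against the original loops. The sizes are chosen with M >= P so that every Z row updated by S3 has already been initialized by S1. The function names are ours.

```c
#include <assert.h>

/* Hypothetical, deliberately unequal bounds (M >= P so every Z row is initialized). */
enum { M = 6, N = 3, P = 4, Q = 5, DIM = 6 };

static long A[DIM][DIM], B[DIM][DIM], X[DIM], Y[DIM];

static int max_int(int a, int b) { return a > b ? a : b; }

/* Figure 1 as written. */
static void original(long Z[DIM]) {
    for (int i = 0; i < M; i++) {
        Z[i] = 0;                                            /* S1 */
        for (int j = 0; j < N; j++)
            Z[i] += (A[i][j] + B[j][i]) * X[j];              /* S2 */
    }
    for (int k = 0; k < P; k++)
        for (int l = 0; l < Q; l++)
            Z[k] += A[k][l] * Y[l];                          /* S3 */
}

/* Fusion of i/k and j/l over the union of both domains: each statement is
   guarded by its own original domain instead of being duplicated. */
static void fused_guarded(long Z[DIM]) {
    assert(M >= P);
    for (int i = 0; i < max_int(M, P); i++) {
        if (i < M) Z[i] = 0;                                 /* S1 */
        for (int j = 0; j < max_int(N, Q); j++) {
            if (i < M && j < N) Z[i] += (A[i][j] + B[j][i]) * X[j];  /* S2 */
            if (i < P && j < Q) Z[i] += A[i][j] * Y[j];              /* S3 */
        }
    }
}

/* Deterministic integer inputs. */
static void init(void) {
    for (int r = 0; r < DIM; r++) {
        X[r] = r + 2; Y[r] = 3 - r;
        for (int c = 0; c < DIM; c++) { A[r][c] = (3 * r + c) % 5; B[r][c] = (r + c) % 4; }
    }
}
```

Versioning, as in Figure 2, can be viewed as hoisting these guards out of the loops. Each combination of guard outcomes becomes its own loop nest, which removes the per-iteration tests but multiplies the code.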
The final code generated by the polyhedral representation will be similarly complicated, but this complexity does not show until