Minimizing completion time for loop tiling with computation and communication overlapping

  • Goumas G
  • Sotiropoulos A
  • Koziris N


This paper proposes a new method for minimizing the execution time of
nested for-loops using a tiling transformation. In our approach, we are
interested not only in tile size and shape, as dictated by the required
communication-to-computation ratio, but also in overall completion time.
We select a time hyperplane that executes different tiles much more
efficiently by exploiting the inherent overlap between the communication
and computation phases of successive, atomic tile executions. We assign
tiles to processors according to the tile space boundaries, thus taking
the iteration space bounds into account. Our schedule considerably
reduces overall completion time under the assumption that some part of
every communication phase can be efficiently overlapped with atomic, pure
tile computations. The overall schedule resembles a pipelined datapath in
which computations are no longer interleaved with sends and receives to
non-local processors. Experimental results on a cluster of Pentiums using
various MPI send primitives show that the total completion time is
significantly reduced.
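The trade-off described in the abstract can be illustrated with a toy cost model. This is a minimal sketch under stated assumptions and not the authors' exact formulation. The model assumes the following:

- A tile j runs at time step Π·j.
- Without overlap, the hyperplane is (1, ..., 1) and each step costs t_comp + t_comm.
- With overlap, communication is assumed to be fully hidden, so each step costs max(t_comp, t_comm).
- With overlap, data crossing to another processor arrives one step later. The coefficient on every distributed dimension therefore doubles to 2, while the dimension mapped to a single processor keeps coefficient 1.

Mapping the longest tile-space dimension to one processor is our reading of "assign tiles to processors according to the tile space boundaries". All function names and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def steps_for_hyperplane(extents: Sequence[int], pi: Sequence[int]) -> int:
    """Number of time steps when tile j executes at step pi . j.

    Tiles are indexed 0 <= j[i] < extents[i]. With all coefficients of pi
    positive, the last tile (every coordinate at its maximum) runs at the
    latest step, so the schedule length is pi . (extents - 1) + 1.
    """
    return sum(p * (e - 1) for p, e in zip(pi, extents)) + 1


def non_overlapped_time(extents: Sequence[int], t_comp: float, t_comm: float) -> float:
    """Hyperplane (1, ..., 1); each step computes and communicates in sequence."""
    pi = [1] * len(extents)
    return steps_for_hyperplane(extents, pi) * (t_comp + t_comm)


def overlapped_time(
    extents: Sequence[int], t_comp: float, t_comm: float
) -> Tuple[int, List[int], float]:
    """Hyperplane with coefficient 1 on the locally mapped dimension, 2 elsewhere.

    Assumption (toy model): a tile's boundary data is sent during the step
    after it is computed, so a neighbour on another processor can use it
    only two steps later; along the local dimension it is usable at once.
    Each step costs max(t_comp, t_comm) because sends and receives are
    assumed to run fully concurrently with the next tile's computation.
    """
    # Map the longest tile-space dimension to one processor, so the
    # doubled coefficients apply only to the shorter dimensions.
    k = max(range(len(extents)), key=lambda i: extents[i])
    pi = [1 if i == k else 2 for i in range(len(extents))]
    return k, pi, steps_for_hyperplane(extents, pi) * max(t_comp, t_comm)


if __name__ == "__main__":
    extents = (4, 16)            # hypothetical 2-D tile space
    t_comp, t_comm = 10.0, 8.0   # hypothetical per-tile costs
    print(non_overlapped_time(extents, t_comp, t_comm))  # 342.0
    print(overlapped_time(extents, t_comp, t_comm))      # (1, [2, 1], 220.0)
```

In this toy model, the overlapped schedule has more steps (22 versus 19) but each step is cheaper, so it finishes sooner. For a square 8 × 8 tile space with t_comm much smaller than t_comp, the longer schedule is no longer compensated and the non-overlapped plan wins. This is consistent with the abstract's caveat that the gain depends on how much communication can be overlapped.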

Author-supplied keywords

  • Loop tiling
  • MPI send-receive primitives
  • communication overlapping
  • hyperplanes
  • supernodes
