Instruction-level parallelism (ILP) is a generally accepted means of speeding up the execution of both scientific and non-scientific programs. Compilation techniques for ILP are in a sense "general-purpose" in that they do not depend on whether the source program is scientific or not. In this paper we investigate what can be gained by ILP techniques that are specialized for scientific code in the form of nested loop programs. This regular program form allows us to apply well-known techniques taken from the theory of loop transformation. We present a compilation algorithm based on both standard and non-standard transformations that pursues three goals: increasing fine-grained parallelism for software pipelining, reducing communication overhead through integrated functional unit assignment, and minimizing memory traffic by maximizing data reuse between adjacent computations. We present first results, which show impressive speedups compared to conventionally software-pipelined code. Our investigations are based on the limited-connectivity VLIW architectural model, a realistic (that is, realizable) VLIW machine made up of multiple clusters with private register files.
CITATION STYLE
Slowik, A., Piepenbrock, G., & Pfahler, P. (1994). Compiling nested loops for limited connectivity VLIWs. In Lecture Notes in Computer Science (Vol. 786, pp. 143–157). Springer Verlag. https://doi.org/10.1007/3-540-57877-3_10