Most previous studies on tiling focus on partitioning the iteration space. On distributed-memory parallel systems, however, the decomposition of computation and the distribution of data must be handled together in order to balance load and minimize data migration. In this paper, we formulate globally optimal tiling as a 0-1 integer linear programming problem that minimizes total execution time. To simplify the selection of tiling parameters, we restrict tiles to semi-oblique shapes and present two effective approaches for deciding the tile shape in multi-dimensional semi-oblique tiling. We also present a hyperplane-based tile-to-processor mapping scheme, which can express diverse forms of parallelism and achieves better performance than traditional methods. Experiments with NPB2.3-serial SP and LU on a Qsnet-connected cluster achieved average parallel efficiencies of 87% and 73%, respectively. © 2008 Springer-Verlag Berlin Heidelberg.
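To make the idea of a hyperplane-based tile-to-processor mapping more concrete, the following is a minimal sketch and not the scheme defined in the paper. It assumes that tiles are indexed by integer coordinates, that tiles lying on the same hyperplane (same dot product with a chosen normal vector) go to the same processor, and that hyperplanes are assigned to processors cyclically. The names `map_tile_to_processor`, `processor_load`, `normal`, and `num_procs` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def map_tile_to_processor(tile, normal, num_procs):
    """Assign a tile to a processor from the hyperplane it lies on.

    Tiles with the same value of normal . tile share a hyperplane.
    Each hyperplane is given to processor (normal . tile) mod num_procs.
    The cyclic rule is an assumption made for this sketch, not the
    paper's actual mapping.
    """
    if len(tile) != len(normal):
        raise ValueError("tile and normal must have the same dimension")
    offset = sum(n * t for n, t in zip(normal, tile))
    return offset % num_procs


def processor_load(tile_grid, normal, num_procs):
    """Count how many tiles of a rectangular tile grid land on each processor."""
    tiles = product(*(range(extent) for extent in tile_grid))
    return Counter(map_tile_to_processor(t, normal, num_procs) for t in tiles)


# A 4 x 4 grid of tiles on 4 processors, with two different hyperplane normals.
load_rows = processor_load((4, 4), (1, 0), 4)  # one row of tiles per processor
load_diag = processor_load((4, 4), (1, 1), 4)  # wrapped anti-diagonals
```

Changing the normal vector changes which tiles share a processor: `(1, 0)` gives a row-wise block distribution, while `(1, 1)` places tiles along the same anti-diagonal together. In this small example both choices give every processor the same number of tiles, so the difference lies in which tiles communicate across processors, not in the load.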
CITATION STYLE
Liu, L., Chen, L., Wu, C., & Feng, X. B. (2008). Global tiling for communication minimal parallelization on distributed memory systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5168 LNCS, pp. 382–391). https://doi.org/10.1007/978-3-540-85451-7_41