Effective memory hierarchy utilization is critical to the performance of modern multiprocessor architectures. We have developed the first compiler system that fully automatically parallelizes sequential programs and changes the original array layouts to improve memory system performance. Our optimization algorithm consists of two steps. The first step chooses the parallelization and computation assignment such that synchronization and data sharing are minimized. The second step then restructures the layout of the data in the shared address space with an algorithm that is based on a new data transformation framework. We ran our compiler on a set of application programs and measured their performance on the Stanford DASH multiprocessor. Our results show that the compiler can effectively optimize parallelism in conjunction with memory subsystem performance. © 1995, ACM. All rights reserved.
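To make the second step concrete, here is a minimal sketch of one kind of layout change it could produce. The abstract does not spell out the transformation, so the following are assumptions chosen for illustration: a row-major two-dimensional array, a computation assignment that gives each processor a block of adjacent columns, and a layout built by strip-mining the column index and moving the processor index outermost. All function names are hypothetical.

```python
def original_address(i, j, n_rows, n_cols, n_procs):
    """Row-major address of element (i, j); n_procs is unused here."""
    return i * n_cols + j


def transformed_address(i, j, n_rows, n_cols, n_procs):
    """Address of (i, j) after strip-mining columns and permuting.

    Assumes n_cols is divisible by n_procs. The column index j is split
    into (owner, local column), and the owner becomes the outermost
    dimension, so each processor's block is stored as one region.
    """
    block = n_cols // n_procs
    owner, local_col = divmod(j, block)
    return owner * (n_rows * block) + i * block + local_col


def owned_addresses(proc, n_rows, n_cols, n_procs, layout):
    """Sorted addresses of the elements assigned to processor `proc`."""
    block = n_cols // n_procs
    first_col = proc * block
    return sorted(
        layout(i, j, n_rows, n_cols, n_procs)
        for i in range(n_rows)
        for j in range(first_col, first_col + block)
    )


def is_contiguous(addresses):
    """True if the sorted addresses form one unbroken range."""
    return all(b - a == 1 for a, b in zip(addresses, addresses[1:]))


if __name__ == "__main__":
    n_rows, n_cols, n_procs = 4, 8, 4
    for layout in (original_address, transformed_address):
        owned = owned_addresses(1, n_rows, n_cols, n_procs, layout)
        print(layout.__name__, owned, is_contiguous(owned))
```

Under these assumptions, processor 1's elements are scattered across every row in the original layout, while in the transformed layout they occupy a single contiguous range. That contiguity is the property that tends to help locality and reduce false sharing on a shared address space machine.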
Anderson, J. M., Amarasinghe, S. P., & Lam, M. S. (1995). Data and Computation Transformations for Multiprocessors. ACM SIGPLAN Notices, 30(8), 166–178. https://doi.org/10.1145/209937.209954