Deriving efficient data movement from decoupled access/execute specifications

Abstract

On multi-core architectures with software-managed memories, effectively orchestrating data movement is essential to performance, but is tedious and error-prone. In this paper we show that when the programmer can explicitly specify both the memory access pattern and the execution schedule of a computation kernel, the compiler or run-time system can derive efficient data movement, even if analysis of kernel code is difficult or impossible. We have developed a framework of C++ classes for decoupled Access/Execute specifications, allowing for automatic communication optimisations such as software pipelining and data reuse. We demonstrate the ease and efficiency of programming the Cell Broadband Engine architecture using these classes by implementing a set of benchmarks, which exhibit data reuse and non-affine access functions, and by comparing these implementations against alternative implementations, which use hand-written DMA transfers and software-based caching. © 2009 Springer Berlin Heidelberg.

Citation (APA)

Howes, L. W., Lokhmotov, A., Donaldson, A. F., & Kelly, P. H. J. (2009). Deriving efficient data movement from decoupled access/execute specifications. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5409 LNCS, pp. 168–182). https://doi.org/10.1007/978-3-540-92990-1_14
