Approaching the theoretical performance of heterogeneous multicore architectures, equipped with specialized accelerators, is a challenging issue. Unlike regular CPUs that can transparently access the whole global memory address range, accelerators usually embed local memory on which they perform all their computations using a specific instruction set. While many research efforts have been devoted to offloading parts of a program over such coprocessors, the real challenge is to find a programming model providing a unified view of all available computing units. In this paper, we present an original runtime system providing a high-level, unified execution model allowing seamless execution of tasks over the underlying heterogeneous hardware. The runtime is based on a hierarchical memory management facility and on a codelet scheduler. We demonstrate the efficiency of our solution with a LU decomposition for both homogeneous (3.8 speedup on 4 cores) and heterogeneous machines (95 % efficiency). We also show that a "granularity aware" scheduling can improve execution time by 35%. © Springer-Verlag Berlin Heidelberg 2009.
CITATION STYLE
Augonnet, C., & Namyst, R. (2009). A unified runtime system for heterogeneous multi-core architectures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5415 LNCS, pp. 174–183). https://doi.org/10.1007/978-3-642-00955-6_22
Mendeley helps you to discover research relevant for your work.