In response to emerging heterogeneous multicore and accelerator-based architectures, we propose syntactic extensions to C, in the form of OpenMP-based pragmas, that make it easier for programmers to offload parts of their applications to auxiliary processors. A simple blocking strategy makes offloaded tasks more profitable, and the runtime system improves the overlap of computation and communication while data moves to and from the accelerators. To demonstrate the feasibility and usefulness of the proposal, we target the IBM Cell architecture. We evaluate the whole system with HPCC STREAM Triad and several NAS benchmarks, and present a detailed performance breakdown at the level of parallel regions. We also classify the parallel regions by how well suited they are to execution on accelerators. Overall, our approach outperforms the IBM compiler for the Cell processor. © 2010 Springer-Verlag.
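As a rough illustration of the blocking idea applied to a STREAM Triad style kernel, the sketch below processes the arrays in fixed-size chunks that are staged through small local buffers, as an accelerator with a limited local store would require. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `BLOCK`, `triad_block`, and `triad_blocked` are hypothetical, `memcpy` stands in for the DMA transfers a real runtime would issue, and the paper's pragma syntax is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical block size: small enough that three buffers of this
   length would fit in a limited accelerator-local memory. */
#define BLOCK 256

/* Triad on one block: a[i] = b[i] + scalar * c[i]. */
static void triad_block(double *a, const double *b, const double *c,
                        double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

/* Blocked triad: copy each chunk of the inputs into local buffers,
   compute on the local copies, then copy the result back.
   memcpy is only a stand-in for asynchronous DMA transfers. */
void triad_blocked(double *a, const double *b, const double *c,
                   double scalar, size_t n)
{
    double la[BLOCK], lb[BLOCK], lc[BLOCK];

    for (size_t start = 0; start < n; start += BLOCK) {
        size_t len = (n - start < BLOCK) ? (n - start) : BLOCK;

        memcpy(lb, b + start, len * sizeof(double)); /* "get" inputs  */
        memcpy(lc, c + start, len * sizeof(double));
        triad_block(la, lb, lc, scalar, len);        /* compute       */
        memcpy(a + start, la, len * sizeof(double)); /* "put" result  */
    }
}
```

This version is fully synchronous. Overlapping computation with communication would additionally require two sets of buffers and asynchronous transfers, so that the next block is fetched while the current one is being computed.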
Ferrer, R., Beltran, V., Gonzàlez, M., Martorell, X., & Ayguadé, E. (2010). Analysis of task offloading for accelerators. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5952 LNCS, pp. 322–336). https://doi.org/10.1007/978-3-642-11515-8_24