A framework for enabling OpenMP autotuning

Abstract

This paper describes a lightweight framework that enables autotuning of OpenMP pragmas to ease performance tuning of OpenMP codes across platforms. It presents a prototype of the framework and demonstrates its use in identifying the best-performing parallel loop schedules and thread counts for five codes from the PolyBench benchmark suite. This process is facilitated by a tool that takes a compact search-space description of the pragmas to apply to a loop nest and chooses the best solution using model-based search. The tool offers the potential to achieve performance portability of OpenMP across platforms without burdening the programmer with exploring this search space manually. Performance results show that the tool identifies different schedule and thread-count selections for parallel loops across benchmarks, data set sizes, and architectures. Performance gains over the baseline with default settings reach up to 1.17×, but slowdowns of 0.5× show the importance of preserving default settings. More importantly, this experiment sets the stage for more elaborate experiments that map new OpenMP features such as GPU offloading and the new loop pragma.
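For illustration, the kind of tuning decision described in the abstract can be sketched in plain OpenMP. The C fragment below is a hypothetical example (it is not the paper's actual tool syntax or search-space format): a PolyBench-style gemm loop nest in which the schedule kind, chunk size, and thread count are exposed as run-time parameters that an external autotuner could vary per platform and data set size.

```c
#include <omp.h>

/* Hypothetical sketch (not the paper's tool interface): a PolyBench-style
 * gemm loop nest whose tunable parameters -- schedule kind, chunk size, and
 * thread count -- are exposed so an external autotuner can vary them. */
void gemm_tunable(int n, double alpha, double beta,
                  double *C, const double *A, const double *B,
                  omp_sched_t sched_kind, int chunk, int num_threads)
{
    /* The autotuner selects these values; schedule(runtime) makes the
     * parallel loop honor whatever omp_set_schedule installed. */
    omp_set_schedule(sched_kind, chunk);
    omp_set_num_threads(num_threads);

    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            C[i * n + j] *= beta;
            for (int k = 0; k < n; k++)
                C[i * n + j] += alpha * A[i * n + k] * B[k * n + j];
        }
    }
}
```

A driver would then time this kernel over a small search space, for example {omp_sched_static, omp_sched_dynamic, omp_sched_guided} crossed with a few chunk sizes and thread counts, keep the fastest configuration found for each platform and data set size, and fall back to the implementation defaults when no sampled point beats them, which is what the 0.5× slowdowns reported above argue for.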

Citation (APA)

Sreenivasan, V., Javali, R., Hall, M., Balaprakash, P., Scogland, T. R. W., & de Supinski, B. R. (2019). A framework for enabling OpenMP autotuning. In Lecture Notes in Computer Science (Vol. 11718 LNCS, pp. 50–60). Springer Verlag. https://doi.org/10.1007/978-3-030-28596-8_4
