An autotuning framework for scalable execution of tiled code via iterative polyhedral compilation

Abstract

On modern many-core CPUs, performance tuning for complex memory subsystems and for parallel scalability is essential to reach the hardware's potential. In this article, we focus on loop tiling, which plays an important role in performance tuning, and develop a novel framework that analytically models load balance and empirically autotunes unpredictable cache behavior through iterative polyhedral compilation using LLVM/Polly. In an evaluation on many-core CPUs, we demonstrate that our autotuner outperforms approaches based on conventional static methods and well-known autotuning heuristics. Moreover, our autotuner achieves almost the same performance as a brute-force search-based approach.
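
As a rough illustration of the two ideas the abstract combines, loop tiling and empirical tile-size selection, the sketch below tiles a matrix multiplication and times a handful of candidate tile sizes. It is not the authors' framework, which works through iterative polyhedral compilation in LLVM/Polly and includes an analytical load-balance model. The function names `matmul_tiled` and `autotune_tile` are hypothetical, and the exhaustive sweep is closer in spirit to the brute-force baseline the abstract mentions than to the proposed autotuner.

```python
import time


def matmul_tiled(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles.

    The three outer loops walk over tiles; the three inner loops stay
    inside one tile so that its data is reused while it is still cached.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # min(...) handles partial tiles when tile does not divide n.
                for i in range(ii, min(ii + tile, n)):
                    row_a = a[i]
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


def autotune_tile(a, b, n, candidates):
    """Time each candidate tile size once; return (fastest_tile, timings).

    Illustrative assumption: a single timed run per candidate. A real
    tuner would repeat measurements and prune the search space.
    """
    timings = {}
    for tile in candidates:
        start = time.perf_counter()
        matmul_tiled(a, b, n, tile)
        timings[tile] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings
```

Calling `autotune_tile(a, b, n, [8, 16, 32, 64])` returns the fastest of the listed sizes on the current machine. In pure Python the interpreter overhead dominates the timings, so the sketch shows the structure of the search and does not give realistic cache measurements.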

Cite

Sato, Y., Yuki, T., & Endo, T. (2019). An autotuning framework for scalable execution of tiled code via iterative polyhedral compilation. ACM Transactions on Architecture and Code Optimization, 15(4). https://doi.org/10.1145/3293449
