Abstract
On modern many-core CPUs, performance tuning for complex memory subsystems and for parallel scalability is mandatory to achieve the hardware's potential. In this article, we focus on loop tiling, which plays an important role in performance tuning, and develop a novel framework that analytically models the load balance and empirically autotunes unpredictable cache behaviors through iterative polyhedral compilation using LLVM/Polly. From an evaluation on many-core CPUs, we demonstrate that our autotuner achieves performance superior to that of conventional static approaches and well-known autotuning heuristics. Moreover, our autotuner achieves almost the same performance as a brute-force search-based approach.
Cite
Sato, Y., Yuki, T., & Endo, T. (2019). An autotuning framework for scalable execution of tiled code via iterative polyhedral compilation. ACM Transactions on Architecture and Code Optimization, 15(4). https://doi.org/10.1145/3293449