KernelFaRer: Replacing Native-Code Idioms with High-Performance Library Calls

Abstract

Well-crafted libraries deliver much higher performance than code generated by sophisticated application programmers using advanced optimizing compilers. When a code pattern for which a well-tuned library implementation exists is found in the source code of an application, the highest performing solution is to replace the pattern with a call to the library. Past idiom-recognition solutions either required pattern-matching machinery outside of the compilation framework or were very brittle, failing even for minor variants of the pattern in the source code. This article introduces Kernel Find & Replacer (KernelFaRer), an idiom recognizer implemented entirely in the existing LLVM compiler framework. The versatility of KernelFaRer is demonstrated by matching and replacing two linear algebra idioms: general matrix-matrix multiplication (GEMM) and symmetric rank-2k update (SYR2K). Both GEMM and SYR2K are used extensively in scientific computation, and GEMM is also a central building block for deep learning and computer graphics algorithms. The idiom recognition in KernelFaRer is much more robust than alternative solutions, has a much lower compilation overhead, and is fully integrated into the broadly used LLVM compilation tools. KernelFaRer replaces existing GEMM and SYR2K idioms with computations performed by BLAS, Eigen, MKL (Intel's x86), ESSL (IBM's PowerPC), and BLIS (AMD). Performance gains of up to 2000× over hand-crafted source code compiled at the highest optimization level demonstrate that replacing application code with library calls is a performant solution.
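To make the notion of a GEMM idiom concrete, the following is a minimal sketch of the kind of hand-written loop nest that an idiom recognizer could, in principle, replace with a library call. It is an illustration only, not code from the article: the function name `naive_gemm`, the row-major layout, and the parameter order are assumptions made here, and the abstract does not state which source variants KernelFaRer actually matches. The computation follows the standard BLAS definition of GEMM, C = alpha * A * B + beta * C.

```c
#include <stddef.h>

/* Hypothetical hand-written GEMM (illustrative name and signature).
 * Computes C = alpha * A * B + beta * C for row-major matrices:
 * A is m x k, B is k x n, C is m x n. */
void naive_gemm(size_t m, size_t n, size_t k,
                double alpha, const double *A, const double *B,
                double beta, double *C)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Dot product of row i of A with column j of B. */
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}

/* Assuming a CBLAS implementation is linked, the loop nest above
 * computes the same result as the single call:
 *
 *   cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
 *               (int)m, (int)n, (int)k,
 *               alpha, A, (int)k, B, (int)n, beta, C, (int)n);
 */
```

The commented call shows the manual form of the substitution; the article's contribution is performing this kind of replacement automatically inside the compiler.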

Cite

De Carvalho, J. P. L., Kuzma, B., Korostelev, I., Amaral, J. N., Barton, C., Moreira, J., & Araujo, G. (2021). KernelFaRer: Replacing Native-Code Idioms with High-Performance Library Calls. ACM Transactions on Architecture and Code Optimization, 18(3). https://doi.org/10.1145/3459010
