Automatic generation of high-performance quantized machine learning kernels

37Citations
Citations of this article
56Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Quantization optimizes machine learning inference for resource constrained environments by reducing the precision of its computation. In the extreme, even single-bit computations can produce acceptable results at dramatically lower cost. But this ultra-low-precision quantization is difficult to exploit because extracting optimal performance requires hand-tuning both high-level scheduling decisions and lowlevel implementations. As a result, practitioners settle for a few predefined quantized kernels, sacrificing optimality and restricting their ability to adapt to new hardware. This paper presents a new automated approach to implementing quantized inference for machine learning models. We integrate the choice of how to lay out quantized values into the scheduling phase of a machine learning compiler, allowing it to be optimized in concert with tiling and parallelization decisions. After scheduling, we use program synthesis to automatically generate efficient low-level operator implementations for the desired precision and data layout. We scale up synthesis using a novel reduction sketch that exploits the structure of matrix multiplication. On a ResNet18 model, our generated code outperforms an optimized floating-point baseline by up to 3.9×, and a state-ofthe- art quantized implementation by up to 16.6×.

Cite

CITATION STYLE

APA

Cowan, M., Moreau, T., Chen, T., Bornholt, J., & Ceze, L. (2020). Automatic generation of high-performance quantized machine learning kernels. In CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization (pp. 305–316). Association for Computing Machinery, Inc. https://doi.org/10.1145/3368826.3377912

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free