Abstract
Quantization optimizes machine learning inference for resource-constrained environments by reducing the precision of its computations. In the extreme, even single-bit computations can produce acceptable results at dramatically lower cost. But this ultra-low-precision quantization is difficult to exploit because extracting optimal performance requires hand-tuning both high-level scheduling decisions and low-level implementations. As a result, practitioners settle for a few predefined quantized kernels, sacrificing optimality and restricting their ability to adapt to new hardware. This paper presents a new automated approach to implementing quantized inference for machine learning models. We integrate the choice of how to lay out quantized values into the scheduling phase of a machine learning compiler, allowing it to be optimized in concert with tiling and parallelization decisions. After scheduling, we use program synthesis to automatically generate efficient low-level operator implementations for the desired precision and data layout. We scale up synthesis using a novel reduction sketch that exploits the structure of matrix multiplication. On a ResNet18 model, our generated code outperforms an optimized floating-point baseline by up to 3.9×, and a state-of-the-art quantized implementation by up to 16.6×.
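To illustrate the single-bit computation the abstract refers to, here is a minimal sketch (not the paper's generated code, and independent of its compiler) of how a dot product over {-1, +1} values reduces to bitwise XNOR and popcount once the values are packed into integer bitmasks:

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an integer bitmask (+1 -> 1, -1 -> 0)."""
    mask = 0
    for i, v in enumerate(values):
        if v == 1:
            mask |= 1 << i
    return mask

def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1, +1} vectors given as bitmasks.

    Matching bits (XNOR) contribute +1 and mismatches contribute -1,
    so the result is 2 * popcount(XNOR) - n.
    """
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # keep only the low n bits
    matches = bin(xnor).count("1")              # popcount
    return 2 * matches - n

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack_bits(a), pack_bits(b), len(a)))  # prints 0, same as sum(x*y)
```

Real binarized kernels apply this trick across machine-word lanes inside a matrix-multiply loop nest, which is why layout and scheduling decisions interact so strongly with the bit-level implementation.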
Cowan, M., Moreau, T., Chen, T., Bornholt, J., & Ceze, L. (2020). Automatic generation of high-performance quantized machine learning kernels. In CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization (pp. 305–316). Association for Computing Machinery, Inc. https://doi.org/10.1145/3368826.3377912