LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs

Abstract

Low-precision computation, which is widely supported by contemporary hardware, is considered one of the most effective methods to accelerate convolutional neural networks. However, low-precision computation is not widely used to speed up Winograd, an algorithm for fast convolution, because combining the Winograd transformation with quantization introduces numerical error. In this paper, we propose a low-precision Winograd convolution approach, LoWino, based on post-training quantization, which employs a linear quantization method in the Winograd domain to reduce the precision loss caused by the transformations. Moreover, we present an efficient implementation that integrates well-designed optimization techniques, thereby adequately exploiting the low-precision computation capability of modern CPUs. We evaluate our approach on Intel Xeon Scalable Processors using representative convolutional layers from prevailing deep neural networks. Experimental results show that LoWino achieves a speedup of up to 2.04× over state-of-the-art implementations in the vendor library while maintaining accuracy at a reasonable level.
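To make the idea of quantizing in the Winograd domain concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the standard 1D F(2,3) Winograd transforms and a generic symmetric per-tensor linear quantizer with 8-bit range; the abstract does not specify the tile size, the scale selection, or the kernel layout that LoWino uses, so the function names and the max-abs scaling rule here are hypothetical choices made for illustration. The transforms run in floating point, the transformed input and filter are quantized, the element-wise products are computed on integers, and the result is dequantized before the output transform.

```python
def quantize(values, num_bits=8):
    """Symmetric linear quantization (assumed scheme): q = round(v / scale)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def input_transform(d):
    """B^T d for F(2,3): a 4-element input tile to the Winograd domain."""
    return [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]


def filter_transform(g):
    """G g for F(2,3): a 3-tap filter to the Winograd domain."""
    return [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]


def output_transform(m):
    """A^T m for F(2,3): 4 Winograd-domain products to 2 outputs."""
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def direct_conv(d, g):
    """Reference: two outputs of a 3-tap correlation over a 4-element tile."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_fp32(d, g):
    """Full-precision Winograd F(2,3)."""
    v = input_transform(d)
    u = filter_transform(g)
    return output_transform([a * b for a, b in zip(u, v)])


def winograd_int8(d, g):
    """Winograd F(2,3) with quantization applied after the transforms."""
    v_q, v_scale = quantize(input_transform(d))
    u_q, u_scale = quantize(filter_transform(g))
    m_int = [a * b for a, b in zip(u_q, v_q)]      # integer multiplies
    m = [x * u_scale * v_scale for x in m_int]     # dequantize
    return output_transform(m)


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, -1.0, 0.25]
    print("direct       :", direct_conv(d, g))
    print("winograd fp32:", winograd_fp32(d, g))
    print("winograd int8:", winograd_int8(d, g))
```

Comparing the three printed results shows the quantized variant deviating slightly from the direct and full-precision values. Quantizing after the transforms means the scale is chosen from the range of the transformed values, which is the general motivation the abstract gives for working in the Winograd domain.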

Li, G., Jia, Z., Feng, X., & Wang, Y. (2021). LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3472456.3472464
