Differentiable Joint Pruning and Quantization for Hardware Efficiency

Abstract

We present a differentiable joint pruning and quantization (DJPQ) scheme. We frame neural network compression as a joint gradient-based optimization problem that automatically trades off model pruning against quantization for hardware efficiency. DJPQ combines variational information bottleneck based structured pruning and mixed-precision quantization in a single differentiable loss function. In contrast to previous works, which consider pruning and quantization separately, our method lets users find the optimal trade-off between the two in a single training procedure. To make the method better suited to efficient hardware inference, we extend DJPQ to integrate structured pruning with power-of-two bit-restricted quantization. We show that DJPQ significantly reduces the number of Bit-Operations (BOPs) for several networks while maintaining the top-1 accuracy of the original floating-point models (e.g., a 53× BOPs reduction in ResNet18 on ImageNet and 43× in MobileNetV2). Compared to the conventional two-stage approach, which optimizes pruning and quantization independently, our scheme outperforms it in both accuracy and BOPs. Even with bit-restricted quantization, DJPQ achieves larger compression ratios and better accuracy than the two-stage approach.
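
To make the BOPs metric concrete, the sketch below counts BOPs for one convolution layer as MACs × weight bit-width × activation bit-width, where structured pruning reduces the number of surviving input and output channels. This is a commonly used BOPs definition and is assumed here rather than taken from the abstract. The helper names (ConvLayer, conv_macs, conv_bops, restrict_to_power_of_two) are hypothetical, and the rule of rounding a learned bit-width up to the next allowed power-of-two width is an illustrative assumption, not necessarily the rule DJPQ uses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ConvLayer:
    """Shape of a single convolution layer (hypothetical helper)."""
    in_channels: int
    out_channels: int
    kernel_size: int  # square kernel, k x k
    out_h: int        # output feature-map height
    out_w: int        # output feature-map width


def conv_macs(layer: ConvLayer, in_kept: Optional[int] = None,
              out_kept: Optional[int] = None) -> int:
    """Multiply-accumulates, counting only channels that survive structured pruning."""
    cin = layer.in_channels if in_kept is None else in_kept
    cout = layer.out_channels if out_kept is None else out_kept
    return cin * cout * layer.kernel_size ** 2 * layer.out_h * layer.out_w


def conv_bops(layer: ConvLayer, w_bits: int, a_bits: int,
              in_kept: Optional[int] = None, out_kept: Optional[int] = None) -> int:
    """Assumed BOPs definition: MACs x weight bit-width x activation bit-width."""
    return conv_macs(layer, in_kept, out_kept) * w_bits * a_bits


def restrict_to_power_of_two(bits: float, allowed: Sequence[int] = (2, 4, 8, 16)) -> int:
    """Round a learned bit-width up to the nearest allowed power-of-two width.

    Rounding up (and capping at the largest allowed width) is an assumption
    for illustration, not necessarily the rule DJPQ uses.
    """
    for b in sorted(allowed):
        if b >= bits:
            return b
    return max(allowed)


if __name__ == "__main__":
    layer = ConvLayer(in_channels=64, out_channels=64, kernel_size=3, out_h=56, out_w=56)
    fp32 = conv_bops(layer, w_bits=32, a_bits=32)
    # Keep 32 input and 48 output channels; learned widths 3.2 and 6.7 become 4 and 8 bits.
    compressed = conv_bops(layer, w_bits=restrict_to_power_of_two(3.2),
                           a_bits=restrict_to_power_of_two(6.7),
                           in_kept=32, out_kept=48)
    print(f"FP32 BOPs:       {fp32:,}")
    print(f"Compressed BOPs: {compressed:,}")
    print(f"Reduction:       {fp32 / compressed:.2f}x")
```

For this made-up layer the reduction is about 85×; it is a single-layer toy and not comparable to the whole-network figures reported above. A network-level count would sum conv_bops over all layers, keeping in mind that pruning a layer's output channels also removes the matching input channels of the next layer, which is why pruning and quantization interact in the total BOPs.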

Citation (APA)

Wang, Y., Lu, Y., & Blankevoort, T. (2020). Differentiable Joint Pruning and Quantization for Hardware Efficiency. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12374 LNCS, pp. 259–277). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58526-6_16