Quantization-Aware Training with Dynamic and Static Pruning


Abstract

As deep neural networks (DNNs) evolve, their models naturally grow larger. This makes model compression techniques such as pruning and quantization necessary to reduce memory usage and power consumption, and combining these techniques can yield especially large savings. However, we found that methods combining dynamic pruning with quantization suffer from unstable training and poor generalization. We attribute this to the effects of the two Straight-Through Estimators (STEs) they rely on, one for the pruning mask and one for the quantizer. To address this problem, we propose Quantization-Aware training with Dynamic and Static pruning (QADS). QADS retains the benefits of both pruning and quantization while applying the STE only for quantization after a certain point in training. In our experiments, QADS trains more stably than existing techniques and improves performance on the CIFAR-10/100, ImageNet, and Google Speech Commands datasets. The code is available at https://github.com/Ahnho/Quantization-aware-training-with-Dynamic-and-Static-Pruning.
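To make the two-STE issue concrete, here is a minimal pure-Python sketch, not the authors' implementation (see the linked repository for that). Everything in it is an illustrative assumption: the uniform symmetric quantizer `quantize`, the magnitude-based `magnitude_mask`, the toy quadratic loss, and the hypothetical `switch_step` parameter. During the dynamic phase, the gradient passes unchanged through both the pruning mask and the quantizer (two STEs), so pruned weights keep receiving updates and can be revived. From `switch_step` on, the mask is frozen and the STE is used only for the quantizer.

```python
def quantize(w, bits=4, w_max=1.0):
    """Assumed uniform symmetric quantizer: clip to [-w_max, w_max], round to a grid."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    scale = w_max / levels
    clipped = max(-w_max, min(w_max, w))
    return round(clipped / scale) * scale


def magnitude_mask(weights, sparsity):
    """Assumed magnitude pruning: keep the largest-|w| fraction (1 - sparsity)."""
    k = int(round(len(weights) * (1 - sparsity)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(weights))]


def train(weights, target, steps, switch_step, sparsity=0.5, lr=0.1, bits=4):
    """Toy loss 0.5 * sum((w_eff - target)^2), where w_eff = quantize(mask * w)."""
    w = list(weights)
    mask = magnitude_mask(w, sparsity)
    for step in range(steps):
        if step <= switch_step:
            # Dynamic phase: re-select the mask every step; it is frozen after switch_step.
            mask = magnitude_mask(w, sparsity)
        static = step >= switch_step
        w_eff = [quantize(m * wi, bits) for m, wi in zip(mask, w)]
        grad_eff = [we - t for we, t in zip(w_eff, target)]   # dL/dw_eff
        if not static:
            # Two STEs: treat d(w_eff)/dw as 1 through both the mask and the quantizer,
            # so pruned weights still get a (dense) gradient.
            grad = grad_eff
        else:
            # Static phase: the mask is a constant, so its true gradient is used
            # (pruned weights get zero); the STE remains only for the quantizer.
            grad = [g * m for g, m in zip(grad_eff, mask)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w, mask
```

For example, `train([0.9, -0.05, 0.4, 0.01], [0.5] * 4, steps=20, switch_step=0)` freezes the mask `[1.0, 0.0, 1.0, 0.0]` immediately, and the pruned weights keep their initial values. With `switch_step=20`, the pruned weights keep drifting toward the target through the mask STE. That dense, mask-bypassing gradient is the behaviour the abstract identifies as a source of instability when it is combined with the quantizer's STE.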

Citation (APA)

An, S., Shin, J., & Kim, J. (2025). Quantization-Aware Training with Dynamic and Static Pruning. IEEE Access, 13, 57476–57484. https://doi.org/10.1109/ACCESS.2025.3556629
