In this paper, we propose a feature affinity (FA) assisted knowledge distillation (KD) method to improve quantization-aware training of deep neural networks (DNNs). The FA loss on intermediate feature maps of DNNs teaches a student the intermediate steps of a solution, rather than only the final answers as in conventional KD, where the loss acts on the network logits at the output level. Combining the logit loss and the FA loss, we found through convolutional network experiments on the CIFAR-10/100 and Tiny ImageNet datasets that the quantized student network receives stronger supervision than it does from labeled ground-truth data. The resulting FA quantization-distillation (FAQD) method, trained to convergence with a cosine annealing scheduler for 200 epochs, can compress models on label-free data to accuracy levels matching or exceeding those of their full-precision counterparts. This brings immediate practical benefits, as pre-trained teacher models are readily available and unlabeled data are abundant, whereas data labeling is often laborious and expensive. Finally, we propose a fast feature affinity (FFA) loss function that accurately approximates the FA loss at a lower order of computational complexity, and we prove error estimates for it; this helps speed up training on high-resolution image inputs. Source code is available at: https://github.com/lzj994/FAQD
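The abstract does not state the FA or FFA formulas, so the following is only a minimal sketch under explicit assumptions. The FA loss is assumed here to be the mean squared difference between the N x N cosine-similarity affinity matrices of student and teacher feature maps (N spatial positions, C channels each, so channel counts may differ). The fast approximation is swapped in as a Hutchinson-style random-probe estimator, which may differ from the paper's FFA. It uses the identity E||(A_s - A_t) z||^2 = ||A_s - A_t||_F^2 for Gaussian z and computes (F F^T) z as F (F^T z), so each probe costs O(N C) instead of the O(N^2 C) of the exact loss. All function names, `beta`, and `temperature` are hypothetical. The combined label-free objective (logit KD plus a weighted FA term) is likewise an illustrative assumption.

```python
import math
import random


def normalize_rows(feats):
    """L2-normalize each row; a row is the C-dim feature vector at one spatial position."""
    out = []
    for row in feats:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # guard all-zero rows
        out.append([v / norm for v in row])
    return out


def affinity(feats):
    """N x N cosine-similarity matrix A = F F^T between spatial positions: O(N^2 C)."""
    f = normalize_rows(feats)
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in f] for fi in f]


def fa_loss(student_feats, teacher_feats):
    """Assumed FA loss: mean squared entry-wise difference of the two affinity matrices.

    Channel counts may differ between student and teacher; both matrices are N x N.
    """
    a_s, a_t = affinity(student_feats), affinity(teacher_feats)
    n = len(a_s)
    return sum((a_s[i][j] - a_t[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


def _affinity_times(f, z):
    """Return (F F^T) z computed as F (F^T z), never forming the N x N matrix: O(N C)."""
    c = len(f[0])
    ftz = [sum(row[k] * zi for row, zi in zip(f, z)) for k in range(c)]
    return [sum(row[k] * ftz[k] for k in range(c)) for row in f]


def ffa_loss(student_feats, teacher_feats, num_probes=8, rng=None):
    """Hutchinson-style estimate of fa_loss (a stand-in, not necessarily the paper's FFA).

    For z ~ N(0, I), E||(A_s - A_t) z||^2 = ||A_s - A_t||_F^2, so averaging over
    random probes gives an unbiased estimate at O(num_probes * N * C) cost.
    """
    rng = rng or random.Random(0)
    f_s, f_t = normalize_rows(student_feats), normalize_rows(teacher_feats)
    n = len(f_s)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        diff = [p - q for p, q in zip(_affinity_times(f_s, z), _affinity_times(f_t, z))]
        total += sum(d * d for d in diff)
    return total / (num_probes * n * n)


def softmax(logits, temperature=1.0):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kd_logit_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(p_teacher || p_student) on temperature-softened outputs; uses no labels."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)


def label_free_loss(s_logits, t_logits, s_feats, t_feats, beta=1.0, temperature=4.0):
    """Hypothetical combined objective: logit KD plus a beta-weighted FA term."""
    return kd_logit_loss(s_logits, t_logits, temperature) + beta * fa_loss(s_feats, t_feats)
```

Pure Python keeps the sketch self-contained; a real training loop would express the same losses with a tensor library so gradients reach the quantized student. Because the exact loss grows quadratically in the number of spatial positions N, an approximation that is linear in N matters most for high-resolution inputs.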
Li, Z., Yang, B., Yin, P., Qi, Y., & Xin, J. (2023). Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data. IEEE Access, 11, 78042–78051. https://doi.org/10.1109/ACCESS.2023.3297890