Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network

Hsing Hung Chou; Ching Te Chiu; Yi Ping Liao

Journal Article, Open Access

APSIPA Transactions on Signal and Information Processing (2021) 10 303-338

DOI: 10.1017/ATSIP.2021.16

Abstract

Deep neural networks (DNNs) have solved many tasks, including image classification, object detection, and semantic segmentation. However, when a DNN model has a huge number of parameters and a high computational cost, it becomes difficult to deploy on mobile devices. To address this difficulty, we propose an efficient compression method that can be split into three parts. First, we propose a cross-layer matrix to extract more features from the teacher model. Second, we adopt Kullback-Leibler (KL) divergence in an offline environment so that the student model finds a wider, more robust minimum. Finally, we propose an offline ensemble of pre-trained teachers to teach a student model. To address the dimension mismatch between teacher and student models, we adopt a convolution and two-stage knowledge distillation to relax this constraint. We conducted experiments with VGG and ResNet models on the CIFAR-100 dataset. With VGG-11 as the teacher model and VGG-6 as the student model, the Top-1 accuracy increased by 3.57% with a compression rate and 3.5x computation rate. With ResNet-32 as the teacher model and ResNet-8 as the student model, the Top-1 accuracy increased by 4.38% with a compression rate and computation rate. In addition, we conducted experiments on the ImageNet dataset. With MobileNet-16 as the teacher model and MobileNet-9 as the student model, the Top-1 accuracy increased by 3.98% with a compression rate and computation rate.
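The KL-divergence and offline-ensemble steps can be illustrated with a minimal sketch. The abstract does not give the exact loss, so this follows the common temperature-scaled distillation formulation rather than the authors' implementation. The function names, the uniform averaging of teacher outputs, the temperature value, and the T^2 scaling are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def ensemble_soft_targets(teacher_logits_list, temperature):
    """Average the softened outputs of several pre-trained teachers.

    Assumption: uniform weights; the paper may combine teachers differently.
    """
    probs = [softmax(logits, temperature) for logits in teacher_logits_list]
    n = len(probs)
    return [sum(column) / n for column in zip(*probs)]

def distillation_loss(student_logits, teacher_logits_list, temperature=4.0):
    """KL divergence from the ensemble soft target to the student output.

    Assumption: scaled by temperature**2, as is common in distillation.
    """
    target = ensemble_soft_targets(teacher_logits_list, temperature)
    student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(target, student)

# Hypothetical logits for a 3-class problem and two offline teachers.
teachers = [[2.0, 1.0, 0.1], [1.8, 1.2, 0.0]]
loss_far = distillation_loss([0.0, 0.0, 3.0], teachers)
loss_near = distillation_loss([1.9, 1.1, 0.05], teachers)
```

In this sketch a student whose logits resemble the teachers' receives a smaller loss than one that disagrees with them, which is the signal used to train the compact model. Because the teachers are pre-trained and fixed, their soft targets could be computed once and cached.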

Cite

Chou, H. H., Chiu, C. T., & Liao, Y. P. (2021). Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network. APSIPA Transactions on Signal and Information Processing, 10, 303–338. https://doi.org/10.1017/ATSIP.2021.16
