Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network


Abstract

Deep neural networks (DNNs) have solved many tasks, including image classification, object detection, and semantic segmentation. However, a DNN model with a large number of parameters and a high computational cost is difficult to deploy on mobile devices. To address this difficulty, we propose an efficient compression method that consists of three parts. First, we propose a cross-layer matrix to extract more features from the teacher model. Second, we adopt Kullback-Leibler (KL) divergence in an offline environment to help the student model find a wider, more robust minimum. Finally, we propose using an offline ensemble of pre-trained teachers to teach a student model. To address the dimension mismatch between teacher and student models, we adopt a convolution and two-stage knowledge distillation to relax this constraint. We conducted experiments with VGG and ResNet models on the CIFAR-100 dataset. With VGG-11 as the teacher model and VGG-6 as the student model, Top-1 accuracy increased by 3.57% with a compression rate and 3.5x computation rate. With ResNet-32 as the teacher model and ResNet-8 as the student model, Top-1 accuracy increased by 4.38% with a compression rate and computation rate. In addition, we conducted experiments on the ImageNet dataset. With MobileNet-16 as the teacher model and MobileNet-9 as the student model, Top-1 accuracy increased by 3.98% with a compression rate and computation rate.
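The KL-divergence and ensemble-teacher ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how teacher outputs are combined, the direction of the divergence, or whether a softmax temperature is used, so the averaging of teacher predictions, the KL(teacher || student) direction, the `temperature` parameter, and every function name below are assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits_list, temperature=1.0):
    """Average the softened predictions of several pre-trained teachers.

    Simple averaging is an assumption; the abstract does not say how the
    ensemble is combined.
    """
    probs = [softmax(logits, temperature) for logits in teacher_logits_list]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits_list, temperature=4.0):
    """KL divergence from the ensemble soft target to the student prediction."""
    target = ensemble_soft_targets(teacher_logits_list, temperature)
    student = softmax(student_logits, temperature)
    return kl_divergence(target, student)
```

In this sketch the loss is zero when the student reproduces the averaged teacher distribution and grows as the two diverge. Because the teachers are pre-trained and fixed (the "offline" setting), their logits could be computed once and cached before the student is trained.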

Cite

Chou, H. H., Chiu, C. T., & Liao, Y. P. (2021). Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network. APSIPA Transactions on Signal and Information Processing, 10, 303–338. https://doi.org/10.1017/ATSIP.2021.16
