A Two-Teacher Framework for Knowledge Distillation

Abstract

Knowledge distillation aims to transfer knowledge from a teacher network to a student network. Typically, the teacher network has high capacity, while the student network is compact and can be deployed on embedded systems. However, existing distillation methods use only one teacher to guide the student network, and there is no guarantee that the knowledge is sufficiently transferred to the student. We therefore propose a novel framework to improve the performance of the student network. The framework consists of two teacher networks trained with different strategies: one is trained strictly to guide the student network to learn sophisticated features, while the other is trained loosely to guide the student network to learn general decisions based on the learned features. We perform extensive experiments on two standard image classification datasets, CIFAR-10 and CIFAR-100, and the results demonstrate that the proposed framework significantly improves the classification accuracy of the student network.
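The abstract does not reproduce the paper's exact loss, so the sketch below is only a rough illustration of how a two-teacher objective of this general shape could be assembled in PyTorch: the standard cross-entropy term plus temperature-scaled soft-target terms (in the style of Hinton et al.) from each teacher. The function name, the temperature T, and the weights alpha and beta are illustrative assumptions, not the authors' formulation.

import torch
import torch.nn.functional as F

def two_teacher_kd_loss(student_logits, strict_teacher_logits,
                        loose_teacher_logits, labels,
                        T=4.0, alpha=0.5, beta=0.3):
    # Hypothetical combined loss; T, alpha, beta are assumed values,
    # not taken from the paper.
    # Hard-label cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-target KL term against the strictly trained teacher
    # (temperature-scaled, with the usual T^2 gradient correction).
    kd_strict = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(strict_teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Soft-target KL term against the loosely trained teacher.
    kd_loose = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(loose_teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + alpha * kd_strict + beta * kd_loose

# Toy usage with random logits for a batch of 8 over 10 classes.
s = torch.randn(8, 10)
t_strict = torch.randn(8, 10)
t_loose = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
loss = two_teacher_kd_loss(s, t_strict, t_loose, y)

In practice both teachers would be run in eval mode with gradients disabled, and only the student's parameters would be updated from this loss.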

Cite (APA)

Chen, X., Su, J., & Zhang, J. (2019). A Two-Teacher Framework for Knowledge Distillation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11554 LNCS, pp. 58–66). Springer Verlag. https://doi.org/10.1007/978-3-030-22796-8_7
