Knowledge distillation aims to transfer knowledge from a teacher network to a student network. Commonly, the teacher network has high capacity, while the student network is compact and can be deployed to embedded systems. However, existing distillation methods use only one teacher to guide the student, and there is no guarantee that the knowledge is transferred sufficiently. We therefore propose a novel framework to improve the performance of the student network. The framework consists of two teacher networks trained with different strategies: one is trained strictly, to guide the student network to learn sophisticated features, while the other is trained loosely, to guide the student network to learn general decision boundaries based on the learned features. We perform extensive experiments on two standard image classification datasets, CIFAR-10 and CIFAR-100, and the results demonstrate that the proposed framework significantly improves the classification accuracy of the student network.
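The abstract does not give the exact training objective, but a two-teacher distillation loss can be sketched as a weighted combination of the student's cross-entropy on the ground-truth label and softened KL-divergence terms against each teacher. The weights `alpha` and `beta`, the temperature `T`, and the function names below are assumptions for illustration, not the authors' published formulation:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax (numerically stabilized)."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def two_teacher_kd_loss(student_logits, strict_logits, loose_logits,
                        label, T=4.0, alpha=0.5, beta=0.3):
    """Hypothetical two-teacher loss: cross-entropy on the true label,
    plus softened KL terms toward the strict and loose teachers.
    The alpha/beta weighting scheme is an illustrative assumption."""
    s_soft = softmax(student_logits, T)
    t_strict = softmax(strict_logits, T)
    t_loose = softmax(loose_logits, T)
    # Cross-entropy against the hard label (temperature 1)
    ce = -np.log(softmax(student_logits)[label] + 1e-12)
    # T^2 scaling keeps soft-target gradients comparable across temperatures
    return ((1.0 - alpha - beta) * ce
            + alpha * (T ** 2) * kl_div(t_strict, s_soft)
            + beta * (T ** 2) * kl_div(t_loose, s_soft))
```

In this sketch, the strict teacher's softened outputs push the student toward sharper, feature-level agreement, while the loose teacher's outputs supply smoother decision-boundary guidance; both signals are mixed into a single scalar loss per example.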
Citation
Chen, X., Su, J., & Zhang, J. (2019). A Two-Teacher Framework for Knowledge Distillation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11554 LNCS, pp. 58–66). Springer Verlag. https://doi.org/10.1007/978-3-030-22796-8_7