Knowledge distillation aims to transfer knowledge from a teacher network to a student network. Commonly, the teacher network has high capacity, while the student network is compact and can be deployed to embedded systems. However, existing distillation methods use only one teacher to guide the student, and there is no guarantee that the knowledge is transferred sufficiently. We therefore propose a novel framework to improve the performance of the student network. The framework consists of two teacher networks trained with different strategies: one is trained strictly, to guide the student network to learn sophisticated features, while the other is trained loosely, to guide the student network to learn general decision boundaries based on the learned features. We perform extensive experiments on two standard image classification datasets, CIFAR-10 and CIFAR-100, and the results demonstrate that the proposed framework significantly improves the classification accuracy of the student network.
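The abstract does not give the exact training objective, but a two-teacher distillation loss can be sketched as a weighted combination of the student's cross-entropy on the ground-truth label and softened KL-divergence terms against each teacher. The weights `alpha` and `beta`, the temperature `T`, and the function names below are assumptions for illustration, not the authors' published formulation:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax (numerically stabilized)."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def two_teacher_kd_loss(student_logits, strict_logits, loose_logits,
                        label, T=4.0, alpha=0.5, beta=0.3):
    """Hypothetical two-teacher loss: cross-entropy on the true label,
    plus softened KL terms toward the strict and loose teachers.
    The alpha/beta weighting scheme is an illustrative assumption."""
    s_soft = softmax(student_logits, T)
    t_strict = softmax(strict_logits, T)
    t_loose = softmax(loose_logits, T)
    # Cross-entropy against the hard label (temperature 1)
    ce = -np.log(softmax(student_logits)[label] + 1e-12)
    # T^2 scaling keeps soft-target gradients comparable across temperatures
    return ((1.0 - alpha - beta) * ce
            + alpha * (T ** 2) * kl_div(t_strict, s_soft)
            + beta * (T ** 2) * kl_div(t_loose, s_soft))
```

In this sketch, the strict teacher's softened outputs push the student toward sharper, feature-level agreement, while the loose teacher's outputs supply smoother decision-boundary guidance; both signals are mixed into a single scalar loss per example.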
Citation
Chen, X., Su, J., & Zhang, J. (2019). A Two-Teacher Framework for Knowledge Distillation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11554 LNCS, pp. 58–66). Springer Verlag. https://doi.org/10.1007/978-3-030-22796-8_7