Sliding Cross Entropy for Self-Knowledge Distillation

Abstract

Knowledge distillation (KD) is a powerful technique for improving the performance of a small model by leveraging the knowledge of a larger model. Despite its remarkable performance gains, KD has the drawback of the substantial computational cost of pre-training a larger teacher model in advance. Recently, self-knowledge distillation methods have emerged that improve a model's performance without a separately pre-trained teacher. In this paper, we present a novel plug-in approach, Sliding Cross Entropy (SCE), which can be combined with existing self-knowledge distillation methods to significantly improve performance. Specifically, to minimize the difference between the model's output and the soft target obtained by self-distillation, we split each softmax representation into slices of a fixed window size and reduce the distance between corresponding slices. Through this approach, the model evenly considers all inter-class relationships of the soft target during optimization. Extensive experiments show that our approach is effective across various tasks, including classification, object detection, and semantic segmentation, and that SCE consistently outperforms existing baseline methods.
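The sliding-window loss described in the abstract can be sketched as follows. This is a minimal, hypothetical PyTorch-style illustration rather than the authors' implementation: the window size, the use of non-overlapping slices, the per-slice renormalization, and the averaging over windows are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def sliding_cross_entropy(student_logits, soft_target_logits, window=10, temperature=4.0):
    """Illustrative sketch: compare the student distribution and the soft target
    window by window instead of over the full class dimension at once."""
    p_s = F.softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(soft_target_logits / temperature, dim=1)

    num_classes = p_s.size(1)
    loss = 0.0
    num_windows = 0
    # Non-overlapping windows over the class dimension (an assumption for this sketch).
    for start in range(0, num_classes, window):
        s_slice = p_s[:, start:start + window]
        t_slice = p_t[:, start:start + window]
        # Renormalize each slice so it forms a valid distribution over its window.
        s_slice = s_slice / s_slice.sum(dim=1, keepdim=True).clamp_min(1e-8)
        t_slice = t_slice / t_slice.sum(dim=1, keepdim=True).clamp_min(1e-8)
        # Cross entropy between the sliced soft target and the corresponding student slice.
        loss = loss - (t_slice * s_slice.clamp_min(1e-8).log()).sum(dim=1).mean()
        num_windows += 1
    return loss / num_windows
```

In a self-knowledge-distillation setting, `soft_target_logits` would be supplied by whichever base self-KD method SCE is plugged into (for example, the model's own earlier predictions or an auxiliary branch); that choice is outside the scope of this sketch.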

Cite

APA: Lee, H., Kim, J., & Woo, S. S. (2022). Sliding Cross Entropy for Self-Knowledge Distillation. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM '22) (pp. 1044–1053). Association for Computing Machinery. https://doi.org/10.1145/3511808.3557453
