Deep neural networks have achieved impressive success on various multimedia applications in the past decades. To bring the performance of large, already-trained models to real-world resource-constrained devices, knowledge distillation, which aims at transferring representational knowledge from a large teacher network into a small student network, has attracted increasing attention. Recently, contrastive distillation methods have achieved superior performance in this area, owing to the strong representational power of contrastive/self-supervised learning. However, these methods typically transfer knowledge through individual samples or inter-class relationships, while ignoring the correlations among intra-class samples, which convey abundant information. In this paper, we propose a Positive pair Aware Contrastive Knowledge Distillation (PACKD) framework that extends contrastive distillation with more positive pairs to capture richer knowledge from the teacher. Specifically, it pulls together features of pairs from the same class learned by the student and teacher while simultaneously pushing apart features of pairs from different classes. With a positive-pair similarity weighting strategy based on optimal transport, the proposed contrastive objective improves feature discriminability between positive samples with large visual discrepancies. Experiments on different benchmarks demonstrate the effectiveness of the proposed PACKD.
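To make the described objective concrete, below is a minimal PyTorch-style sketch of a supervised-contrastive distillation loss with multiple positives per anchor, in the spirit of the abstract. This is an illustrative assumption, not the authors' released PACKD implementation: the function name, the temperature value, and the optional `pos_weights` argument (standing in for the optimal-transport-based positive-pair weighting) are hypothetical.

```python
import torch
import torch.nn.functional as F


def positive_pair_contrastive_loss(stu_feat, tea_feat, labels,
                                   temperature=0.1, pos_weights=None):
    """Illustrative supervised-contrastive distillation loss with multiple positives.

    stu_feat:    (N, D) student embeddings
    tea_feat:    (N, D) teacher embeddings (treated as fixed targets)
    labels:      (N,)   class labels; same-class teacher features act as positives
    pos_weights: optional (N, N) weights over positive pairs (e.g. derived from an
                 optimal-transport similarity measure); hypothetical placeholder here
    """
    stu = F.normalize(stu_feat, dim=1)
    tea = F.normalize(tea_feat.detach(), dim=1)        # no gradient into the teacher

    # Student-teacher cosine similarities scaled by temperature.
    logits = stu @ tea.t() / temperature               # (N, N)
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Same-class (student_i, teacher_j) pairs are positives, including i == j.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    if pos_weights is not None:
        pos_mask = pos_mask * pos_weights               # emphasize harder positives

    # Negative mean log-likelihood over each anchor's positives.
    per_sample = -(pos_mask * log_prob).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1e-8)
    return per_sample.mean()
```

In this sketch, pulling same-class student and teacher features together corresponds to maximizing the log-probability of every positive pair, while all remaining (different-class) pairs appear only in the softmax denominator and are therefore pushed apart.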
Yu, Z., Xu, Q., Jiang, Y., Qin, H., & Huang, Q. (2022). Pay Attention to Your Positive Pairs: Positive Pair Aware Contrastive Knowledge Distillation. In MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia (pp. 5862–5870). Association for Computing Machinery, Inc. https://doi.org/10.1145/3503161.3548256