Data-distortion guided self-distillation for deep neural networks

Ting Bing Xu; Cheng Lin Liu

Conference Proceedings, Open Access

33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 (2019) 5565-5572

DOI: 10.1609/aaai.v33i01.33015565

142 Citations

98 Readers

Abstract

Knowledge distillation is an effective and widely used technique for transferring knowledge from one network to another. Although it improves network performance, its dependence on accompanying assistive models complicates the training of a single network and incurs large memory and time costs. In this paper, we design a more elegant self-distillation mechanism that transfers knowledge between different distorted versions of the same training data, without relying on accompanying models. Specifically, the potential capacity of a single network is exploited by learning consistent global feature distributions and posterior distributions (class probabilities) across these distorted versions of the data. Extensive experiments on multiple datasets (i.e., CIFAR-10/100 and ImageNet) demonstrate that the proposed method effectively improves the generalization performance of various network architectures (such as AlexNet, ResNet, Wide ResNet, and DenseNet) and outperforms existing distillation methods with little extra training effort.
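To make the mechanism in the abstract concrete, here is a minimal sketch of how a combined training loss over two distorted versions of the same batch might be assembled. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the consistency measures, so this sketch assumes a symmetric KL divergence for the posterior distributions and a squared distance between batch-mean feature vectors for the global features. The function name `self_distillation_loss` and the weights `w_post` and `w_feat` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_vector(rows):
    """Column-wise mean of a batch of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def self_distillation_loss(logits_a, logits_b, feats_a, feats_b, labels,
                           w_post=1.0, w_feat=1.0):
    """Loss for one network evaluated on two distorted views (a, b) of a batch.

    Assumed terms (not specified in the abstract):
      - cross-entropy on both views,
      - symmetric KL between the two posterior distributions,
      - squared distance between the batch-mean global features.
    """
    probs_a = [softmax(z) for z in logits_a]
    probs_b = [softmax(z) for z in logits_b]
    n = len(labels)

    # Supervised term: both views must predict the true label.
    ce = sum(-math.log(pa[y]) - math.log(pb[y])
             for pa, pb, y in zip(probs_a, probs_b, labels)) / n

    # Posterior consistency: each view acts as a soft target for the other.
    post = sum(kl_divergence(pa, pb) + kl_divergence(pb, pa)
               for pa, pb in zip(probs_a, probs_b)) / n

    # Feature consistency: match the batch-level feature statistics.
    mu_a, mu_b = mean_vector(feats_a), mean_vector(feats_b)
    feat = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))

    return ce + w_post * post + w_feat * feat
```

In a real training loop, `logits_a`/`feats_a` and `logits_b`/`feats_b` would come from passing two independently distorted copies of the same images (for example, different random crops or flips) through the same network, so no separate teacher model is needed.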

Citation (APA)

Xu, T. B., & Liu, C. L. (2019). Data-distortion guided self-distillation for deep neural networks. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 (pp. 5565–5572). AAAI Press. https://doi.org/10.1609/aaai.v33i01.33015565
