Deep Generative Knowledge Distillation by Likelihood Finetuning

This article is free to access.

Abstract

Knowledge Distillation (KD) trains a smaller student model under the guidance of a larger pretrained teacher model. In decentralized data systems such as blockchain, however, privacy concerns can make the original training data inaccessible. Data-Free KD (DFKD) methods address this by extracting prior knowledge from the teacher network and using it to synthesize data for KD. Previous DFKD methods have struggled with the large search space of data generation. Deep generative models (DGMs), which learn a data distribution with deep networks, offer an efficient way to reduce this search space by generating a set of pseudo data. In this paper, we examine the performance of KD trained on pseudo samples generated by pretrained DGMs and find that KD performance does not always correlate positively with the image quality of those samples. Based on this observation, we propose a new DFKD framework, Generative Knowledge Distillation (GenKD), which reduces the search space by constructing a prior distribution modeled by DGMs, chosen for their likelihood estimation ability. Specifically, we use an energy-based model (EBM) and generate data guided by both the Maximum Likelihood Estimation (MLE) objective of the EBM and gradients from the downstream KD task, the latter obtained by policy gradient. We then train the student model using the pretrained teacher model and the pseudo samples. We evaluate GenKD on several widely used benchmarks, including CIFAR100, CIFAR10, and SVHN. Our experiments show that GenKD generates high-quality pseudo samples, both quantitatively and qualitatively. In addition, the top-1 accuracy of the student network approaches that of state-of-the-art (SOTA) DFKD methods while using fewer pseudo samples and less generation time.
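To make the distillation step concrete, the following is a minimal sketch of a temperature-softened KD loss evaluated on one pseudo sample. It assumes the classic formulation (KL divergence between softened teacher and student outputs, scaled by the squared temperature); the abstract does not state which loss GenKD uses, and the names `log_softmax`, `kd_loss`, and `temperature` are illustrative, not taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    Assumption: this is the standard soft-target KD loss, used here only
    for illustration; it is not confirmed to be the loss used by GenKD.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


# Hypothetical logits for a single pseudo sample over three classes.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = kd_loss(teacher, student)
```

In a data-free setting, a loss of this kind would be averaged over a batch of generated pseudo samples rather than real training images, and the student would be updated to reduce it.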

Cite

Li, J., Chen, X., Zheng, P., Wang, Q., & Yu, Z. (2023). Deep Generative Knowledge Distillation by Likelihood Finetuning. IEEE Access, 11, 46441–46453. https://doi.org/10.1109/ACCESS.2023.3273952