Deep Generative Knowledge Distillation by Likelihood Finetuning

Jingru Li; Xiaofeng Chen; Peiyu Zheng; Qiang Wang; Zhi Yu

Journal Article, Open Access

IEEE Access (2023), vol. 11, pp. 46441–46453

DOI: 10.1109/ACCESS.2023.3273952

Abstract

Knowledge Distillation (KD) trains a smaller student model under the guidance of a larger pretrained teacher model. However, in decentralized data systems such as blockchain, privacy concerns may make the original training data inaccessible. To address this issue, Data-Free KD (DFKD) methods have been proposed, which extract prior knowledge from the teacher network and use it to synthesize data for KD. Previous DFKD methods are hampered by the large search space of data generation. Deep generative models (DGMs), which learn the data distribution with deep networks, offer an efficient way to reduce this search space by generating a set of pseudo data. In this paper, we explore the performance of KD trained on pseudo samples generated by pretrained DGMs and find that student performance does not always correlate positively with image quality. Based on this observation, we propose a new DFKD framework called Generative Knowledge Distillation (GenKD), which reduces the search space by constructing a prior distribution modeled by DGMs, chosen for their ability to estimate likelihoods. Specifically, we generate data with an energy-based model (EBM), guided by both the Maximum Likelihood Estimation (MLE) objective of the EBM and gradients from the downstream KD task obtained by policy gradient. We then train the student model using the pretrained teacher model and the pseudo samples. We evaluate GenKD on several widely used benchmarks, including CIFAR100, CIFAR10, and SVHN. Our experiments demonstrate, both quantitatively and qualitatively, that GenKD generates high-quality pseudo samples. Additionally, the top-1 accuracy of the student network approaches that of state-of-the-art (SOTA) DFKD methods while using fewer pseudo samples and less generation time.
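
The abstract does not state which distillation loss GenKD uses. As an illustration of the generic KD step it describes (a student trained to match a pretrained teacher on pseudo samples), the following is a minimal sketch of the widely used temperature-softened KL-divergence loss. The function names `softmax` and `kd_loss`, the temperature value, and the example logits are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    This is the common soft-label distillation loss, assumed here for
    illustration; the paper's exact objective may differ.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Hypothetical logits that a teacher and a student might produce
# for a single pseudo sample in a 3-class problem.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = kd_loss(teacher, student)
```

In a data-free setting, the inputs behind `teacher` and `student` would be pseudo samples drawn from the generative model rather than real training images; the loss is zero only when the two softened distributions coincide.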

Li, J., Chen, X., Zheng, P., Wang, Q., & Yu, Z. (2023). Deep Generative Knowledge Distillation by Likelihood Finetuning. IEEE Access, 11, 46441–46453. https://doi.org/10.1109/ACCESS.2023.3273952
