Prune Your Model Before Distill It

Abstract

Knowledge distillation transfers knowledge from a cumbersome teacher to a small student. Recent results suggest that a student-friendly teacher is better suited to distillation because it provides more transferable knowledge. In this work, we propose a novel framework, “prune, then distill,” that first prunes the model to make it more transferable and then distills it to the student. We provide several exploratory examples in which the pruned teacher teaches better than the original unpruned network. We further show theoretically that the pruned teacher acts as a regularizer in distillation, which reduces the generalization error. Based on this result, we propose a novel neural network compression scheme in which the student network is formed based on the pruned teacher and the “prune, then distill” strategy is then applied. The code is available at https://github.com/ososos888/prune-then-distill.
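
The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes unstructured magnitude pruning and a temperature-softened KL distillation loss, neither of which the abstract specifies, and the names `magnitude_prune`, `softmax`, and `distillation_loss` are hypothetical helpers introduced only for illustration.

```python
import math


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Assumption: unstructured magnitude pruning; the abstract does not say
    which pruning method is used.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    Assumption: a standard soft-target distillation loss; the temperature
    value is an arbitrary illustrative choice.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Step 1: prune the teacher (here a toy flat weight vector).
teacher_weights = [0.1, -2.0, 0.05, 3.0]
pruned_teacher = magnitude_prune(teacher_weights, sparsity=0.5)

# Step 2: distill, i.e. penalize the student for deviating from the
# pruned teacher's softened predictions (toy logits for one input).
loss = distillation_loss(student_logits=[3.0, 2.0, 1.0],
                         teacher_logits=[1.0, 2.0, 3.0])
```

In a real training loop the teacher logits would come from a forward pass of the pruned network, and the distillation term would typically be combined with the usual supervised loss on the labels.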

Cite

Park, J., & No, A. (2022). Prune Your Model Before Distill It. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13671 LNCS, pp. 120–136). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20083-0_8
