Knowledge distillation has proven effective for customizing small language models for specific tasks. Here, a corpus serves as the 'textbook': only through it can the teacher teach the student. Prevailing methods adopt a two-stage distillation paradigm: general distillation first with a task-agnostic general corpus, followed by task-specific distillation with an augmented task-specific corpus. We argue that such a paradigm may not be optimal. In general distillation, it is wasteful to let the diverse but desultory general knowledge overwhelm the limited model capacity of the student, while in task-specific distillation, the task corpus is usually limited and narrow, preventing the student from learning enough knowledge. To mitigate the issues in these two gapped corpora, we present a better textbook for the student to learn from: a contextualized corpus that contextualizes the task corpus with a large-scale general corpus through relevance-based text retrieval. Experimental results on the GLUE benchmark demonstrate that the contextualized corpus is a better textbook than jointly using the general corpus and the augmented task-specific corpus. Surprisingly, it enables task-specific distillation from scratch without general distillation while maintaining comparable performance, making it more flexible to customize the student model to a desired size under various computation constraints.
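The sketch below illustrates the core idea of building a contextualized corpus via relevance-based retrieval. It is a minimal illustration, assuming TF-IDF cosine similarity as a simple stand-in for the paper's retriever; the corpora contents, the function name build_contextualized_corpus, and the top_k setting are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: retrieve relevant general-corpus texts for each task
# example to form a "contextualized corpus". TF-IDF similarity is used
# here only as an assumed, simple relevance measure.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpora: a small task corpus and a large general corpus.
task_corpus = [
    "the movie was a delightful surprise from start to finish",
    "the service at this restaurant was painfully slow",
]
general_corpus = [
    "critics praised the film for its pacing and performances",
    "the quarterly earnings report exceeded analyst expectations",
    "diners complained about long waits and cold food",
    "the new smartphone features a faster processor",
]

def build_contextualized_corpus(task_texts, general_texts, top_k=2):
    """Attach the top_k most relevant general-corpus texts to each task text."""
    vectorizer = TfidfVectorizer().fit(task_texts + general_texts)
    task_vecs = vectorizer.transform(task_texts)
    general_vecs = vectorizer.transform(general_texts)
    sims = cosine_similarity(task_vecs, general_vecs)  # shape: (n_task, n_general)

    contextualized = []
    for i, text in enumerate(task_texts):
        top_idx = sims[i].argsort()[::-1][:top_k]
        retrieved = [general_texts[j] for j in top_idx]
        contextualized.append({"task_text": text, "retrieved_context": retrieved})
    return contextualized

for example in build_contextualized_corpus(task_corpus, general_corpus):
    print(example["task_text"], "->", example["retrieved_context"])
```

In practice, the retrieved general-corpus texts would then serve as additional distillation data for the student alongside the original task examples.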
Citation
Liu, C., Tao, C., Liang, J., Shen, T., Feng, J., Huang, Q., & Zhao, D. (2022). Rethinking Task-Specific Knowledge Distillation: Contextualized Corpus as Better Textbook. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 10652–10658). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.729