Recent explorations of large-scale pre-trained language models (PLMs) have revealed the power of PLMs with huge numbers of parameters, setting off a wave of training ever-larger PLMs. However, training a large-scale PLM requires tremendous computational resources, which may be practically unaffordable. In addition, existing large-scale PLMs are mainly trained from scratch individually, ignoring that many well-trained PLMs are already available. To this end, we explore the question of how existing PLMs could benefit the training of large-scale PLMs in the future. Specifically, we introduce a pre-training framework named “knowledge inheritance” (KI) and explore how knowledge distillation could serve as auxiliary supervision during pre-training to efficiently learn larger PLMs. Experimental results demonstrate the superiority of KI in training efficiency. We also conduct empirical analyses to explore the effects of teacher PLMs' pre-training settings, including model architecture, pre-training data, etc. Finally, we show that KI can be applied to domain adaptation and knowledge transfer. The implementation is publicly available at https://github.com/thunlp/Knowledge-Inheritance.
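To make "knowledge distillation as auxiliary supervision" concrete, here is a minimal pure-Python sketch of a per-token loss that mixes the student's own language-modeling loss with a distillation term toward a teacher's output distribution. The abstract does not specify the loss, so everything here is an assumption for illustration: the function names, the weight `alpha`, the `temperature` parameter, and the `(1 - alpha) * self + alpha * distill` mixing form are all hypothetical. It is not the authors' implementation; see the linked repository for that.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of logits.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, target_index):
    # Self-supervised term: negative log-probability of the true token.
    return -math.log(softmax(student_logits)[target_index])

def kl_divergence(teacher_probs, student_probs):
    # KL(teacher || student); zero when the two distributions match.
    return sum(p * math.log(p / q)
               for p, q in zip(teacher_probs, student_probs) if p > 0)

def combined_loss(student_logits, teacher_logits, target_index,
                  alpha, temperature=1.0):
    # ASSUMPTION: a convex mix of the two terms, weighted by alpha in [0, 1].
    # alpha = 0 ignores the teacher; alpha = 1 uses only the teacher signal.
    self_loss = cross_entropy(student_logits, target_index)
    distill_loss = kl_divergence(softmax(teacher_logits, temperature),
                                 softmax(student_logits, temperature))
    return (1 - alpha) * self_loss + alpha * distill_loss
```

In a sketch like this, one plausible design choice is to lower `alpha` as training proceeds, so that the larger student relies on the smaller teacher early on and on its own objective later; whether and how to schedule it is left open here.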
CITATION STYLE
Qin, Y., Lin, Y., Yi, J., Zhang, J., Han, X., Zhang, Z., … Zhou, J. (2022). Knowledge Inheritance for Pre-trained Language Models. In NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 3921–3937). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.naacl-main.288