Abstract
Model lightweighting aims to address the slow training and high resource requirements of large models, and knowledge distillation is an effective way to do so. We built a lightweight model that meets the competition requirements and has strong NLP capabilities, using RoBERTa-tiny-clue as the backbone. We tested the effect of soft labels and hard labels on knowledge distillation, distilled and fine-tuned the backbone to obtain a lighter model with better performance, and then applied it to downstream NLP tasks. We also adopted a series of data augmentation methods to improve performance on downstream tasks and customized a separate optimization scheme for each of four tasks. Based on the open-source pre-trained model RoBERTa-tiny-clue and publicly available datasets, we obtained a model that is 15 times smaller and 10 times faster than BERT-base while retaining 95% of BERT-base performance on downstream NLP tasks. With suitable data augmentation applied to the trained lightweight model, its performance on various downstream tasks matches or exceeds that of BERT-base.
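The difference between soft labels and hard labels in knowledge distillation can be illustrated with a minimal sketch. The abstract does not give the loss actually used, so the form below is an assumption: it follows the commonly used formulation that mixes a temperature-scaled cross-entropy against the teacher's output distribution (soft labels) with an ordinary cross-entropy against the ground-truth class (hard labels). The function names and the `temperature` and `alpha` parameters are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-label term and a hard-label term (assumed form).

    alpha = 1.0 trains on the teacher's soft labels only;
    alpha = 0.0 trains on the ground-truth hard label only.
    """
    # Soft-label term: cross-entropy between the teacher and student
    # distributions, both softened by the same temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft_term = -sum(pt * math.log(ps)
                     for pt, ps in zip(p_teacher, p_student_soft))

    # Hard-label term: standard cross-entropy against the true class index.
    p_student = softmax(student_logits)
    hard_term = -math.log(p_student[hard_label])

    # The temperature**2 factor is the usual rescaling that keeps the
    # soft-term gradients comparable in magnitude to the hard-term gradients.
    return alpha * temperature ** 2 * soft_term + (1.0 - alpha) * hard_term


# Example: a two-class problem where the teacher strongly favors class 1.
loss = distillation_loss(student_logits=[0.2, 1.5],
                         teacher_logits=[0.1, 3.0],
                         hard_label=1)
```

Sweeping `alpha` between 0 and 1 is one simple way to compare hard-label and soft-label supervision in a setup like this; whether the authors tuned such a weight is not stated in the abstract.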
Cite
Luo, H., Li, Y., Wang, X., & Zhang, Y. (2020). Knowledge distillation and data augmentation for NLP light pre-trained models. In Journal of Physics: Conference Series (Vol. 1651). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/1651/1/012043