Knowledge distillation and data augmentation for NLP light pre-trained models

Abstract

Model lightweighting aims to address the slow training and high resource requirements of large models, and knowledge distillation is a good solution to these problems. We built a lightweight model that meets the competition requirements and has strong NLP capabilities, using RoBERTa-tiny-clue as our backbone model. We tested the effect of soft labels and hard labels on knowledge distillation, performed the distillation, and fine-tuned the resulting model to obtain a lighter model with better performance, which we then applied to downstream NLP tasks. We also adopted a series of data augmentation methods to improve the model's performance on downstream tasks, customizing a different optimization scheme for each of four tasks. Based on the open-source pre-trained model RoBERTa-tiny-clue and publicly available datasets, we obtained a model 15 times smaller and 10 times faster than BERT-base that retains 95% of BERT-base's performance on downstream NLP tasks. With suitable data augmentation, the trained lightweight model reaches or exceeds BERT-base performance on various downstream tasks.
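The abstract does not reproduce the distillation objective itself; the sketch below shows one common formulation of the soft-label vs. hard-label trade-off it describes, assuming a PyTorch setup. The temperature T and mixing weight alpha are illustrative hyperparameters, not values reported by the authors.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        # Soft-label term: KL divergence between temperature-scaled teacher
        # and student distributions; the T*T factor keeps gradient magnitudes
        # comparable across temperatures (standard Hinton-style scaling).
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-label term: ordinary cross-entropy against the gold labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

Here alpha balances the two terms: alpha = 1 distills from the teacher's soft labels only, while alpha = 0 trains on hard labels only, which mirrors the soft-vs.-hard comparison the abstract describes.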

Cite (APA)

Luo, H., Li, Y., Wang, X., & Zhang, Y. (2020). Knowledge distillation and data augmentation for NLP light pre-trained models. In Journal of Physics: Conference Series (Vol. 1651). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/1651/1/012043
