BAM! Born-again multi-task networks for natural language understanding

111 citations · 438 Mendeley readers

Abstract

It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task model surpass its single-task teachers. We evaluate our approach by multi-task fine-tuning BERT on the GLUE benchmark. Our method consistently improves over standard single-task and multi-task training.
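The core of teacher annealing, as the abstract describes it, is a training target that is a mix of the single-task teacher's soft prediction and the gold label, with the mixing weight shifted toward the gold label as training progresses. The sketch below illustrates that idea in plain Python/NumPy; the linear schedule and the helper names (`teacher_annealed_target`, `soft_cross_entropy`) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def teacher_annealed_target(gold_onehot, teacher_probs, step, total_steps):
    """Convex combination of the gold label and the single-task teacher's
    prediction. Early in training the target is mostly the teacher's output
    (distillation); by the end it is mostly the gold label (supervised
    learning). A linear schedule for lam is assumed here."""
    lam = step / total_steps  # anneals from 0 to 1 over training
    return lam * np.asarray(gold_onehot) + (1.0 - lam) * np.asarray(teacher_probs)

def soft_cross_entropy(student_probs, target_probs):
    """Cross-entropy of the student's predicted distribution against a
    (possibly soft) target distribution."""
    return -float(np.sum(np.asarray(target_probs) * np.log(np.asarray(student_probs) + 1e-12)))

# Example for one binary-classification task in the multi-task mixture.
gold = np.array([0.0, 1.0])        # gold label
teacher = np.array([0.25, 0.75])   # single-task teacher's prediction
student = np.array([0.4, 0.6])     # multi-task student's prediction

early = teacher_annealed_target(gold, teacher, step=0, total_steps=10000)      # ~teacher
late = teacher_annealed_target(gold, teacher, step=10000, total_steps=10000)   # ~gold

loss_early = soft_cross_entropy(student, early)
loss_late = soft_cross_entropy(student, late)
```

At step 0 the loss is pure distillation against the teacher; at the final step it is ordinary supervised cross-entropy, which is what lets the multi-task student eventually surpass its single-task teachers.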

Citation (APA)

Clark, K., Luong, M. T., Khandelwal, U., Manning, C. D., & Le, Q. V. (2019). BAM! Born-again multi-task networks for natural language understanding. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019) (pp. 5931–5937). Association for Computational Linguistics. https://doi.org/10.18653/v1/p19-1595
