Language models are nowadays at the center of progress in natural language processing. These models are typically large, and although there are successful attempts to compress them, at least some of these attempts rely on randomness. We propose a novel distillation procedure that leverages multiple teachers, alleviating random-seed dependency and making the models more robust. We show that applying this procedure to the TinyBERT and DistilBERT models improves their worst-case results by up to 2% while keeping the best-case results almost unchanged. The latter holds even under a constraint on computational time, which matters for reducing the carbon footprint. In addition, we present results of applying the proposed procedure to ResNet, a computer vision model, showing that the claim holds in this entirely different domain.
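The abstract does not spell out how the teachers' outputs are combined; a common baseline is to average the teachers' temperature-softened distributions and train the student against that target. The sketch below illustrates this assumed aggregation with NumPy; the function names (`softmax`, `multi_teacher_distill_loss`) are hypothetical and the paper's actual procedure may weight or combine teachers differently.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over the last axis (numerically stable).
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_teacher_distill_loss(student_logits, teacher_logits_list, T=2.0):
    """Cross-entropy between the averaged teachers' soft targets and the
    student's temperature-softened predictions. Averaging is an assumed
    aggregation scheme, not necessarily the one used in the paper."""
    target = np.mean([softmax(t, T) for t in teacher_logits_list], axis=0)
    log_student = np.log(softmax(student_logits, T))
    # Mean over the batch of per-example cross-entropies.
    return float(-(target * log_student).sum(axis=-1).mean())
```

With two teachers that disagree symmetrically, the averaged target is uniform, so a uniform student prediction attains the minimum loss; a student that commits to either teacher's answer is penalized, which is the intuition behind the robustness claim.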
Ilichev, A., Sorokin, N., Malykh, V., & Piontkovskaya, I. (2021). Multiple Teacher Distillation for Robust and Greener Models. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 601–610). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_068