Domain-specific knowledge distillation yields smaller and better models for conversational commerce

Abstract

In the context of conversational commerce, where training data may be limited and low latency is critical, we demonstrate that knowledge distillation can be used not only to reduce model size but also to simultaneously adapt a contextual language model to a specific domain. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of Sanh et al. (2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. We show that, for in-domain tasks, the domain-specific model achieves an average improvement of 2.3% in F1 score relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the model improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages.
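The distillation recipe of Sanh et al. (2019) referenced in the abstract combines a soft-target loss against the teacher's output distribution, a standard masked-language-modelling loss, and a cosine loss aligning student and teacher hidden states. The snippet below is a minimal PyTorch sketch of such a combined objective, not the authors' implementation: the function name, loss weights, and temperature are illustrative assumptions, and in the paper's setting the batches would come from in-domain conversational-commerce text with mBERT as the teacher and a smaller multilingual student.

import torch
import torch.nn.functional as F


def distillation_loss(
    student_logits: torch.Tensor,   # (batch, seq_len, vocab) MLM logits from the student
    teacher_logits: torch.Tensor,   # (batch, seq_len, vocab) MLM logits from the teacher (e.g. mBERT)
    student_hidden: torch.Tensor,   # (batch, seq_len, dim) final student hidden states
    teacher_hidden: torch.Tensor,   # (batch, seq_len, dim) final teacher hidden states
    mlm_labels: torch.Tensor,       # (batch, seq_len) masked-token ids, -100 at unmasked positions
    temperature: float = 2.0,       # assumed softening temperature
    alpha_ce: float = 5.0,          # assumed weight on the soft-target (KL) term
    alpha_mlm: float = 2.0,         # assumed weight on the hard MLM term
    alpha_cos: float = 1.0,         # assumed weight on the hidden-state alignment term
) -> torch.Tensor:
    """Soft-target KL + masked-LM cross-entropy + hidden-state cosine alignment."""
    vocab = student_logits.size(-1)

    # 1) Soft-target loss: match the teacher's softened output distribution.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    loss_ce = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # 2) Hard MLM loss on the masked positions of the (in-domain) text.
    loss_mlm = F.cross_entropy(
        student_logits.view(-1, vocab), mlm_labels.view(-1), ignore_index=-100
    )

    # 3) Cosine loss pulling student hidden states toward the teacher's.
    target = torch.ones(
        student_hidden.size(0) * student_hidden.size(1), device=student_hidden.device
    )
    loss_cos = F.cosine_embedding_loss(
        student_hidden.view(-1, student_hidden.size(-1)),
        teacher_hidden.view(-1, teacher_hidden.size(-1)),
        target,
    )

    return alpha_ce * loss_ce + alpha_mlm * loss_mlm + alpha_cos * loss_cos

For the frozen-encoder setting the abstract describes, the distilled encoder's parameters would simply be excluded from optimization after distillation (e.g. by setting requires_grad to False), with only lightweight per-task, per-language classifier heads trained on top of the shared encoder.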

Cite

APA

Howell, K., Wang, J., Hazare, A., Bradley, J., Brew, C., Chen, X., … Widdows, D. (2022). Domain-specific knowledge distillation yields smaller and better models for conversational commerce. In ECNLP 2022 - 5th Workshop on e-Commerce and NLP, Proceedings of the Workshop (pp. 151–160). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.ecnlp-1.18
