In this study, we demonstrate the viability of deploying BERT-style models to serverless environments in a production setting. Since the freely available pre-trained models are too large to be deployed in this way, we use knowledge distillation and fine-tune the models on proprietary datasets for two real-world tasks: sentiment analysis and semantic textual similarity. As a result, we obtain models that are tuned for a specific domain and deployable in serverless environments. The subsequent performance analysis shows that this solution achieves latency levels acceptable for production use and that it is a cost-effective approach for small-to-medium-sized deployments of BERT models, all without any infrastructure overhead.
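To make the knowledge distillation step more concrete, the following is a minimal sketch of a commonly used soft-target distillation objective for a single training example. It is not taken from the paper: the abstract does not state which loss the authors used, and the function names, the `temperature` and `alpha` parameters, and their default values are all assumptions chosen for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical combined loss for one example (an assumption, not the
    paper's exact objective).

    alpha weights the soft-target term (student mimics the teacher);
    (1 - alpha) weights the usual cross-entropy on the true label.
    """
    # Soft targets: KL(teacher || student) on temperature-softened outputs.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Hard target: cross-entropy of the unsoftened student prediction.
    ce = -log_softmax(student_logits)[true_label]

    # The temperature**2 factor keeps the soft-term gradients on a scale
    # comparable to the hard term when the temperature changes.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce
```

In this sketch, a smaller student model would be trained to minimize `distillation_loss` over the fine-tuning data, with `teacher_logits` produced by the larger pre-trained model; when the student reproduces the teacher's logits exactly, the soft-target term is zero and only the cross-entropy term remains.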
Benešová, K., Švec, A., & Šuppa, M. (2021). Cost-effective Deployment of BERT Models in a Serverless Environment. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Industry Papers (pp. 187–195). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-industry.24