A Survey on Model Compression and Acceleration for Pretrained Language Models

Abstract

Despite achieving state-of-the-art performance on many NLP tasks, Transformer-based pretrained language models (PLMs) are held back from broader adoption, including on edge and mobile devices, by their high energy cost and long inference latency. Efficient NLP research aims to comprehensively account for computation, time, and carbon emissions across the entire NLP life-cycle, including data preparation, model training, and inference. This survey focuses on the inference stage and reviews the current state of model compression and acceleration for pretrained language models, covering benchmarks, metrics, and methodology.
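The abstract does not spell out individual techniques, but as one concrete illustration of the kind of inference-time compression and acceleration the survey covers, the sketch below applies post-training dynamic quantization to a PLM. This is a minimal example under assumptions of my own: the `bert-base-uncased` checkpoint and the sequence-classification head are placeholders, dynamic int8 quantization is just one representative method, and none of this is presented as the survey authors' approach.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed placeholder checkpoint; any Transformer PLM with Linear layers works.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Post-training dynamic quantization: nn.Linear weights are stored in int8
# and dequantized on the fly, shrinking the model and speeding up CPU inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Run inference with the compressed model.
inputs = tokenizer("Efficient inference matters.", return_tensors="pt")
with torch.no_grad():
    logits = quantized_model(**inputs).logits
print(logits.shape)
```

Quantization of this kind trades a small amount of accuracy for reduced memory footprint and faster matrix multiplication on integer-capable hardware, which is exactly the computation/latency trade-off the survey's metrics and benchmarks are meant to measure.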

Citation (APA)
Xu, C., & McAuley, J. (2023). A Survey on Model Compression and Acceleration for Pretrained Language Models. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 (Vol. 37, pp. 10566–10575). AAAI Press. https://doi.org/10.1609/aaai.v37i9.26255
