HadSkip: Homotopic and Adaptive Layer Skipping of Pre-trained Language Models for Efficient Inference

Abstract

Pre-trained language models (LMs) have brought remarkable performance gains to numerous NLP tasks. However, they require significant resources and incur high computational costs at inference time, making them challenging to deploy in real-world, real-time systems. Existing early-exiting methods reduce computational complexity by selecting a layer at which to exit, but they must sequentially traverse all layers before the selected exit layer, which limits their flexibility and degrades their performance. To address this problem, we propose HadSkip, a homotopic and adaptive layer-skipping fine-tuning method. HadSkip adaptively selects the layers to skip based on a predefined budget. Specifically, we introduce a learnable gate before each layer of the LM to determine whether that layer should be skipped. To tackle the training challenges introduced by discrete gates and budget constraints, we propose a fine-grained initialization strategy and a homotopic optimization strategy. We conduct extensive experiments on the GLUE benchmark, and the results demonstrate that HadSkip significantly outperforms all state-of-the-art baselines.
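The abstract describes a learnable gate placed before each LM layer that decides whether the layer is executed or skipped under a predefined layer budget. The following is a minimal Python/PyTorch sketch of that per-layer gating idea, not the authors' implementation: the names GatedLayerStack and budget_penalty, the straight-through estimator, and the hinge-style budget penalty are illustrative assumptions.

import torch
import torch.nn as nn

class GatedLayerStack(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        # One learnable gate logit per layer; sigmoid(logit) > 0.5 means "run the layer".
        self.gate_logits = nn.Parameter(torch.zeros(len(layers)))

    def forward(self, hidden):
        soft = torch.sigmoid(self.gate_logits)      # relaxed (soft) gate values
        hard = (soft > 0.5).float()                 # discrete skip/keep decisions
        # Straight-through estimator: use hard gates in the forward pass,
        # but let gradients flow through the soft gates.
        gates = hard + soft - soft.detach()
        for gate, layer in zip(gates, self.layers):
            hidden = gate * layer(hidden) + (1.0 - gate) * hidden
        return hidden

def budget_penalty(gate_logits, budget):
    # Penalize exceeding the layer budget: the expected number of executed
    # layers should not be larger than the budget.
    expected_kept = torch.sigmoid(gate_logits).sum()
    return torch.relu(expected_kept - budget)

As a usage sketch, wrapping the encoder layers of a 12-layer BERT-style model in GatedLayerStack and adding budget_penalty(model.gate_logits, budget=6) to the task loss would push training toward executing at most six layers per input. The paper's fine-grained initialization and homotopic optimization strategies presumably handle the discrete gates and budget constraint in a more principled way than the simple straight-through trick shown here.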

Citation (APA)

Wang, H., Wang, Y., Liu, T., Zhao, T., & Gao, J. (2023). HadSkip: Homotopic and Adaptive Layer Skipping of Pre-trained Language Models for Efficient Inference. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4283–4294). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.283
