Pre-trained language models (LMs) have brought remarkable performance to numerous NLP tasks. However, they require significant resources and entail high computational costs at inference, making them challenging to deploy in real-world, real-time systems. Existing early-exiting methods aim to reduce computational complexity by selecting the layer at which to exit, but they suffer from the limitation of having to sequentially traverse all layers prior to the selected exit layer, which lacks flexibility and degrades their performance. To solve this problem, we propose a homotopic and adaptive layer-skipping fine-tuning method named HadSkip. HadSkip adaptively selects the layers to skip based on a predefined budget. Specifically, we introduce a learnable gate before each layer of the LM to determine whether the current layer should be skipped. To tackle the training challenges brought by discrete gates and the budget constraint, we propose a fine-grained initialization strategy and a homotopic optimization strategy. We conduct extensive experiments on the GLUE benchmark, and the results demonstrate that the proposed HadSkip significantly outperforms all state-of-the-art baselines.
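The abstract describes the core mechanism: a learnable gate placed before each LM layer decides whether that layer is skipped, subject to a predefined compute budget. The sketch below illustrates one plausible way such gated skipping could be wired up in PyTorch; the class names (LayerGate, GatedEncoder), the straight-through gate, and the budget_loss penalty are illustrative assumptions and not the authors' implementation.

```python
# Illustrative sketch of adaptive layer skipping with learnable gates.
# Assumptions: layers return plain tensors of shape (B, T, H); the gate
# scores the first token; a straight-through estimator makes the hard
# 0/1 skip decision differentiable; a simple squared penalty ties the
# average number of executed layers to a predefined budget.
import torch
import torch.nn as nn


class LayerGate(nn.Module):
    """Learnable gate deciding whether to execute or skip one layer."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Keep-probability from the first ([CLS]-like) token: shape (B, 1).
        prob = torch.sigmoid(self.scorer(hidden_states[:, 0]))
        hard = (prob > 0.5).float()  # discrete 0/1 decision at forward time
        # Straight-through: forward uses the hard gate, gradients flow via prob.
        return hard + prob - prob.detach()


class GatedEncoder(nn.Module):
    """Encoder whose layers can be adaptively bypassed by their gates."""

    def __init__(self, layers: nn.ModuleList, hidden_size: int):
        super().__init__()
        self.layers = layers
        self.gates = nn.ModuleList(LayerGate(hidden_size) for _ in layers)

    def forward(self, hidden_states: torch.Tensor):
        gate_values = []
        for layer, gate in zip(self.layers, self.gates):
            g = gate(hidden_states)          # (B, 1)
            out = layer(hidden_states)       # (B, T, H)
            # Gate value 0 bypasses (skips) the layer; 1 keeps its output.
            g_ = g.unsqueeze(-1)             # broadcast to (B, 1, 1)
            hidden_states = g_ * out + (1.0 - g_) * hidden_states
            gate_values.append(g)
        return hidden_states, torch.cat(gate_values, dim=-1)  # (B, num_layers)


def budget_loss(gate_values: torch.Tensor, budget: float) -> torch.Tensor:
    """Penalize deviation of the executed-layer fraction from the budget,
    e.g. budget=0.5 encourages skipping roughly half of the layers."""
    return (gate_values.mean() - budget).pow(2)
```

This budget penalty would be added to the task loss during fine-tuning; the paper's fine-grained initialization and homotopic optimization strategies, which gradually harden the gates, are not reproduced here.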
CITATION STYLE
Wang, H., Wang, Y., Liu, T., Zhao, T., & Gao, J. (2023). HadSkip: Homotopic and Adaptive Layer Skipping of Pre-trained Language Models for Efficient Inference. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4283–4294). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.283