Pre-trained language models (LMs) have brought remarkable performance to numerous NLP tasks. However, they require significant resources and entail high computational costs at inference, making them challenging to deploy in real-world, real-time systems. Existing early-exiting methods aim to reduce computational complexity by selecting the layer at which to exit, but they suffer from the limitation of having to sequentially traverse all layers prior to the selected exit layer, which lacks flexibility and degrades their performance. To solve this problem, we propose a homotopic and adaptive layer-skipping fine-tuning method named HadSkip. HadSkip adaptively selects the layers to skip based on a predefined budget. Specifically, we introduce a learnable gate before each layer of the LM to determine whether the current layer should be skipped. To tackle the training challenges brought by discrete gates and the budget constraint, we propose a fine-grained initialization strategy and a homotopic optimization strategy. We conduct extensive experiments on the GLUE benchmark, and the results demonstrate that the proposed HadSkip significantly outperforms all state-of-the-art baselines.
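The abstract describes the core mechanism: a learnable gate placed before each LM layer decides whether that layer is skipped, subject to a predefined compute budget. The sketch below illustrates one plausible way such gated skipping could be wired up in PyTorch; the class names (LayerGate, GatedEncoder), the straight-through gate, and the budget_loss penalty are illustrative assumptions and not the authors' implementation.

```python
# Illustrative sketch of adaptive layer skipping with learnable gates.
# Assumptions: layers return plain tensors of shape (B, T, H); the gate
# scores the first token; a straight-through estimator makes the hard
# 0/1 skip decision differentiable; a simple squared penalty ties the
# average number of executed layers to a predefined budget.
import torch
import torch.nn as nn


class LayerGate(nn.Module):
    """Learnable gate deciding whether to execute or skip one layer."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Keep-probability from the first ([CLS]-like) token: shape (B, 1).
        prob = torch.sigmoid(self.scorer(hidden_states[:, 0]))
        hard = (prob > 0.5).float()  # discrete 0/1 decision at forward time
        # Straight-through: forward uses the hard gate, gradients flow via prob.
        return hard + prob - prob.detach()


class GatedEncoder(nn.Module):
    """Encoder whose layers can be adaptively bypassed by their gates."""

    def __init__(self, layers: nn.ModuleList, hidden_size: int):
        super().__init__()
        self.layers = layers
        self.gates = nn.ModuleList(LayerGate(hidden_size) for _ in layers)

    def forward(self, hidden_states: torch.Tensor):
        gate_values = []
        for layer, gate in zip(self.layers, self.gates):
            g = gate(hidden_states)          # (B, 1)
            out = layer(hidden_states)       # (B, T, H)
            # Gate value 0 bypasses (skips) the layer; 1 keeps its output.
            g_ = g.unsqueeze(-1)             # broadcast to (B, 1, 1)
            hidden_states = g_ * out + (1.0 - g_) * hidden_states
            gate_values.append(g)
        return hidden_states, torch.cat(gate_values, dim=-1)  # (B, num_layers)


def budget_loss(gate_values: torch.Tensor, budget: float) -> torch.Tensor:
    """Penalize deviation of the executed-layer fraction from the budget,
    e.g. budget=0.5 encourages skipping roughly half of the layers."""
    return (gate_values.mean() - budget).pow(2)
```

This budget penalty would be added to the task loss during fine-tuning; the paper's fine-grained initialization and homotopic optimization strategies, which gradually harden the gates, are not reproduced here.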
CITATION STYLE
Wang, H., Wang, Y., Liu, T., Zhao, T., & Gao, J. (2023). HadSkip: Homotopic and Adaptive Layer Skipping of Pre-trained Language Models for Efficient Inference. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4283–4294). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.283