Enhancing Scalability of Pre-trained Language Models via Efficient Parameter Sharing

Abstract

In this paper, we propose a highly parameter-efficient approach to scaling pre-trained language models (PLMs) to greater depths. Unlike prior work that shares all parameters or introduces extra blocks, we design a more capable parameter-sharing architecture based on the matrix product operator (MPO), an efficient tensor decomposition method that factorizes a parameter matrix into a set of local tensors. Based on this decomposition, we share the important local tensor across all layers to reduce the model size, while keeping layer-specific tensors (also using adapters) to enhance adaptation flexibility. To improve model training, we further propose a stable initialization algorithm tailored to the MPO-based architecture. Extensive experiments demonstrate the effectiveness of the proposed model in enhancing scalability and achieving higher performance: with fewer parameters than BERT-base, we successfully scale the model depth by a factor of 4x and even achieve a GLUE score 0.1 points higher than BERT-large. The code to reproduce the results of this paper can be found at https://github.com/RUCAIBox/MPOBERT-code.
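To make the MPO idea concrete, below is a minimal sketch of MPO decomposition implemented as a sequence of truncated SVDs (the tensor-train construction specialized to matrices), using only NumPy. The function name `mpo_decompose`, the factor shapes, and the rank cap are illustrative assumptions for this sketch, not the API of the MPOBERT-code repository.

```python
# A minimal sketch of MPO (matrix product operator) decomposition via
# sequential truncated SVDs. Shapes, the rank cap, and all names here
# are illustrative assumptions, not taken from MPOBERT-code.
import numpy as np

def mpo_decompose(W, in_dims, out_dims, max_rank=64):
    """Factorize W (prod(in_dims) x prod(out_dims)) into a list of
    local 4-D tensors T_k with shape (r_{k-1}, in_k, out_k, r_k)."""
    n = len(in_dims)
    assert len(out_dims) == n
    assert W.shape == (int(np.prod(in_dims)), int(np.prod(out_dims)))

    # Reshape to (i1..in, j1..jn), then interleave input/output modes
    # so each index pair (i_k, j_k) sits next to each other.
    T = W.reshape(list(in_dims) + list(out_dims))
    perm = [p for k in range(n) for p in (k, n + k)]
    T = T.transpose(perm)

    cores, r_prev = [], 1
    for k in range(n - 1):
        # Split off the (r_{k-1}, i_k, j_k) block and SVD the remainder.
        rows = r_prev * in_dims[k] * out_dims[k]
        M = T.reshape(rows, -1)
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, in_dims[k], out_dims[k], r))
        T = S[:r, None] * Vt[:r]  # carry the rest forward to the next core
        r_prev = r
    cores.append(T.reshape(r_prev, in_dims[-1], out_dims[-1], 1))
    return cores

# Example: decompose a 768x768 weight into 3 local tensors.
W = np.random.randn(768, 768)
cores = mpo_decompose(W, in_dims=(8, 12, 8), out_dims=(8, 12, 8))
print([c.shape for c in cores])
# -> [(1, 8, 8, 64), (64, 12, 12, 64), (64, 8, 8, 1)]
```

In the paper's scheme, the large central tensor produced by such a decomposition is shared across all layers, while the small outer (auxiliary) tensors remain layer-specific; this sketch only shows how a single weight matrix can be split into those local tensors.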

Cite

APA

Liu, P., Gao, Z. F., Chen, Y., Zhao, W. X., & Wen, J. R. (2023). Enhancing Scalability of Pre-trained Language Models via Efficient Parameter Sharing. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 13771–13785). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.920
