Transformer-based pre-trained language models like BERT, though powerful in many tasks, are expensive in both memory and computation due to their large number of parameters. Previous work shows that some parameters in these models can be pruned away without a severe drop in accuracy. However, these redundant features contribute to a comprehensive understanding of the training data, and removing them weakens the model's representation ability. In this paper, we propose GhostBERT, which generates more features with very cheap operations from the remaining features. In this way, GhostBERT has memory and computational costs similar to those of the pruned model, but enjoys much larger representation power. The proposed ghost module can also be applied to unpruned BERT models to enhance their performance with negligible additional parameters and computation. Empirical results on the GLUE benchmark with three backbone models (i.e., BERT, RoBERTa and ELECTRA) verify the efficacy of our proposed method.
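To illustrate the idea of generating extra features from cheap operations, below is a minimal PyTorch sketch of a ghost-style module: a narrow projection (standing in for the pruned weight) produces a few primary features, and a cheap grouped 1-D convolution over the sequence generates additional "ghost" features from them, which are concatenated to restore the original width. The class and parameter names (GhostModule, primary_dim, kernel_size) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class GhostModule(nn.Module):
    """Illustrative sketch (not the paper's exact code): produce a few
    primary features with a narrow projection, then generate the remaining
    "ghost" features with a cheap grouped 1-D convolution."""

    def __init__(self, hidden_dim: int, primary_dim: int, kernel_size: int = 3):
        super().__init__()
        ghost_dim = hidden_dim - primary_dim
        # The grouped conv needs the ghost width to be a multiple of the primary width.
        assert ghost_dim > 0 and ghost_dim % primary_dim == 0
        # Narrow projection standing in for the pruned weight matrix.
        self.primary_proj = nn.Linear(hidden_dim, primary_dim)
        # Cheap operation: depth-wise style 1-D convolution along the sequence,
        # expanding each primary channel into several ghost channels.
        self.cheap_op = nn.Conv1d(
            in_channels=primary_dim,
            out_channels=ghost_dim,
            kernel_size=kernel_size,
            padding=kernel_size // 2,
            groups=primary_dim,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        primary = self.primary_proj(x)                  # (B, L, primary_dim)
        ghost = self.cheap_op(primary.transpose(1, 2))  # (B, ghost_dim, L)
        ghost = ghost.transpose(1, 2)                   # (B, L, ghost_dim)
        # Concatenate primary and ghost features to recover the full width
        # at a fraction of the original parameter count.
        return torch.cat([primary, ghost], dim=-1)      # (B, L, hidden_dim)


# Example: a 768-wide layer rebuilt from 256 primary features.
x = torch.randn(2, 16, 768)
module = GhostModule(hidden_dim=768, primary_dim=256)
print(module(x).shape)  # torch.Size([2, 16, 768])
```

In this sketch the grouped convolution adds only out_channels x kernel_size weights (512 x 3 here), far fewer than a full 768 x 768 projection, which is consistent with the abstract's claim of negligible additional parameters and computation.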
Citation
Huang, Z., Hou, L., Shang, L., Jiang, X., Chen, X., & Liu, Q. (2021). GhostBERT: Generate more features with cheap operations for BERT. In ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 6512–6523). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.acl-long.509