Fine-tune BERT with sparse self-attention mechanism

56 citations · 136 Mendeley readers

Abstract

In this paper, we develop a novel Sparse Self-Attention Fine-tuning model (referred to as SSAF) that integrates sparsity into the self-attention mechanism to enhance the fine-tuning performance of BERT. In particular, sparsity is introduced into self-attention by replacing the softmax function with a controllable sparse transformation when fine-tuning BERT. This enables the model to learn a structurally sparse attention distribution, which yields a more interpretable representation of the whole input. The proposed model is evaluated on sentiment analysis, question answering, and natural language inference tasks. Extensive experimental results across multiple datasets demonstrate its effectiveness and superiority over baseline methods.
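The abstract does not specify the exact "controllable sparse transformation" used in place of softmax. The sketch below is only an illustration of the general idea, assuming sparsemax (Martins & Astudillo, 2016) as the drop-in replacement inside scaled dot-product attention; the function names `sparsemax` and `sparse_self_attention` are illustrative and not taken from the paper.

```python
import torch


def sparsemax(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Sparsemax (Martins & Astudillo, 2016): projects scores onto the
    probability simplex and can assign exactly zero weight to low-scoring entries."""
    # Sort scores in descending order along the target dimension.
    z_sorted, _ = torch.sort(scores, dim=dim, descending=True)
    k_range = torch.arange(1, scores.size(dim) + 1,
                           device=scores.device, dtype=scores.dtype)
    view = [1] * scores.dim()
    view[dim] = -1
    k_range = k_range.view(view)
    z_cumsum = z_sorted.cumsum(dim)
    # Support size: the largest k with 1 + k * z_(k) > sum_{j<=k} z_(j).
    support = (1 + k_range * z_sorted) > z_cumsum
    k_support = support.to(scores.dtype).sum(dim=dim, keepdim=True)
    # Threshold tau, then project: entries below tau become exactly zero.
    tau = (z_cumsum.gather(dim, k_support.long() - 1) - 1) / k_support
    return torch.clamp(scores - tau, min=0.0)


def sparse_self_attention(q, k, v, mask=None):
    """Scaled dot-product attention with sparsemax in place of softmax,
    so each query attends to only a subset of key positions."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)  # suppress padding positions
    weights = sparsemax(scores, dim=-1)  # structurally sparse attention distribution
    return weights @ v, weights
```

Unlike softmax, sparsemax can drive attention weights to exactly zero, which is what produces the structurally sparse and more interpretable attention distributions described in the abstract.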

Citation (APA)

Cui, B., Li, Y., Chen, M., & Zhang, Z. (2019). Fine-tune BERT with sparse self-attention mechanism. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 3548–3553). Association for Computational Linguistics. https://doi.org/10.18653/v1/d19-1361
