DeFormer: Decomposing pre-trained transformers for faster question answering


Abstract

Transformer-based QA models use input-wide self-attention (i.e. across both the question and the input passage) at all layers, causing them to be slow and memory-intensive. We find that input-wide self-attention is not needed at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers. This allows for question-independent processing of the input text representations, which in turn enables pre-computing passage representations, reducing runtime compute drastically. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-trained weights of a standard transformer and directly fine-tune on the target QA dataset. We show that DeFormer versions of BERT and XLNet speed up QA by over 4.3x, and with simple distillation-based losses they incur only a 1% drop in accuracy. We open source the code at https://github.com/StonyBrookNLP/deformer.
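The sketch below illustrates the decomposition described in the abstract: the lower layers encode the question and the passage independently (so passage representations can be pre-computed and cached offline), and only the upper layers apply input-wide self-attention over the concatenated sequence. This is a minimal illustration, not the authors' released code; the layer counts, hidden size, and the `DeFormerSketch` / `encode_lower` names are assumptions, and PyTorch's `nn.TransformerEncoderLayer` stands in for the layers of a pre-trained model such as BERT or XLNet.

```python
# Minimal sketch of the DeFormer idea (illustrative only, not the paper's code).
import torch
import torch.nn as nn


class DeFormerSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_lower=3, n_upper=3):
        super().__init__()
        def make_layer():
            return nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Lower layers: question and passage are encoded independently,
        # so passage representations can be pre-computed and cached.
        self.lower_layers = nn.ModuleList(make_layer() for _ in range(n_lower))
        # Upper layers: full (input-wide) self-attention over the joint sequence.
        self.upper_layers = nn.ModuleList(make_layer() for _ in range(n_upper))

    def encode_lower(self, x):
        # Independent encoding of a single segment (question or passage).
        for layer in self.lower_layers:
            x = layer(x)
        return x

    def forward(self, question_emb, passage_emb, cached_passage=None):
        q = self.encode_lower(question_emb)
        # Reuse pre-computed passage representations when available.
        p = cached_passage if cached_passage is not None else self.encode_lower(passage_emb)
        x = torch.cat([q, p], dim=1)      # join question and passage
        for layer in self.upper_layers:   # input-wide self-attention
            x = layer(x)
        return x, p                       # p can be cached for later questions


model = DeFormerSketch()
question = torch.randn(1, 16, 256)    # (batch, question_len, hidden)
passage = torch.randn(1, 128, 256)    # (batch, passage_len, hidden)
cached = model.encode_lower(passage)  # offline pre-computation of the passage
out, _ = model(question, passage, cached_passage=cached)
print(out.shape)                      # torch.Size([1, 144, 256])
```

In this setup, the per-question cost of the lower layers covers only the (short) question, since the passage half of the computation is done once and reused, which is the source of the runtime savings the abstract reports.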

Cite (APA)
Cao, Q., Trivedi, H., Balasubramanian, A., & Balasubramanian, N. (2020). DeFormer: Decomposing pre-trained transformers for faster question answering. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 4487–4497). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.411
