Efficient Contextual Representation Learning With Continuous Outputs

4 Citations · 72 Mendeley Readers

Abstract

Contextual representation models have achieved great success in improving various downstream natural language processing tasks. However, these language-model-based encoders are difficult to train due to their large parameter size and high computational complexity. By carefully examining the training procedure, we observe that the softmax layer, which predicts a distribution over the target word, often induces significant overhead, especially when the vocabulary size is large. Therefore, we revisit the design of the output layer and consider directly predicting the pre-trained embedding of the target word for a given context. When applied to ELMo, the proposed approach achieves a 4-fold speedup and eliminates 80% of the trainable parameters while achieving competitive performance on downstream tasks. Further analysis shows that the approach maintains its speed advantage under various settings, even when the sentence encoder is scaled up.
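The core idea described in the abstract can be illustrated with a small sketch: instead of projecting the encoder's context vector onto a vocabulary-sized softmax, the output layer maps it into the space of fixed pre-trained word embeddings and is trained to land near the embedding of the target word. The PyTorch code below is a minimal, hypothetical illustration of that contrast; the dimensions, module names, and the cosine-distance loss are assumptions for demonstration only and are not taken from the paper.

# A minimal sketch (PyTorch) contrasting a conventional softmax output layer
# with a continuous-output head that regresses toward frozen pre-trained
# embeddings. The cosine loss is an illustrative choice, not necessarily the
# exact objective used in the paper; sizes below are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 50_000   # placeholder vocabulary size
HIDDEN_DIM = 512      # placeholder encoder hidden size
EMBED_DIM = 300       # placeholder pre-trained embedding dimension


class SoftmaxHead(nn.Module):
    """Standard output layer: project to |V| logits and apply cross-entropy.
    Parameter count and compute both grow with the vocabulary size."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def loss(self, hidden, target_ids):
        return F.cross_entropy(self.proj(hidden), target_ids)


class ContinuousOutputHead(nn.Module):
    """Continuous output layer: project the context vector into the embedding
    space and pull it toward the frozen pre-trained embedding of the target
    word, so no |V|-sized softmax is computed during training."""
    def __init__(self, pretrained_embeddings):
        super().__init__()
        self.proj = nn.Linear(HIDDEN_DIM, EMBED_DIM)
        # Frozen target embeddings: not trainable, not part of a softmax.
        self.embed = nn.Embedding.from_pretrained(pretrained_embeddings, freeze=True)

    def loss(self, hidden, target_ids):
        pred = self.proj(hidden)       # (batch, EMBED_DIM)
        gold = self.embed(target_ids)  # (batch, EMBED_DIM)
        # Cosine distance as an illustrative measure in embedding space.
        return (1.0 - F.cosine_similarity(pred, gold, dim=-1)).mean()


if __name__ == "__main__":
    hidden = torch.randn(8, HIDDEN_DIM)              # fake context vectors
    targets = torch.randint(0, VOCAB_SIZE, (8,))     # fake target word ids
    pretrained = torch.randn(VOCAB_SIZE, EMBED_DIM)  # stand-in embedding table

    print("softmax loss:   ", SoftmaxHead().loss(hidden, targets).item())
    print("continuous loss:", ContinuousOutputHead(pretrained).loss(hidden, targets).item())

In this sketch the continuous head's trainable parameters scale with the embedding dimension rather than the vocabulary size, which is the source of the parameter and speed savings the abstract reports for ELMo.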

Citation (APA)

Li, L. H., Chen, P. H., Hsieh, C. J., & Chang, K. W. (2019). Efficient Contextual Representation Learning With Continuous Outputs. Transactions of the Association for Computational Linguistics, 7, 611–624. https://doi.org/10.1162/tacl_a_00289
