Leveraging Relaxed Equilibrium by Lazy Transition for Sequence Modeling

Citations: 3
Mendeley readers: 34

Abstract

In sequence modeling, certain tokens are usually less ambiguous than others, and their representations require fewer refinement steps for disambiguation. However, attention-based models such as the Transformer and the Universal Transformer (UT) process all tokens equally across depth. Inspired by the equilibrium phenomenon, we present lazy transition, a mechanism that adjusts the significance of iterative refinements for each token representation. Deployed on top of UT, lazy transition yields the Lazy Transformer (LT), in which tokens are processed unequally across depth and the model is encouraged to oscillate around a relaxed equilibrium. Our experiments show that LT outperforms baseline models on machine translation, pre-training, Learning to Execute, and LAMBADA.
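The abstract describes the mechanism only at a high level. Below is a minimal, hypothetical PyTorch sketch of one plausible reading: a per-token gate on top of a weight-tied (UT-style) refinement loop, where each token decides how much of the refined state to accept, so less ambiguous tokens stay near their previous state. The class name `LazyTransitionSketch`, the sigmoid gate, and all hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class LazyTransitionSketch(nn.Module):
    """Illustrative per-token lazy transition (an assumption, not the
    paper's exact method): each token interpolates between its previous
    state and a refined state, so less ambiguous tokens can damp or skip
    further refinement while harder tokens keep updating."""

    def __init__(self, d_model: int = 512, nhead: int = 8):
        super().__init__()
        # One weight-tied refinement block stands in for a UT layer.
        self.refine = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Per-token scalar gate in [0, 1] deciding how much of the
        # refinement each token accepts at this step.
        self.gate = nn.Linear(d_model, 1)

    def step(self, x: torch.Tensor) -> torch.Tensor:
        refined = self.refine(x)             # candidate refinement
        g = torch.sigmoid(self.gate(x))      # shape: (batch, seq, 1)
        # Lazy update: g near 0 leaves a token essentially unchanged, so
        # repeated steps let states settle near a (relaxed) fixed point.
        return g * refined + (1.0 - g) * x

    def forward(self, x: torch.Tensor, n_steps: int = 6) -> torch.Tensor:
        # Weight-tied depth, as in the Universal Transformer.
        for _ in range(n_steps):
            x = self.step(x)
        return x

# Toy usage: a batch of 2 sequences of 10 token vectors.
x = torch.randn(2, 10, 512)
out = LazyTransitionSketch()(x)
assert out.shape == x.shape
```

The interpolation is a standard gated residual update; whether the paper itself uses a sigmoid gate, a halting-style criterion, or some other transition cannot be determined from the abstract alone.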

Citation (APA)
Ai, X., & Fang, B. (2022). Leveraging Relaxed Equilibrium by Lazy Transition for Sequence Modeling. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 2904–2924). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.208
