Deep Equilibrium Non-Autoregressive Sequence Learning

Abstract

In this work, we argue that non-autoregressive (NAR) sequence generative models can equivalently be regarded as an iterative refinement process towards the target sequence, implying an underlying dynamical system of NAR models: z = f(z, x) → y. In this view, the optimal prediction of a NAR model is the equilibrium state of its dynamics, which would require infinitely many iterations to reach and is therefore infeasible in practice given limited computation and memory budgets. To this end, we propose DEQNAR, which directly solves for the equilibrium state of NAR models based on deep equilibrium networks (Bai et al., 2019) using black-box root-finding solvers, and back-propagates through the equilibrium point via implicit differentiation with constant memory. We conduct extensive experiments on four WMT machine translation benchmarks. Our main findings show that DEQNAR does converge to more accurate predictions and is a general-purpose framework that consistently yields substantial improvements over several strong NAR backbones.
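To make the mechanism concrete, here is a minimal, hypothetical PyTorch sketch of the deep-equilibrium recipe the abstract describes: solve z* = f(z*, x) with a solver run outside of autograd, then back-propagate through the equilibrium point via implicit differentiation, so memory cost stays constant in the number of solver iterations. The cell f, the names DEQFixedPoint and forward_iteration, and the plain fixed-point solver are illustrative assumptions, not the paper's implementation; DEQNAR itself uses a Transformer-based f and stronger black-box root-finding solvers.

```python
import torch
import torch.nn as nn
from torch import autograd


def forward_iteration(f, z0, max_iter=50, tol=1e-4):
    """Naive fixed-point iteration z <- f(z). DEQ work typically swaps in a
    stronger black-box solver (Anderson acceleration, Broyden's method)."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if (z_next - z).norm() < tol * (1e-5 + z.norm()):
            return z_next
        z = z_next
    return z


class DEQFixedPoint(nn.Module):
    """Sketch of a deep-equilibrium layer: solve z* = f(z*, x) without
    storing intermediate iterates, then backpropagate through z* via
    implicit differentiation (constant memory in solver iterations)."""

    def __init__(self, f, solver=forward_iteration):
        super().__init__()
        self.f = f          # hypothetical cell, e.g. one NAR decoder block
        self.solver = solver

    def forward(self, x):
        # Forward pass: run the solver with no autograd tape.
        with torch.no_grad():
            z_star = self.solver(lambda z: self.f(z, x), torch.zeros_like(x))
        # One extra evaluation re-attaches z* to the graph for parameter grads.
        z_star = self.f(z_star, x)

        if self.training:
            # Backward pass: by the implicit function theorem, the gradient
            # g = dL/dz* must satisfy g = g (df/dz)|_{z*} + dL/dz*.
            # Solve this linear fixed-point equation with the same solver.
            z0 = z_star.clone().detach().requires_grad_()
            f0 = self.f(z0, x)

            def backward_hook(grad):
                return self.solver(
                    lambda g: autograd.grad(f0, z0, g, retain_graph=True)[0] + grad,
                    grad,
                )

            z_star.register_hook(backward_hook)
        return z_star


# Toy usage with a contractive cell (illustrative only; DEQNAR conditions a
# Transformer decoder block on the encoded source sentence x).
if __name__ == "__main__":
    lin = nn.Linear(16, 16)
    cell = lambda z, x: torch.tanh(lin(z) + x)
    deq = DEQFixedPoint(cell).train()
    x = torch.randn(4, 16)
    loss = deq(x).pow(2).mean()
    loss.backward()
    print(lin.weight.grad.norm())
```

The key design choice is that the forward solver leaves no autograd tape; the backward hook solves the linear fixed-point equation above with the same solver, which is what keeps memory constant regardless of how many refinement iterations the forward solve takes.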

Citation (APA)

Zheng, Z., Zhou, Y., & Zhou, H. (2023). Deep Equilibrium Non-Autoregressive Sequence Learning. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 11763–11781). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.747
