Faster Depth-Adaptive Transformers

Abstract

Depth-adaptive neural networks can dynamically adjust their depth according to the hardness of input words, and thus improve efficiency. The main challenge is how to measure this hardness and decide how many layers each word requires. Previous works generally build a halting unit that decides, at each layer, whether computation should continue or stop. Because there is no specific supervision for depth selection, the halting unit may be under-optimized and inaccurate, which results in suboptimal and unstable performance when modeling sentences. In this paper, we remove the halting unit and estimate the required depths in advance, which yields a faster depth-adaptive model. Specifically, we propose two approaches that explicitly measure the hardness of input words and estimate the corresponding adaptive depth: 1) mutual information (MI) based estimation and 2) reconstruction loss based estimation. We conduct experiments on the text classification task with 24 datasets of various sizes and domains. Results confirm that our approaches can speed up the vanilla Transformer (up to 7x) while preserving high accuracy. Moreover, efficiency and robustness are significantly improved compared with other depth-adaptive approaches.
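The MI-based idea can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the details, so everything here is an assumption: the function names `word_label_mi` and `mi_to_depth` are hypothetical, MI is computed between a word's presence in a document and the class label, and words with higher MI are assigned more layers through simple equal-width bucketing.

```python
import math
from collections import Counter


def word_label_mi(docs, labels):
    """Return {word: MI(word presence; label)} in nats (illustrative assumption)."""
    n = len(docs)
    label_counts = Counter(labels)
    word_counts = Counter()   # number of documents containing each word
    joint = Counter()         # number of documents containing word w with label y
    for tokens, y in zip(docs, labels):
        for w in set(tokens):
            word_counts[w] += 1
            joint[(w, y)] += 1

    scores = {}
    for w, cw in word_counts.items():
        mi = 0.0
        for y, cy in label_counts.items():
            present = joint[(w, y)]   # word present, label y
            absent = cy - present     # word absent, label y
            # Sum over the two events: word present / word absent.
            for nxy, nx in ((present, cw), (absent, n - cw)):
                if nxy > 0:
                    mi += (nxy / n) * math.log(nxy * n / (nx * cy))
        scores[w] = mi
    return scores


def mi_to_depth(scores, max_depth=6):
    """Map MI scores to depths in 1..max_depth via equal-width buckets (assumption)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    depths = {}
    for w, s in scores.items():
        frac = (s - lo) / span if span > 0 else 0.0
        depths[w] = 1 + min(max_depth - 1, int(frac * max_depth))
    return depths


# Toy corpus: "good"/"bad" determine the label, "the"/"movie" carry no signal.
docs = [["good", "the"], ["bad", "the"], ["good", "movie"], ["bad", "movie"]]
labels = [1, 0, 1, 0]
scores = word_label_mi(docs, labels)
depths = mi_to_depth(scores, max_depth=6)
```

In this toy corpus, the label-bearing words receive the maximum depth and the uninformative words receive a single layer. Because the depths come from corpus statistics computed ahead of time, no halting decision is needed at inference, which is the property the abstract attributes to estimating depths in advance.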

Cite

Liu, Y., Meng, F., Zhou, J., Chen, Y., & Xu, J. (2021). Faster Depth-Adaptive Transformers. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 15, pp. 13424–13432). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i15.17584
