Pre-trained speech Transformers have enabled state-of-the-art (SotA) results in speech translation (ST); yet such encoders are computationally expensive to use. To reduce this cost, we present a novel Reducer Adaptor block, RedApt, that can be seamlessly integrated into any Transformer-based speech encoding architecture. Integrating RedApt into the pretrained WAV2VEC 2 speech encoder yields a 41% speedup, a 33% reduction in memory, and 24% fewer FLOPs at inference. To our positive surprise, our ST model with RedApt outperforms the SotA architecture by an average of 0.68 BLEU on 8 language pairs from MuST-C.
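The abstract does not spell out how the Reducer Adaptor is built, but as a rough illustration of how a reducer-style adaptor could be slotted into a Transformer speech encoder, here is a minimal PyTorch sketch. The strided convolution, the bottleneck projection sizes, and the name RedAptBlock are assumptions made for illustration, not the design described in the paper; the sketch only shows the general idea of an adaptor that shortens the encoded speech sequence, which is where speed and memory savings would come from.

```python
import torch
import torch.nn as nn

class RedAptBlock(nn.Module):
    """Hypothetical sketch of a reducer-adaptor block (not the paper's design).

    Assumptions: the block follows the common bottleneck-adaptor recipe
    (down-projection, nonlinearity, up-projection) and adds a strided 1-D
    convolution so the output sequence is shorter than the input.
    """

    def __init__(self, d_model: int = 768, bottleneck: int = 256, stride: int = 2):
        super().__init__()
        # Strided conv shortens the time dimension by `stride` (assumed mechanism).
        self.reduce = nn.Conv1d(d_model, d_model, kernel_size=3,
                                stride=stride, padding=1)
        self.down = nn.Linear(d_model, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) hidden states from a Transformer layer.
        x = self.reduce(x.transpose(1, 2)).transpose(1, 2)  # (batch, time/stride, d_model)
        # Residual bottleneck adaptor applied to the reduced sequence.
        return self.norm(x + self.up(self.act(self.down(x))))


# Usage sketch: the block would sit between encoder layers of a model
# such as wav2vec 2.0, halving the frame count it passes downstream.
hidden = torch.randn(4, 400, 768)   # 4 utterances, 400 frames, 768-dim states
shorter = RedAptBlock()(hidden)
print(shorter.shape)                # torch.Size([4, 200, 768])
```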
Zhao, J., Yang, H., Haffari, G., & Shareghi, E. (2022). RedApt: An Adaptor for WAV2VEC 2 Encoding Faster and Smaller Speech Translation without Quality Compromise. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 1960–1967). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.142