Improving End-to-End Speech Translation by Leveraging Auxiliary Speech and Text Data

Abstract

We present a method for introducing a text encoder into a pretrained end-to-end speech translation system. The text encoder improves the model's ability to adapt one modality (source-language speech) to another (source-language text), so the speech translation model can learn from both unlabeled and labeled data, which is especially useful when source-language text is abundant. Beyond this, we present a denoising method for building a text encoder that is robust to both clean and noisy text. Our system sets a new state of the art on the MuST-C En-De, En-Fr, and LibriSpeech En-Fr tasks.
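The abstract does not specify the noise model used for denoising training, but a common choice for making a text encoder robust is to corrupt the input token sequence (e.g., randomly dropping or masking tokens) while training the encoder to recover or tolerate the clean sequence. The sketch below illustrates such a corruption step under that assumption; the function name, probabilities, and mask token are illustrative, not taken from the paper.

```python
import random

def add_noise(tokens, drop_prob=0.1, mask_prob=0.1,
              mask_token="<mask>", seed=None):
    """Corrupt a token sequence for denoising-style training.

    Hypothetical noise model: each token is independently dropped
    with probability `drop_prob`, replaced by `mask_token` with
    probability `mask_prob`, and kept unchanged otherwise.
    """
    rng = random.Random(seed)
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < drop_prob:
            continue  # drop this token entirely
        elif r < drop_prob + mask_prob:
            noisy.append(mask_token)  # replace with mask token
        else:
            noisy.append(tok)  # keep token unchanged
    return noisy

# Example: feed both clean and corrupted versions of the same
# sentence to the text encoder during training.
sentence = "the cat sat on the mat".split()
corrupted = add_noise(sentence, drop_prob=0.2, mask_prob=0.2, seed=0)
```

Training on pairs of clean and corrupted inputs in this way is one plausible route to the "robust text encoder that can deal with both normal and noisy text" mentioned above.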

APA

Zhang, Y., Xu, C., Hu, B., Zhang, C., Xiao, T., & Zhu, J. (2023). Improving End-to-End Speech Translation by Leveraging Auxiliary Speech and Text Data. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 (Vol. 37, pp. 13984–13992). AAAI Press. https://doi.org/10.1609/aaai.v37i11.26637
