CLAD-ST: Contrastive Learning with Adversarial Data for Robust Speech Translation

Abstract

The cascaded approach continues to be the most popular choice for speech translation (ST). This approach consists of an automatic speech recognition (ASR) model and a machine translation (MT) model used in a pipeline to translate speech in one language to text in another language. MT models are often trained on well-formed text and therefore lack robustness when translating noisy ASR outputs in the cascaded approach, degrading the overall translation quality significantly. We address this robustness problem in downstream MT models by forcing the MT encoder to bring the representations of a noisy input closer to its clean version in the semantic space. This is achieved by introducing a contrastive learning method that leverages adversarial examples in the form of ASR outputs paired with their corresponding human transcripts to optimize the network parameters. In addition, a curriculum learning strategy is used to stabilize training by alternating between the standard MT log-likelihood loss and the contrastive loss. Our approach achieves significant gains of up to 3 BLEU points on English-German and English-French speech translation without hurting the translation quality on clean text.
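The abstract describes pulling a noisy ASR output's encoder representation toward that of its clean human transcript via a contrastive objective. The paper's exact loss is not given here, so the following is only a minimal sketch of one common formulation (an InfoNCE-style batch contrastive loss over pooled encoder representations); the function name, pooling assumption, and temperature value are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(clean_repr: torch.Tensor,
                     noisy_repr: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style contrastive loss (illustrative, not the paper's exact loss).

    Pulls each noisy (ASR-output) sentence representation toward its clean
    (human-transcript) counterpart and pushes it away from the other clean
    sentences in the batch.

    clean_repr, noisy_repr: (batch, dim) pooled MT-encoder representations,
    where row i of each tensor corresponds to the same source sentence.
    """
    clean = F.normalize(clean_repr, dim=-1)
    noisy = F.normalize(noisy_repr, dim=-1)
    # Pairwise cosine similarities between noisy and clean representations.
    logits = noisy @ clean.t() / temperature          # (batch, batch)
    # For each noisy input, the matching clean transcript is the positive;
    # all other clean sentences in the batch act as negatives.
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```

During training, this contrastive term would be alternated with the standard MT log-likelihood loss according to the curriculum schedule the abstract mentions.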

Citation (APA)
Indurthi, S. R., Chollampatt, S., Agrawal, R., & Turchi, M. (2023). CLAD-ST: Contrastive Learning with Adversarial Data for Robust Speech Translation. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 9049–9056). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.560
