CLAD-ST: Contrastive Learning with Adversarial Data for Robust Speech Translation

Abstract

The cascaded approach continues to be the most popular choice for speech translation (ST). This approach consists of an automatic speech recognition (ASR) model and a machine translation (MT) model used in a pipeline to translate speech in one language to text in another language. MT models are often trained on well-formed text and therefore lack robustness when translating noisy ASR outputs in the cascaded approach, degrading the overall translation quality significantly. We address this robustness problem in downstream MT models by forcing the MT encoder to bring the representations of a noisy input closer to its clean version in the semantic space. This is achieved by introducing a contrastive learning method that leverages adversarial examples in the form of ASR outputs paired with their corresponding human transcripts to optimize the network parameters. In addition, a curriculum learning strategy is used to stabilize training by alternating between the standard MT log-likelihood loss and the contrastive loss. Our approach achieves significant gains of up to 3 BLEU points on English-German and English-French speech translation without hurting the translation quality on clean text.
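The abstract describes pulling a noisy ASR output's encoder representation toward that of its clean human transcript via a contrastive objective. The paper's exact loss is not given here, so the following is only a minimal sketch of one common formulation (an InfoNCE-style batch contrastive loss over pooled encoder representations); the function name, pooling assumption, and temperature value are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(clean_repr: torch.Tensor,
                     noisy_repr: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style contrastive loss (illustrative, not the paper's exact loss).

    Pulls each noisy (ASR-output) sentence representation toward its clean
    (human-transcript) counterpart and pushes it away from the other clean
    sentences in the batch.

    clean_repr, noisy_repr: (batch, dim) pooled MT-encoder representations,
    where row i of each tensor corresponds to the same source sentence.
    """
    clean = F.normalize(clean_repr, dim=-1)
    noisy = F.normalize(noisy_repr, dim=-1)
    # Pairwise cosine similarities between noisy and clean representations.
    logits = noisy @ clean.t() / temperature          # (batch, batch)
    # For each noisy input, the matching clean transcript is the positive;
    # all other clean sentences in the batch act as negatives.
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```

During training, this contrastive term would be alternated with the standard MT log-likelihood loss according to the curriculum schedule the abstract mentions.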

Citation (APA)
Indurthi, S. R., Chollampatt, S., Agrawal, R., & Turchi, M. (2023). CLAD-ST: Contrastive Learning with Adversarial Data for Robust Speech Translation. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 9049–9056). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.560
