Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data


Abstract

Self-training has been shown to be helpful in addressing data scarcity for many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unsupervised data and adds it to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed novel setup: joint transcription and translation of speech, which suffers from an absence of sufficient parallel data resources. We show that under such data-deficient circumstances, the unlabeled data can vary significantly in domain from the supervised data, which results in pseudo-label quality degradation. We investigate two categories of remedies that require no additional supervision and target the domain mismatch: pseudo-label filtering and data augmentation. We show that pseudo-label analysis and processing in this way results in additional gains on top of the vanilla pseudo-labeling setup, providing a total improvement of up to 0.4% absolute WER and 2.1 BLEU points for En-De and 0.6% absolute WER and 2.2 BLEU points for En-Zh.
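The sketch below illustrates the generic pseudo-labeling loop with confidence-based filtering that the abstract describes, not the authors' actual implementation; the model API, field names, and threshold are hypothetical placeholders introduced purely for illustration.

```python
# Illustrative sketch of pseudo-labeling with confidence filtering.
# All names (model.predict, confidence, THRESHOLD) are assumptions for
# illustration only; they are not taken from the paper.

from dataclasses import dataclass
from typing import List

@dataclass
class Example:
    audio_id: str
    transcript: str    # gold or pseudo transcription
    translation: str   # gold or pseudo translation
    confidence: float  # model score used for filtering

def pseudo_label(model, unlabeled_audio: List[str]) -> List[Example]:
    """Label unsupervised (possibly out-of-domain) audio with the current model."""
    # Assumes a hypothetical model.predict returning (transcript, translation, confidence).
    return [Example(a, *model.predict(a)) for a in unlabeled_audio]

def filter_pseudo_labels(examples: List[Example], threshold: float) -> List[Example]:
    """Discard pseudo-labels whose confidence falls below a threshold."""
    return [ex for ex in examples if ex.confidence >= threshold]

# Training pool = supervised data + filtered pseudo-labels, e.g.:
# pool = supervised_data + filter_pseudo_labels(pseudo_label(model, unlabeled_audio), 0.9)
```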

Citation (APA)

Gheini, M., Likhomanenko, T., Sperber, M., & Setiawan, H. (2023). Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 7637–7650). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.483
