CMU’s IWSLT 2023 Simultaneous Speech Translation System


Abstract

This paper describes CMU’s submission to the IWSLT 2023 simultaneous speech translation shared task for translating English speech to both German text and speech in a streaming fashion. We first build offline speech-to-text (ST) models using the joint CTC/attention framework. These models also use WavLM front-end features and mBART decoder initialization. We adapt our offline ST models for simultaneous speech-to-text translation (SST) by 1) incrementally encoding chunks of input speech, re-computing encoder states for each new chunk and 2) incrementally decoding output text, pruning beam search hypotheses to 1-best after processing each chunk. We then build text-to-speech (TTS) models using the VITS framework and achieve simultaneous speech-to-speech translation (SS2ST) by cascading our SST and TTS models.
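The two adaptations for simultaneous decoding can be sketched as a loop over incoming speech chunks. This is a minimal toy illustration of the policy described above (re-encode the full speech prefix on each new chunk, then prune beam hypotheses to 1-best before emitting); the `encode` and `beam_step` functions are hypothetical stand-ins, not the authors' actual CTC/attention implementation.

```python
# Hedged sketch of the chunk-based simultaneous decoding policy.
# All model internals here are toy stand-ins, not the real ST system.

def encode(speech_prefix):
    # Re-compute encoder states over the *entire* prefix received so far
    # (the system re-encodes on each new chunk rather than caching states).
    return list(speech_prefix)  # dummy "encoder states"

def beam_step(states, hyp_prefix, beam_size=4):
    # Toy beam-search continuation: each call extends the prefix by one
    # token per beam; a real system scores hypotheses with CTC/attention.
    hyps = [hyp_prefix + [f"tok{len(states)}-{b}"] for b in range(beam_size)]
    scores = [float(-b) for b in range(beam_size)]  # dummy scores
    return sorted(zip(scores, hyps), reverse=True)  # best-first

def simultaneous_translate(speech_chunks):
    committed = []  # 1-best hypothesis carried across chunks
    prefix = []
    for chunk in speech_chunks:
        prefix.extend(chunk)
        states = encode(prefix)             # 1) incremental re-encoding
        ranked = beam_step(states, committed)
        committed = ranked[0][1]            # 2) prune beams to 1-best
        yield committed                     # emit current stable hypothesis
```

For example, `list(simultaneous_translate([[1, 2], [3]]))` yields a growing 1-best hypothesis after each chunk. Pruning to 1-best after every chunk keeps the search state small and makes already-emitted text stable, at the cost of forgoing hypotheses a full offline beam search might recover.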

Citation (APA)

Yan, B., Shi, J., Maiti, S., Chen, W., Li, X., Peng, Y., … Watanabe, S. (2023). CMU’s IWSLT 2023 Simultaneous Speech Translation System. In 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference (pp. 235–240). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.iwslt-1.20
