This paper describes CMU’s submission to the IWSLT 2023 simultaneous speech translation shared task for translating English speech to both German text and speech in a streaming fashion. We first build offline speech-to-text (ST) models using the joint CTC/attention framework. These models also use WavLM front-end features and mBART decoder initialization. We adapt our offline ST models for simultaneous speech-to-text translation (SST) by 1) incrementally encoding chunks of input speech, re-computing encoder states for each new chunk and 2) incrementally decoding output text, pruning beam search hypotheses to 1-best after processing each chunk. We then build text-to-speech (TTS) models using the VITS framework and achieve simultaneous speech-to-speech translation (SS2ST) by cascading our SST and TTS models.
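The two-step adaptation described above (re-encoding the growing speech prefix for each new chunk, then pruning the beam to the single best hypothesis after each chunk) can be sketched as follows. This is a minimal toy illustration, not the paper's actual ESPnet implementation: the `encode` and `extend_beam` functions and all scores here are hypothetical stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    tokens: list = field(default_factory=list)
    score: float = 0.0

def encode(speech_prefix):
    # Stand-in for re-running the full encoder over all chunks received
    # so far (step 1 in the abstract: encoder states are re-computed).
    return sum(speech_prefix)  # toy "encoder state"

def extend_beam(hyps, enc_state, beam_size=3):
    # Toy scorer: each hypothesis branches on two candidate tokens whose
    # scores depend on the (toy) encoder state; keep the top `beam_size`.
    new_hyps = []
    for h in hyps:
        for tok, bonus in (("a", 0.10), ("b", 0.05)):
            new_hyps.append(Hypothesis(h.tokens + [tok], h.score + bonus * enc_state))
    return sorted(new_hyps, key=lambda h: h.score, reverse=True)[:beam_size]

def simultaneous_decode(chunks, beam_size=3):
    committed = []           # tokens already emitted to the user
    hyps = [Hypothesis()]
    prefix = []
    for chunk in chunks:
        prefix.append(chunk)           # grow the speech prefix ...
        enc_state = encode(prefix)     # ... and re-compute encoder states
        hyps = extend_beam(hyps, enc_state, beam_size)
        best = hyps[0]                 # step 2: prune beam to 1-best
        committed = best.tokens        # commit the 1-best prefix
        hyps = [best]
    return committed

print(simultaneous_decode([1.0, 2.0, 3.0]))  # → ['a', 'a', 'a']
```

Pruning to 1-best after each chunk is what makes the emitted output stable: once a token prefix is committed, later chunks can only extend it, never retract it.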
Citation:
Yan, B., Shi, J., Maiti, S., Chen, W., Li, X., Peng, Y., … Watanabe, S. (2023). CMU’s IWSLT 2023 Simultaneous Speech Translation System. In 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference (pp. 235–240). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.iwslt-1.20