The HW-TSC’s Speech-to-Speech Translation System for IWSLT 2023


Abstract

This paper describes our work on the IWSLT 2023 Speech-to-Speech (S2S) translation task. Our cascaded system consists of an ensemble of Conformer- and S2T-Transformer-based ASR models, a Transformer-based MT model, and a diffusion-based TTS model. Our primary focus in this competition was to investigate how well diffusion models can handle TTS in high-resource scenarios, and what role TTS plays in the overall S2S task. To this end, we propose DTS, an end-to-end diffusion-based TTS model that takes raw text as input and generates a waveform by iteratively denoising pure Gaussian noise. Compared with previous TTS models, DTS produces more natural speech and performs better in code-switching scenarios. Because DTS is trained end-to-end, its training process is relatively straightforward. Our experiments show that DTS outperforms other TTS models on the GigaS2S benchmark and also improves the performance of the overall S2S system.
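The abstract summarizes DTS as generating a waveform by iteratively denoising pure Gaussian noise. As a rough illustration of that general idea only, the following is a minimal NumPy sketch of standard DDPM-style ancestral sampling. The linear noise schedule, the step count, and the helper names `make_schedule`, `ddpm_sample`, and `make_oracle` are assumptions made for illustration and are not taken from the paper. The "oracle" denoiser, which already knows the target signal, is a hypothetical stand-in for DTS's trained, text-conditioned network.

```python
import numpy as np


def make_schedule(num_steps):
    # Assumed linear beta schedule; the schedule actually used by DTS is not stated here.
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def ddpm_sample(predict_noise, shape, schedule, seed=0):
    """Start from pure Gaussian noise x_T and denoise step by step down to x_0."""
    betas, alphas, alpha_bars = schedule
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t)  # predicted noise component of x_t
        # Mean of p(x_{t-1} | x_t) under the noise-prediction parameterization.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add fresh noise with variance beta_t at every step except the last.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean
    return x


def make_oracle(x0, alpha_bars):
    # Hypothetical stand-in for a trained network conditioned on text:
    # it returns exactly the noise that separates x_t from the known target x0.
    def predict_noise(x_t, t):
        ab = alpha_bars[t]
        return (x_t - np.sqrt(ab) * x0) / np.sqrt(1.0 - ab)
    return predict_noise


# Usage: denoise toward a toy "waveform" (a sine wave) with the oracle denoiser.
schedule = make_schedule(50)
target = np.sin(np.linspace(0.0, 4.0 * np.pi, 1600))
wave = ddpm_sample(make_oracle(target, schedule[2]), target.shape, schedule)
print(np.allclose(wave, target))  # True
```

With the oracle, the final step recovers the target exactly, which makes a quick sanity check on the update rule. A real model would only approximate the noise, so the generated waveform would be a sample rather than a copy of any fixed signal.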

Citation (APA)

Wang, M., Li, Y., Guo, J., Li, Z., Shang, H., Wei, D., … Yang, H. (2023). The HW-TSC’s Speech-to-Speech Translation System for IWSLT 2023. In 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference (pp. 277–282). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.iwslt-1.33
