The HW-TSC’s Speech-to-Speech Translation System for IWSLT 2023

Wang Minghan; Li Yinglu; Guo Jiaxin; Li Zongyao; Shang Hengchao; Wei Daimeng; Su Chang; Zhang Min; Tao Shimin; Yang Hao

Conference Proceedings

20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference (2023) 277-282

DOI: 10.18653/v1/2023.iwslt-1.33

Citations: 1

Readers: 39

Abstract

This paper describes our work on the IWSLT 2023 Speech-to-Speech task. Our proposed cascaded system consists of an ensemble of Conformer and S2T-Transformer-based ASR models, a Transformer-based MT model, and a diffusion-based TTS model. Our primary focus in this competition was to investigate the modeling ability of diffusion models for TTS in high-resource scenarios and the role of TTS in the overall S2S task. To this end, we propose DTS, an end-to-end diffusion-based TTS model that takes raw text as input and generates a waveform by iteratively denoising pure Gaussian noise. Compared to previous TTS models, the speech generated by DTS is more natural and performs better in code-switching scenarios. Because DTS is trained end-to-end, its training procedure is relatively straightforward. Our experiments demonstrate that DTS outperforms other TTS models on the GigaS2S benchmark and also yields a positive gain for the entire S2S system.
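To make "iteratively denoising pure Gaussian noise" concrete, here is a minimal sketch of a reverse-diffusion sampling loop. It assumes the standard DDPM ancestral sampler with a linear noise schedule; the abstract does not say which sampler, schedule, or step count DTS uses, so none of this should be read as the authors' implementation. The names `make_schedule`, `sample_waveform`, and `make_placeholder_predictor` are hypothetical, and the placeholder predictor merely stands in for a trained text-conditioned network.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus the cumulative products of alpha = 1 - beta.

    The schedule values are common DDPM defaults, assumed for illustration.
    """
    betas = [beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def sample_waveform(predict_noise, text, num_samples, num_steps=50, seed=0):
    """Start from pure Gaussian noise and denoise it step by step.

    predict_noise(x_t, t, text) is a hypothetical interface for the trained,
    text-conditioned noise-prediction network.
    """
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps_hat = predict_noise(x, t, text)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = 1.0 / math.sqrt(1.0 - betas[t])
        # No fresh noise is injected at the final step (t == 0).
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        x = [scale * (xi - coef * ei) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(x, eps_hat)]
    return x


def make_placeholder_predictor(num_steps):
    """Stand-in for a trained model (an assumption, not DTS).

    It treats all of x_t as noise, so its implied clean signal is silence.
    """
    _, alpha_bars = make_schedule(num_steps)

    def predict_noise(x_t, t, text):
        return [xi / math.sqrt(1.0 - alpha_bars[t]) for xi in x_t]

    return predict_noise


predictor = make_placeholder_predictor(num_steps=50)
waveform = sample_waveform(predictor, "hello world", num_samples=16)
```

In a real system the placeholder would be replaced by a neural network conditioned on the input text, and `num_samples` would be the number of audio samples in the target waveform. With the placeholder, the loop simply converges to near-silence, which is enough to check the mechanics of the sampler.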

Citation (APA)

Wang, M., Li, Y., Guo, J., Li, Z., Shang, H., Wei, D., … Yang, H. (2023). The HW-TSC’s Speech-to-Speech Translation System for IWSLT 2023. In 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference (pp. 277–282). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.iwslt-1.33
