Naturalness Improvement of Vietnamese Text-to-Speech System Using Diffusion Probabilistic Modelling and Unsupervised Data Enrichment

Abstract

Speech synthesis, which aims to generate natural and comprehensible speech from input text, is a popular research topic with a wide range of industrial applications. However, it remains a difficult problem because of its strong dependence on data, particularly for accent-sensitive and multi-dialect languages such as Vietnamese. Perhaps the most common model applied in this area is Tacotron 2, which uses Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) architectures. Still, Tacotron 2 has not yet achieved the expected naturalness, possibly because it is not sophisticated enough to capture the natural expression of the human voice. Moreover, for a low-resource language like Vietnamese, collecting a sufficient training dataset for this task is also a non-trivial problem. Hence, in this paper we propose an end-to-end framework that uses Grad-TTS, a denoising diffusion probabilistic model, as the acoustic model in the text-to-speech (TTS) system instead of the traditional approach employed by Tacotron 2. The proposed approach helps us achieve more natural synthesized speech, as shown in the experiments. Furthermore, we introduce an unsupervised approach to collect Vietnamese data from Internet sources and to pre-process the data before training. This helps address the shortage of Vietnamese data and improves our results. We have released the dataset for further development of TTS systems for Vietnamese at: https://bit.ly/3rnNsFi.

Cite

Tran, T., Nguyen, T., Bui, H., Nguyen, K., Vo, N. G., Pham, T. V., & Quan, T. (2022). Naturalness Improvement of Vietnamese Text-to-Speech System Using Diffusion Probabilistic Modelling and Unsupervised Data Enrichment. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 148, pp. 376–387). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-15063-0_36
