The first FOSD-tacotron-2-based text-to-speech application for Vietnamese

Abstract

Recently, with the development and deployment of voicebots that help reduce staffing at call centers, text-to-speech (TTS) systems supporting English and Chinese have attracted the attention of researchers and corporations worldwide. However, very little published work exists on TTS for Vietnamese. This paper therefore presents in detail the development of the first Tacotron-2-based TTS application for Vietnamese, using the publicly available FPT open speech dataset (FOSD), which contains approximately 30 hours of labeled audio files together with their transcripts. The dataset was released by FPT Corporation under an open access license. A new text cleaner was developed to support Vietnamese, replacing the English cleaner provided by default in the Mozilla TTS source code. After 225,000 training steps, the generated speech achieves mean opinion scores (MOS) well above the average value of 2.50, centering around 3.00 for both clearness and naturalness in a crowdsourced survey.
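The abstract does not describe what the Vietnamese cleaner does, so the following is only a minimal sketch of what such a cleaner might look like, assuming a cleaner is a plain function that takes a string and returns a string. The function name `vietnamese_cleaners`, the set of allowed punctuation, and the individual steps (Unicode NFC normalization, lowercasing, symbol removal, whitespace collapsing) are illustrative assumptions, not the author's implementation or part of the Mozilla TTS code.

```python
import re
import unicodedata

# Assumption: keep letters, digits, whitespace and a small set of punctuation.
# In Python 3, \w matches precomposed Vietnamese letters such as "ế" and "ệ".
_DISALLOWED_RE = re.compile(r"[^\w\s.,!?;:'\-]")
_WHITESPACE_RE = re.compile(r"\s+")


def vietnamese_cleaners(text: str) -> str:
    """Hypothetical cleaner: normalize Vietnamese text before tokenization."""
    # Compose base letters and combining tone marks into single code points,
    # so the same word always maps to the same character sequence.
    text = unicodedata.normalize("NFC", text)
    # Lowercasing is Unicode-aware, so accented capitals are handled.
    text = text.lower()
    # Replace symbols outside the allowed set with a space.
    text = _DISALLOWED_RE.sub(" ", text)
    # Collapse runs of whitespace and trim both ends.
    return _WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(vietnamese_cleaners("  Xin   CHÀO, Việt Nam!  "))
```

NFC normalization comes first because Vietnamese text is often stored with separate combining tone marks; without composing them, the symbol filter would strip the marks and change the words.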

Cite

Tran, D. C. (2021). The first FOSD-tacotron-2-based text-to-speech application for Vietnamese. Bulletin of Electrical Engineering and Informatics, 10(2), 898–903. https://doi.org/10.11591/eei.v10i2.2539
