MnTTS2: An Open-Source Multi-speaker Mongolian Text-to-Speech Synthesis Dataset

Abstract

Text-to-Speech (TTS) synthesis for low-resource languages is an active research topic in both academia and industry. Mongolian is the official language of the Inner Mongolia Autonomous Region and a representative low-resource language spoken by over 10 million people worldwide. However, open-source datasets for Mongolian TTS remain scarce. We therefore release an open-source multi-speaker Mongolian TTS dataset, named MnTTS2, for the benefit of related researchers. In this work, we prepare transcriptions covering various topics and invite three professional Mongolian announcers to form a three-speaker TTS dataset, in which each announcer records 10 h of Mongolian speech, resulting in 30 h in total. Furthermore, we build a baseline system based on the state-of-the-art FastSpeech2 model and HiFi-GAN vocoder. The experimental results suggest that the constructed MnTTS2 dataset is sufficient to build robust multi-speaker TTS models for real-world applications. The MnTTS2 dataset, training recipe, and pretrained models are released at: https://github.com/ssmlkl/MnTTS2.

Cite

Liang, K., Liu, B., Hu, Y., Liu, R., Bao, F., & Gao, G. (2023). MnTTS2: An Open-Source Multi-speaker Mongolian Text-to-Speech Synthesis Dataset. In Communications in Computer and Information Science (Vol. 1765 CCIS, pp. 318–329). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-2401-1_28
