Text to speech using Mel-Spectrogram with deep learning algorithms

Abdulamir A. Karim; Suha Mohammed Saleh

Journal Article, Open Access

Periodicals of Engineering and Natural Sciences (2022) 10(3) 380-386

DOI: 10.21533/PEN.V10I3.3113

Abstract

The purpose of text to speech (TTS), sometimes called speech synthesis, is to synthesize natural and intelligible speech for a given text. TTS technologies are used in a wide range of applications in media, chatbots, and entertainment, among other fields, which makes TTS an active topic in the research community. Recent progress in artificial intelligence, especially in deep learning and neural networks, has enabled TTS systems to produce high-quality synthesized speech. Despite this success, currently available approaches require very long training and inference times, which leaves the field dominated by big tech companies. This paper proposes a model based on convolutional neural networks (CNN) and gated recurrent units (GRU). The proposed model can run even in low-computation environments and requires little training time. It achieves a mean opinion score (MOS) of 4.26, higher than the MOS reported for state-of-the-art methods.
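The MOS figure quoted in the abstract is conventionally the arithmetic mean of listener ratings on a 1 to 5 scale. The abstract does not describe the evaluation protocol, so the following is only a minimal sketch of that conventional calculation, not the authors' procedure. The function name and the ratings are hypothetical and made up purely for illustration.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings on a 1-5 scale.

    Assumption: the conventional 5-point opinion scale; the paper's own
    evaluation setup is not stated in the abstract.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from five listeners (not data from the paper).
example_ratings = [5, 4, 4, 5, 3]
print(round(mean_opinion_score(example_ratings), 2))
```

In practice a MOS is usually reported over many listeners and utterances, often together with a confidence interval, which this sketch leaves out.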

Citation (APA)

Karim, A. A., & Saleh, S. M. (2022). Text to speech using Mel-Spectrogram with deep learning algorithms. Periodicals of Engineering and Natural Sciences, 10(3), 380–386. https://doi.org/10.21533/PEN.V10I3.3113
