Building Mongolian TTS Front-End with Encoder-Decoder Model by Using Bridge Method and Multi-view Features

Abstract

In text-to-speech (TTS) systems, the front-end is a critical step that extracts linguistic features from the input text. In this paper, we propose a Mongolian TTS front-end that jointly trains grapheme-to-phoneme conversion (G2P) and phrase break prediction (PB). We use a bidirectional long short-term memory (LSTM) network as the encoder and build two decoders, one for G2P and one for PB, that share the same encoder. Meanwhile, we feed the source input features together with the encoder hidden states into the decoders, aiming to shorten the distance between the source and target sequences and to learn the alignment information better. More importantly, to obtain a robust representation for Mongolian words, which are agglutinative in nature and lack a sufficient training corpus, we design specific multi-view input features for them. Our subjective and objective experiments demonstrate the effectiveness of this proposal.
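The "bridge" described above can be sketched in a few lines: each decoder step receives the source input feature concatenated with the encoder hidden state at the same position, rather than the hidden state alone. The sketch below is purely illustrative — the stand-in `encode` function (a neighbour average in place of a real bidirectional LSTM) and all names and dimensions are hypothetical, not taken from the paper.

```python
def encode(source_embeddings):
    """Stand-in for a bidirectional LSTM encoder: averages each
    embedding with its neighbours to mimic contextual hidden states."""
    n = len(source_embeddings)
    hidden = []
    for i, vec in enumerate(source_embeddings):
        left = source_embeddings[i - 1] if i > 0 else vec
        right = source_embeddings[i + 1] if i < n - 1 else vec
        hidden.append([(l + v + r) / 3.0 for l, v, r in zip(left, vec, right)])
    return hidden

def bridge(source_embeddings, hidden_states):
    """Concatenate each source feature with its encoder hidden state,
    so both decoders (G2P and PB) see the raw input and the context."""
    return [e + h for e, h in zip(source_embeddings, hidden_states)]

# Three source positions with 2-dimensional toy features.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_inputs = bridge(source, encode(source))
print(len(decoder_inputs), len(decoder_inputs[0]))  # 3 positions, 2+2 dims each
```

In a full model the concatenated vectors would feed two separate attention-based decoders over the shared encoder; the point of the bridge is simply that the source features are not lost behind the encoder's hidden representation.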

Citation (APA)

Liu, R., Bao, F., & Gao, G. (2019). Building Mongolian TTS Front-End with Encoder-Decoder Model by Using Bridge Method and Multi-view Features. In Communications in Computer and Information Science (Vol. 1143 CCIS, pp. 642–651). Springer. https://doi.org/10.1007/978-3-030-36802-9_68
