RWEN-TTS: Relation-Aware Word Encoding Network for Natural Text-to-Speech Synthesis

Shinhyeok Oh; Hyeong Rae Noh; Yoonseok Hong; Insoo Oh

Conference Proceedings, Open Access

Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 (2023) 37 13428-13436

DOI: 10.1609/aaai.v37i11.26575

Abstract

With the advent of deep learning, a large number of text-to-speech (TTS) models that produce human-like speech have emerged. Recently, various approaches have been proposed that introduce syntactic and semantic information about the input text to improve the naturalness and expressiveness of TTS models. Although these strategies have shown impressive results, they still have limitations in how they use language information. First, most approaches use only graph networks to exploit syntactic and semantic information, without considering linguistic features. Second, most previous works do not explicitly consider adjacent words when encoding syntactic and semantic information, even though adjacent words are usually meaningful when encoding the current word. To address these issues, we propose the Relation-aware Word Encoding Network (RWEN), which effectively exploits syntactic and semantic information through two modules: Semantic-level Relation Encoding and Adjacent Word Relation Encoding. Experimental results show substantial improvements over previous works.

Cite

Oh, S., Noh, H. R., Hong, Y., & Oh, I. (2023). RWEN-TTS: Relation-Aware Word Encoding Network for Natural Text-to-Speech Synthesis. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 (Vol. 37, pp. 13428–13436). AAAI Press. https://doi.org/10.1609/aaai.v37i11.26575
