Enhancing local dependencies for transformer-based text-to-speech via hybrid lightweight convolution

Citations: 3
Readers (Mendeley): 16

This article is free to access.

Abstract

Owing to its powerful self-attention mechanism, the Transformer network has achieved considerable success across many sequence modeling tasks and has become one of the most popular architectures in text-to-speech (TTS). Vanilla self-attention excels at capturing long-range dependencies but struggles to model stable short-range dependencies, which are crucial for speech synthesis, where local audio signals are highly correlated. To address this problem, we propose the hybrid lightweight convolution (HLC), which fully exploits the local structure of a sequence, and combine it with self-attention to improve Transformer-based TTS. Experimental results show that the modified model achieves better performance in both objective and subjective evaluations. We also demonstrate that a more compact TTS model can be built by combining self-attention with the proposed hybrid lightweight convolution. Moreover, the method is potentially applicable to other sequence modeling tasks.
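To make the general idea concrete, the sketch below is a minimal illustration (not the authors' implementation) of pairing multi-head self-attention with a lightweight depthwise convolution whose kernels are softmax-normalized and shared across channel groups, mixing the two branches with a learned gate. The layer structure, gating scheme, and hyperparameters are assumptions for illustration; the exact HLC design in the paper may differ.

# Illustrative sketch only: self-attention (global context) combined with a
# lightweight convolution (local context). All module names, the gating
# mechanism, and hyperparameters are assumptions, not the paper's HLC.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LightweightConv(nn.Module):
    """Depthwise 1-D convolution with softmax-normalized kernels that are
    shared across groups of channels (one kernel per head)."""

    def __init__(self, channels, kernel_size=3, num_heads=8):
        super().__init__()
        assert channels % num_heads == 0
        self.channels = channels
        self.num_heads = num_heads
        self.kernel_size = kernel_size
        self.weight = nn.Parameter(torch.randn(num_heads, 1, kernel_size))

    def forward(self, x):                      # x: (batch, time, channels)
        w = F.softmax(self.weight, dim=-1)     # normalize along the kernel dim
        # Share each of the H kernels across channels // H consecutive channels.
        w = w.repeat_interleave(self.channels // self.num_heads, dim=0)
        x = x.transpose(1, 2)                  # (batch, channels, time)
        y = F.conv1d(x, w, padding=self.kernel_size // 2, groups=self.channels)
        return y.transpose(1, 2)               # back to (batch, time, channels)


class HybridAttentionConvLayer(nn.Module):
    """Runs self-attention and lightweight convolution in parallel and
    blends their outputs with a learned sigmoid gate (an assumed design)."""

    def __init__(self, d_model=256, num_heads=4, kernel_size=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.conv = LightweightConv(d_model, kernel_size, num_heads)
        self.gate = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, time, d_model)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        conv_out = self.conv(x)
        g = torch.sigmoid(self.gate(torch.cat([attn_out, conv_out], dim=-1)))
        return self.norm(x + g * attn_out + (1 - g) * conv_out)


if __name__ == "__main__":
    layer = HybridAttentionConvLayer()
    states = torch.randn(2, 50, 256)           # dummy encoder states
    print(layer(states).shape)                 # torch.Size([2, 50, 256])

In this sketch the gate lets each position weight global (attention) against local (convolution) context, which is one plausible way to realize the "hybrid" combination described in the abstract.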

Citation (APA)

Zhao, W., He, T., & Xu, L. (2021). Enhancing local dependencies for transformer-based text-to-speech via hybrid lightweight convolution. IEEE Access, 9, 42762–42770. https://doi.org/10.1109/ACCESS.2021.3065736
