Audiovisual text-to-speech systems convert written text into an audiovisual speech signal. Recently, much interest has been directed toward data-driven 2D photorealistic synthesis, where the system uses a database of pre-recorded auditory and visual speech data to construct the target output signal. In this paper we propose a synthesis technique that creates both the target auditory and the target visual speech from the same audiovisual database. To achieve this, the well-known unit selection synthesis technique is extended to work with multimodal segments containing original combinations of audio and video. This strategy results in a multimodal output signal that displays a high level of audiovisual correlation, which is crucial for a natural perception of the synthetic speech signal. © 2008 Springer-Verlag Berlin Heidelberg.
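The core mechanism described in the abstract is unit selection: a dynamic-programming (Viterbi) search over candidate database segments that minimizes the sum of target costs (how well a candidate matches the desired segment) and join costs (how smoothly consecutive candidates concatenate). The sketch below illustrates this search in generic form; it is a minimal illustration, not the authors' implementation, and the cost functions, unit representation, and function names are all assumptions for the example. In the multimodal setting of the paper, each unit would carry both its audio and its video features, so the join cost can penalize discontinuities in either modality.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Viterbi-style unit selection: pick one candidate unit per target
    segment, minimizing summed target costs plus join costs between
    consecutive selected units. `candidates[i]` lists database units
    available for target position i."""
    n = len(targets)
    # best[i][j]: minimal total cost of any path ending at candidates[i][j]
    best = [[0.0] * len(candidates[i]) for i in range(n)]
    back = [[0] * len(candidates[i]) for i in range(n)]

    # Initialize with target costs only (no join cost before the first unit).
    for j, unit in enumerate(candidates[0]):
        best[0][j] = target_cost(targets[0], unit)

    # Forward pass: extend each path by the cheapest predecessor.
    for i in range(1, n):
        for j, unit in enumerate(candidates[i]):
            tc = target_cost(targets[i], unit)
            costs = [best[i - 1][k] + join_cost(prev, unit)
                     for k, prev in enumerate(candidates[i - 1])]
            k = min(range(len(costs)), key=costs.__getitem__)
            best[i][j] = costs[k] + tc
            back[i][j] = k

    # Backtrack from the cheapest final unit.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [candidates[i][path[i]] for i in range(n)]
```

As a toy usage, with units reduced to scalar features, `select_units([0, 1], [[0.0, 0.9], [1.1, 0.2]], lambda t, u: abs(t - u), lambda a, b: 0.5 * abs(a - b))` trades a worse target match at the second position for a cheaper join, returning `[0.0, 1.1]`. The paper's contribution lies in keeping audio and video together inside each unit, so that this single search preserves the natural audiovisual correlation of the recordings.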
CITATION STYLE
Mattheyses, W., Latacz, L., Verhelst, W., & Sahli, H. (2008). Multimodal unit selection for 2D audiovisual text-to-speech synthesis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5237 LNCS, pp. 125–136). Springer Verlag. https://doi.org/10.1007/978-3-540-85853-9_12