Talking Faces: Audio-to-Video Face Generation

Yuxin Wang; Linsen Song; Wayne Wu; Chen Qian; Ran He; Chen Change Loy

Book Chapter, Open Access

Springer Science and Business Media Deutschland GmbH (2022), pp. 163-188

DOI: 10.1007/978-3-030-87664-7_8

6 Citations

11 Readers

Abstract

Talking face generation aims at synthesizing coherent and realistic face sequences from input speech. The task enjoys a wide spectrum of downstream applications, such as teleconferencing, movie dubbing, and virtual assistants. The emergence of deep learning and cross-modality research has led to many interesting works that address talking face generation. Despite these research efforts, the problem remains challenging due to the need for fine-grained control of face components and for generalization to arbitrary sentences. In this chapter, we first discuss the definition and underlying challenges of the problem. Then, we present an overview of recent progress in talking face generation. In addition, we introduce some widely used datasets and performance metrics. Finally, we discuss open questions, potential future directions, and ethical considerations in this task.

Citation (APA)

Wang, Y., Song, L., Wu, W., Qian, C., He, R., & Loy, C. C. (2022). Talking Faces: Audio-to-Video Face Generation. In Advances in Computer Vision and Pattern Recognition (pp. 163–188). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-87664-7_8
