Talking face generation aims to synthesize coherent and realistic face sequences from input speech. The task has a wide range of downstream applications, such as teleconferencing, movie dubbing, and virtual assistants. The emergence of deep learning and cross-modality research has led to many interesting works that address talking face generation. Despite these research efforts, the problem remains challenging because it requires fine-grained control of face components and generalization to arbitrary sentences. In this chapter, we first discuss the definition and underlying challenges of the problem. Then, we present an overview of recent progress in talking face generation. In addition, we introduce some widely used datasets and performance metrics. Finally, we discuss open questions, potential future directions, and ethical considerations in this task.
Wang, Y., Song, L., Wu, W., Qian, C., He, R., & Loy, C. C. (2022). Talking Faces: Audio-to-Video Face Generation. In Advances in Computer Vision and Pattern Recognition (pp. 163–188). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-87664-7_8