Talking Faces: Audio-to-Video Face Generation

Abstract

Talking face generation aims at synthesizing coherent and realistic face sequences given input speech. The task enjoys a wide spectrum of downstream applications, such as teleconferencing, movie dubbing, and virtual assistants. The emergence of deep learning and cross-modality research has led to many interesting works that address talking face generation. Despite great research efforts, the problem remains challenging due to the need for fine-grained control of face components and generalization to arbitrary sentences. In this chapter, we first discuss the definition and underlying challenges of the problem. Then, we present an overview of recent progress in talking face generation. In addition, we introduce widely used datasets and performance metrics. Finally, we discuss open questions, potential future directions, and ethical considerations in this task.
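To make the task definition concrete, the sketch below shows the input/output contract the abstract describes: a speech waveform plus a reference face image mapped to a sequence of video frames. The function name, shapes, and frame rate are illustrative assumptions, not the chapter's method; the placeholder simply tiles the reference image, whereas a real model would condition each frame on the corresponding audio window.

```python
import numpy as np

def generate_talking_face(audio: np.ndarray,
                          reference_face: np.ndarray,
                          sample_rate: int = 16000,
                          fps: int = 25) -> np.ndarray:
    """Hypothetical talking-face interface: given a speech waveform and a
    single reference face image (H, W, 3), return a video as an array of
    frames, one frame per 1/fps seconds of audio. Here each frame is just a
    copy of the reference; a real model would drive lip and facial motion
    from the audio."""
    num_frames = int(np.ceil(len(audio) / sample_rate * fps))
    return np.repeat(reference_face[np.newaxis], num_frames, axis=0)

# Example: 2 seconds of silence and a dummy 64x64 RGB face -> 50 frames.
audio = np.zeros(2 * 16000, dtype=np.float32)
face = np.zeros((64, 64, 3), dtype=np.uint8)
frames = generate_talking_face(audio, face)
print(frames.shape)  # (50, 64, 64, 3)
```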

Citation


Wang, Y., Song, L., Wu, W., Qian, C., He, R., & Loy, C. C. (2022). Talking Faces: Audio-to-Video Face Generation. In Advances in Computer Vision and Pattern Recognition (pp. 163–188). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-87664-7_8
