Audio-Driven Talking Face Generation: A Review

Shiguang Liu

Journal Article, Open Access

AES: Journal of the Audio Engineering Society (2023) 71(7-8) 408-419

DOI: 10.17743/jaes.2022.0081

Abstract

Given a face image and a speech audio clip, talking face generation refers to synthesizing a video of that face speaking the given speech. It has wide applications in movie dubbing, teleconferencing, virtual assistants, and related areas. This paper gives an overview of recent research progress on talking face generation. The author first reviews traditional talking face generation methods. Deep learning-based talking face generation methods are then summarized, covering both talking face synthesis for a specific identity and talking face synthesis for an arbitrary identity. The author next surveys recent detail-aware talking face generation methods, including noise-based, eye conversion-based, and facial anatomy-based approaches, followed by talking head generation methods, namely video/image-driven, pose information–driven, and audio-driven talking head generation. Finally, some future directions for talking face generation are highlighted.

Citation (APA)

Liu, S. (2023). Audio-Driven Talking Face Generation: A Review. AES: Journal of the Audio Engineering Society, 71(7–8), 408–419. https://doi.org/10.17743/jaes.2022.0081
