That's What I Said: Fully-Controllable Talking Face Generation

3 citations · 14 Mendeley readers

Abstract

The goal of this paper is to synthesise talking faces with controllable facial motions. To achieve this goal, we propose two key ideas. The first is to establish a canonical space where every face has the same motion patterns but different identities. The second is to navigate a multimodal motion space that only represents motion-related features while eliminating identity information. To disentangle identity and motion, we introduce an orthogonality constraint between the two different latent spaces. From this, our method can generate natural-looking talking faces with fully controllable facial attributes and accurate lip synchronisation. Extensive experiments demonstrate that our method achieves state-of-the-art results in terms of both visual quality and lip-sync score. To the best of our knowledge, we are the first to develop a talking face generation framework that can accurately manifest full target facial motions including lip, head pose, and eye movements in the generated video without any additional supervision beyond RGB video with audio.
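The abstract's central mechanism is an orthogonality constraint that keeps the identity latent space and the motion latent space from encoding the same information. The paper does not include code here, so the following is only an illustrative sketch of one common way such a constraint is formulated (a per-sample cosine penalty between paired embeddings); the function name and the exact loss form are our assumptions, not the authors' implementation.

```python
import numpy as np

def orthogonality_loss(identity, motion):
    """Illustrative orthogonality penalty (assumed form, not the paper's exact loss).

    identity, motion: (batch, dim) arrays of paired embeddings.
    Each row is L2-normalised, and the loss is the mean squared cosine
    similarity between the paired identity and motion vectors. The loss
    is zero when every identity vector is orthogonal to its motion
    vector, encouraging the two spaces to carry disjoint information.
    """
    id_n = identity / np.linalg.norm(identity, axis=1, keepdims=True)
    mo_n = motion / np.linalg.norm(motion, axis=1, keepdims=True)
    cos = np.sum(id_n * mo_n, axis=1)  # per-sample cosine similarity
    return float(np.mean(cos ** 2))
```

In training, a term like this would be added to the main reconstruction and lip-sync objectives so that gradient descent pushes the two latent spaces apart while the rest of the loss preserves their individual content.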

Citation (APA)

Jang, Y., Rho, K., Woo, J., Lee, H., Park, J., Lim, Y., … Chung, J. S. (2023). That’s What I Said: Fully-Controllable Talking Face Generation. In MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia (pp. 3827–3836). Association for Computing Machinery, Inc. https://doi.org/10.1145/3581783.3612587
