In this paper, we consider the following task: given arbitrary speech audio and a single lip image of an arbitrary target identity, generate synthesized lip movements of the target identity saying the speech. To perform well, a model needs to not only preserve the target identity, the photo-realism of the synthesized images, and the consistency and smoothness of the lip-image sequence, but, more importantly, learn the correlations between the audio speech and the lip movements. To address these problems collectively, we devise a network to synthesize lip movements and propose a novel correlation loss that synchronizes lip changes with speech changes. Our full model combines four losses for a comprehensive treatment; it is trained end-to-end and is robust to lip shapes, view angles, and facial characteristics. Thorough experiments on three datasets, ranging from lab-recorded to in-the-wild lips, show that our model significantly outperforms other state-of-the-art methods extended to this task.
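For concreteness, below is a minimal sketch of how a correlation loss of this kind could look in PyTorch. The function name `correlation_loss`, the use of temporal differences as the "change" signal, and the cosine-based similarity measure are all illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def correlation_loss(audio_feats: torch.Tensor,
                     lip_feats: torch.Tensor,
                     eps: float = 1e-8) -> torch.Tensor:
    """Hypothetical correlation loss: encourages frame-to-frame changes
    in lip features to track changes in audio features.

    audio_feats: (batch, time, d) audio embedding sequence
    lip_feats:   (batch, time, d) lip-image embedding sequence
    Both are assumed to be projected to the same dimension d
    by upstream encoders (not shown here).
    """
    # Temporal differences approximate "speech changes" and "lip changes".
    da = audio_feats[:, 1:] - audio_feats[:, :-1]   # (batch, time-1, d)
    dv = lip_feats[:, 1:] - lip_feats[:, :-1]       # (batch, time-1, d)

    # Per-time-step cosine similarity between the two change sequences.
    sim = F.cosine_similarity(da, dv, dim=-1, eps=eps)  # (batch, time-1)

    # Maximizing correlation is equivalent to minimizing (1 - similarity).
    return (1.0 - sim).mean()
```

In a full system, a term like this would be summed with the model's other losses (the abstract mentions four in total) and backpropagated end-to-end through both the audio and lip encoders.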
Chen, L., Li, Z., Maddox, R. K., Duan, Z., & Xu, C. (2018). Lip movements generation at a glance. In Lecture Notes in Computer Science (Vol. 11211 LNCS, pp. 538–553). Springer Verlag. https://doi.org/10.1007/978-3-030-01234-2_32