Lip movements generation at a glance

Abstract

In this paper, we consider the task: given arbitrary speech audio and one lip image of an arbitrary target identity, generate synthesized lip movements of the target identity saying that speech. To perform well, a model needs to consider not only the retention of the target identity, the photo-realism of the synthesized images, and the consistency and smoothness of lip images in a sequence, but, more importantly, to learn the correlations between audio speech and lip movements. To solve these problems jointly, we devise a network to synthesize lip movements and propose a novel correlation loss to synchronize lip changes and speech changes. Our full model combines four losses for a comprehensive consideration; it is trained end-to-end and is robust to lip shapes, view angles, and different facial characteristics. Thorough experiments on three datasets, ranging from lab-recorded to lips in-the-wild, show that our model significantly outperforms other state-of-the-art methods extended to this task.
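The correlation loss is the abstract's central technical idea: penalize mismatch between how the audio changes and how the lip region changes over time. The abstract does not spell out the formula, so the following is a minimal PyTorch-style sketch under stated assumptions, not the paper's actual formulation: both modalities are encoded into aligned feature sequences of shape (batch, time, dim) with a shared feature dimension, and their temporal differences are compared with a Pearson correlation. All names (correlation_loss, audio_feats, lip_feats) are hypothetical.

import torch

def correlation_loss(audio_feats: torch.Tensor, lip_feats: torch.Tensor) -> torch.Tensor:
    # Hypothetical sketch, not the paper's exact loss.
    # audio_feats, lip_feats: (batch, time, dim) embedding sequences,
    # aligned frame-by-frame and projected to the same dimension.
    # Temporal differences capture *changes* rather than absolute content.
    da = audio_feats[:, 1:] - audio_feats[:, :-1]
    dl = lip_feats[:, 1:] - lip_feats[:, :-1]
    # Center each difference sequence over time.
    da = da - da.mean(dim=1, keepdim=True)
    dl = dl - dl.mean(dim=1, keepdim=True)
    # Pearson correlation over time, per batch element and feature dim.
    num = (da * dl).sum(dim=1)
    den = da.norm(dim=1) * dl.norm(dim=1) + 1e-8
    corr = (num / den).mean()
    # High correlation between speech changes and lip changes is desired,
    # so minimize (1 - correlation).
    return 1.0 - corr

In the full model described above, a term like this would be only one of the four losses, combined in a weighted sum with the others during end-to-end training.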

Cite

APA

Chen, L., Li, Z., Maddox, R. K., Duan, Z., & Xu, C. (2018). Lip movements generation at a glance. In Lecture Notes in Computer Science (Vol. 11211, pp. 538–553). Springer. https://doi.org/10.1007/978-3-030-01234-2_32
