The goal of this work is to determine the audio-video synchronisation between mouth motion and speech in a video. We propose a two-stream ConvNet architecture that enables the mapping between the sound and the mouth images to be trained end-to-end from unlabelled data. The trained network is used to determine the lip-sync error in a video. We apply the network to two further tasks: active speaker detection and lip reading. On both tasks we set a new state-of-the-art on standard benchmark datasets.
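The two-stream idea can be made concrete with a short sketch. The PyTorch code below is illustrative only: the input shapes (a 13x20 MFCC map for 0.2 s of audio, five 111x111 grayscale mouth frames stacked as channels, 256-d embeddings) follow the paper, but the StreamEncoder module, its layer configuration, and the training snippet are simplified assumptions, not the published network.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StreamEncoder(nn.Module):
    # Small ConvNet mapping one modality to a fixed-size embedding.
    # Layer sizes are illustrative, not the paper's configuration.
    def __init__(self, in_channels: int, embed_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class TwoStreamSyncNet(nn.Module):
    # One encoder per modality; both map into a shared embedding space.
    def __init__(self, n_frames: int = 5, embed_dim: int = 256):
        super().__init__()
        self.audio_stream = StreamEncoder(in_channels=1, embed_dim=embed_dim)
        self.video_stream = StreamEncoder(in_channels=n_frames, embed_dim=embed_dim)

    def forward(self, mfcc, frames):
        return self.audio_stream(mfcc), self.video_stream(frames)

def contrastive_loss(a, v, same, margin=1.0):
    # Pull in-sync audio/video pairs together; push shifted pairs
    # apart up to the margin. Supervision comes for free: genuine
    # pairs are positives, temporally offset pairs are negatives.
    d = F.pairwise_distance(a, v)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

# Toy usage with random tensors in place of real clips.
net = TwoStreamSyncNet()
mfcc = torch.randn(8, 1, 13, 20)          # batch of audio feature maps
frames = torch.randn(8, 5, 111, 111)      # batch of stacked mouth frames
same = torch.randint(0, 2, (8,)).float()  # 1 = in-sync pair, 0 = shifted
a, v = net(mfcc, frames)
loss = contrastive_loss(a, v, same)
loss.backward()

At test time, the lip-sync offset can then be estimated by sliding the audio window relative to the video and taking the offset that minimises the embedding distance; a consistently large minimum distance suggests the visible face is not the speaker, which is the basis of the active speaker detection application.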
Chung, J. S., & Zisserman, A. (2017). Out of time: Automated lip sync in the wild. In Lecture Notes in Computer Science (Vol. 10117, pp. 251–263). Springer. https://doi.org/10.1007/978-3-319-54427-4_19