Out of time: Automated lip sync in the wild


Abstract

The goal of this work is to determine the audio-video synchronisation between mouth motion and speech in a video. We propose a two-stream ConvNet architecture that enables the mapping between the sound and the mouth images to be trained end-to-end from unlabelled data. The trained network is used to determine the lip-sync error in a video. We apply the network to two further tasks: active speaker detection and lip reading. On both tasks we set a new state of the art on standard benchmark datasets.
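The two-stream idea described in the abstract can be illustrated with a short sketch: one ConvNet stream embeds a brief audio window (e.g. MFCC features), a second stream embeds the corresponding mouth-crop frames, and a contrastive objective pulls genuinely in-sync pairs together while pushing out-of-sync pairs apart. This is a minimal sketch assuming PyTorch; the layer sizes, embedding dimension, input shapes, and the contrastive formulation are illustrative assumptions, not the exact architecture or training recipe reported in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioStream(nn.Module):
    """Embed a short window of audio features (e.g. MFCCs) into a fixed-length vector."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):                      # x: (batch, 1, n_mfcc, n_timesteps)
        return self.fc(self.conv(x).flatten(1))


class VisualStream(nn.Module):
    """Embed a short stack of grey-scale mouth crops into a fixed-length vector."""
    def __init__(self, n_frames=5, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_frames, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):                      # x: (batch, n_frames, height, width)
        return self.fc(self.conv(x).flatten(1))


def contrastive_loss(a, v, same, margin=1.0):
    """Pull in-sync audio/visual embeddings together, push out-of-sync pairs apart.

    `same` is 1 for a genuine (in-sync) pair and 0 for a mismatched pair,
    e.g. one built by time-shifting the audio.
    """
    d = F.pairwise_distance(a, v)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()


# Example forward pass with illustrative shapes (assumed, not from the paper):
# audio window of 13 MFCC coefficients x 20 time steps, clip of 5 mouth crops of 112x112.
a = AudioStream()(torch.randn(8, 1, 13, 20))
v = VisualStream()(torch.randn(8, 5, 112, 112))
loss = contrastive_loss(a, v, same=torch.ones(8))
```

Under this sketch, a lip-sync offset could be estimated at test time by sliding the audio window over a range of temporal shifts and taking the shift that minimises the distance between the audio and visual embeddings.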

Citation (APA)

Chung, J. S., & Zisserman, A. (2017). Out of time: Automated lip sync in the wild. In Lecture Notes in Computer Science (Vol. 10117, pp. 251–263). Springer. https://doi.org/10.1007/978-3-319-54427-4_19
