World-Consistent Video-to-Video Synthesis

Abstract

Video-to-video synthesis (vid2vid) aims to convert high-level semantic inputs into photorealistic videos. While existing vid2vid methods can achieve short-term temporal consistency, they fail to ensure long-term consistency. This is because they lack knowledge of the 3D world being rendered and generate each frame based only on the past few frames. To address this limitation, we introduce a novel vid2vid framework that efficiently and effectively utilizes all past generated frames during rendering. This is achieved by condensing the 3D world rendered so far into a physically-grounded estimate of the current frame, which we call the guidance image. We further propose a novel neural network architecture to take advantage of the information stored in the guidance images. Extensive experimental results on several challenging datasets verify the effectiveness of our approach in achieving world consistency—the output video is consistent within the entire rendered 3D world.
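The sketch below illustrates the general idea behind a guidance image: colors from an already generated frame are lifted to 3D and reprojected into the current camera view, giving a partial, physically grounded estimate of the current frame for the generator to condition on. It is a minimal illustration only, assuming per-pixel depth maps and camera poses are available; the function names, inputs, and the nearest-pixel splatting (with no z-buffering or accumulation over all past frames) are simplifications for exposition, not the authors' implementation.

```python
import numpy as np

def unproject(depth, K, cam_to_world):
    """Lift every pixel of a frame to 3D world coordinates (illustrative helper)."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]                          # pixel grid
    pix = np.stack([xs, ys, np.ones_like(xs)], -1)       # homogeneous pixels, (h, w, 3)
    rays = pix @ np.linalg.inv(K).T                      # camera-space ray directions
    pts_cam = rays * depth[..., None]                    # scale rays by depth
    pts_h = np.concatenate([pts_cam, np.ones_like(depth)[..., None]], -1)
    return (pts_h.reshape(-1, 4) @ cam_to_world.T)[:, :3]  # world points, (N, 3)

def make_guidance_image(prev_rgb, prev_depth, K, prev_pose, cur_pose, shape):
    """Splat colors from a previously generated frame into the current view.

    Pixels that receive no reprojected color stay zero; the generator is then
    conditioned on this partial estimate of the current frame.
    """
    h, w = shape
    world_pts = unproject(prev_depth, K, prev_pose)
    world_h = np.concatenate([world_pts, np.ones((len(world_pts), 1))], -1)
    cam_pts = (world_h @ np.linalg.inv(cur_pose).T)[:, :3]  # into current camera frame
    valid = cam_pts[:, 2] > 1e-6                             # keep points in front of camera
    proj = cam_pts[valid] @ K.T
    uv = (proj[:, :2] / proj[:, 2:3]).round().astype(int)    # project to pixel coordinates
    colors = prev_rgb.reshape(-1, 3)[valid]
    guidance = np.zeros((h, w, 3), dtype=prev_rgb.dtype)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    guidance[uv[inside, 1], uv[inside, 0]] = colors[inside]  # nearest-pixel splat
    return guidance
```

In the paper's setting this estimate is built from everything rendered so far rather than a single previous frame, which is what allows the output to remain consistent when the camera revisits a previously seen part of the 3D world.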

Citation (APA)

Mallya, A., Wang, T. C., Sapra, K., & Liu, M. Y. (2020). World-Consistent Video-to-Video Synthesis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12353 LNCS, pp. 359–378). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58598-3_22
