Video-to-video synthesis (vid2vid) aims to convert high-level semantic inputs into photorealistic videos. While existing vid2vid methods can achieve short-term temporal consistency, they fail to ensure long-term consistency. This is because they lack knowledge of the 3D world being rendered and generate each frame based only on the past few frames. To address this limitation, we introduce a novel vid2vid framework that efficiently and effectively utilizes all past generated frames during rendering. This is achieved by condensing the 3D world rendered so far into a physically-grounded estimate of the current frame, which we call the guidance image. We further propose a novel neural network architecture that takes advantage of the information stored in the guidance images. Extensive experimental results on several challenging datasets verify the effectiveness of our approach in achieving world consistency: the output video is consistent across the entire rendered 3D world.
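The abstract describes condensing previously generated content into a guidance image, i.e., a physically-grounded estimate of the current frame obtained from the 3D world rendered so far. As a rough illustration only (not the paper's actual implementation), the sketch below projects an accumulated colored 3D point cloud into the current camera using standard pinhole geometry; the function name and all parameters are hypothetical, and pixels that receive no projected point remain zero, representing regions the network must synthesize from scratch.

```python
import numpy as np

def make_guidance_image(points, colors, K, R, t, h, w):
    """Sketch of forming a guidance image: project a colored 3D point
    cloud (the world rendered so far) into the current camera view.

    points : (N, 3) world-space 3D points
    colors : (N, 3) per-point RGB colors from past generated frames
    K      : (3, 3) camera intrinsics
    R, t   : world-to-camera rotation (3, 3) and translation (3,)
    h, w   : output image size
    """
    # World -> camera coordinates.
    cam = (R @ points.T + t[:, None]).T
    # Keep only points in front of the camera.
    in_front = cam[:, 2] > 1e-6
    cam, cols = cam[in_front], colors[in_front]
    # Perspective projection to pixel coordinates.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    px = np.round(uv).astype(int)
    # Splat point colors into the image; unobserved pixels stay zero (holes).
    guidance = np.zeros((h, w, 3), dtype=colors.dtype)
    valid = (px[:, 0] >= 0) & (px[:, 0] < w) & (px[:, 1] >= 0) & (px[:, 1] < h)
    guidance[px[valid, 1], px[valid, 0]] = cols[valid]
    return guidance

# Tiny usage example: one red point directly ahead of an identity camera
# lands at the principal point (2, 2) of a 5x5 guidance image.
K = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
g = make_guidance_image(np.array([[0.0, 0.0, 1.0]]),
                        np.array([[1.0, 0.0, 0.0]]),
                        K, np.eye(3), np.zeros(3), 5, 5)
```

Conditioning the generator on such an image, rather than on raw past frames alone, is what lets the method reuse everything rendered so far instead of only the last few frames.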
CITATION STYLE
Mallya, A., Wang, T. C., Sapra, K., & Liu, M. Y. (2020). World-Consistent Video-to-Video Synthesis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12353 LNCS, pp. 359–378). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58598-3_22