This paper presents a method for video generation under different viewpoints. It is inspired by MoCoGAN, which models a video clip in two latent sub-spaces (content and motion) and has recently achieved impressive results. However, MoCoGAN and most existing video generation methods do not take viewpoint into account, so they cannot generate videos from a specified viewpoint, which is needed for data augmentation and advertising applications. To this end, we follow the idea of conditional GANs and introduce a new variable that controls the viewpoint of the generated video. In addition, to keep the subject consistent throughout the action, we use an additional sub-network to generate the content control vector instead of sampling a random vector. Furthermore, the objective function for training the network is modified to measure the similarity of the content, action, and view of the generated video with those of the ground truth. Preliminary experiments on generating video clips of dynamic human hand gestures show the potential to generate videos under different viewpoints.
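As a rough illustration of the latent decomposition the abstract describes, the following sketch shows how a vi-MoCoGAN-style generator could compose its per-frame input: a content code produced by a sub-network and shared across all frames, a per-frame motion code, and a one-hot view condition in the spirit of conditional GANs. This is a hypothetical sketch, not the authors' implementation; all dimensions, the `content_subnetwork` stand-in, and the random motion codes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
DIM_CONTENT, DIM_MOTION, NUM_VIEWS, NUM_FRAMES = 64, 32, 5, 16

def content_subnetwork(subject_feature: np.ndarray) -> np.ndarray:
    """Stand-in for the paper's content sub-network: maps a subject
    feature to a content code that stays fixed for the whole clip."""
    w = rng.standard_normal((subject_feature.size, DIM_CONTENT))
    return np.tanh(subject_feature @ w)

def compose_latents(subject_feature: np.ndarray, view_id: int) -> np.ndarray:
    """Build a (NUM_FRAMES, DIM_CONTENT + DIM_MOTION + NUM_VIEWS) latent
    sequence, one row per frame, to feed an image generator."""
    z_content = content_subnetwork(subject_feature)   # fixed per clip
    z_view = np.eye(NUM_VIEWS)[view_id]               # view condition (one-hot)
    frames = []
    for _ in range(NUM_FRAMES):
        z_motion = rng.standard_normal(DIM_MOTION)    # varies frame to frame
        frames.append(np.concatenate([z_content, z_motion, z_view]))
    return np.stack(frames)

latents = compose_latents(rng.standard_normal(128), view_id=2)
```

Here the content block of every row is identical (keeping the subject consistent across the action), while only the motion block changes per frame and the view block selects the target viewpoint.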
Tran, T. H., Bach, V. D., & Doan, H. G. (2020). vi-MoCoGAN: A Variant of MoCoGAN for Video Generation of Human Hand Gestures Under Different Viewpoints. In Communications in Computer and Information Science (Vol. 1180 CCIS, pp. 110–123). Springer. https://doi.org/10.1007/978-981-15-3651-9_11