Self-supervised Dance Video Synthesis Conditioned on Music

Abstract

We present a self-supervised approach with a pose perceptual loss for automatic dance video generation. Our method produces realistic dance videos that conform to the beat and rhythm of the given music. To achieve this, we first generate a human skeleton sequence from the music and then apply a learned pose-to-appearance mapping to render the final video. In the skeleton-generation stage, we use two discriminators to capture different aspects of the sequence and propose a novel pose perceptual loss to produce natural dances. In addition, we propose a new cross-modal evaluation metric for dance quality, which estimates the similarity between the two modalities (music and dance). Qualitative and quantitative experimental results demonstrate that our dance video synthesis approach produces realistic and diverse results. Our source code and data are available at https://github.com/xrenaa/Music-Dance-Video-Synthesis.
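The abstract does not spell out how the pose perceptual loss is computed; as a rough illustration only, the sketch below shows the usual form of such a loss: a frozen, pretrained pose network serves as a feature extractor, and generated skeleton sequences are penalized by the L1 distance to ground-truth sequences in its feature space. The `pose_net` module and its `extract_features` hook are hypothetical placeholders standing in for the paper's pose model, not its released code.

```python
import torch
import torch.nn as nn

class PosePerceptualLoss(nn.Module):
    """Compare generated and ground-truth skeleton sequences in the
    feature space of a frozen, pretrained pose network.

    `pose_net` is an assumed module exposing `extract_features`, which
    returns a list of intermediate activations (one per chosen layer).
    """

    def __init__(self, pose_net, layer_weights=(1.0, 1.0, 1.0)):
        super().__init__()
        self.pose_net = pose_net.eval()  # frozen feature extractor
        for p in self.pose_net.parameters():
            p.requires_grad = False
        self.layer_weights = layer_weights
        self.criterion = nn.L1Loss()

    def forward(self, fake_poses, real_poses):
        # fake_poses / real_poses: batches of skeleton sequences,
        # e.g. (batch, time, joints, 2) coordinate tensors.
        fake_feats = self.pose_net.extract_features(fake_poses)
        real_feats = self.pose_net.extract_features(real_poses)
        loss = 0.0
        for w, f, r in zip(self.layer_weights, fake_feats, real_feats):
            loss = loss + w * self.criterion(f, r.detach())
        return loss
```

One reason a feature-space loss is a natural fit here: a per-joint coordinate loss treats all deviations equally, whereas matching intermediate features of a pose network can penalize jitter at shallow layers and implausible motion patterns at deeper ones.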

Cite

APA

Ren, X., Li, H., Huang, Z., & Chen, Q. (2020). Self-supervised dance video synthesis conditioned on music. In MM '20: Proceedings of the 28th ACM International Conference on Multimedia (pp. 46–54). Association for Computing Machinery. https://doi.org/10.1145/3394171.3413932
