Human motion generation from captions is a fast-growing and promising technique. Recent methods encode skeletons with the latest hidden states of a recurrent neural network (RNN), which can only address coarse-grained motion generation. In this work, we propose a novel human motion generation framework that simultaneously considers the temporal coherence of each individual action. Our model consists of two components: a Semantic Extractor and a Motion Generator. The Semantic Extractor maps the caption into semantic guidance for fine motion generation. The Motion Generator models the long-term tendency of each individual action. In addition, the Motion Generator captures the global location and local dynamics of each individual action, so that more fine-grained activity generation can be guaranteed. Extensive experiments show that our method achieves a clear performance gain over previous methods on two benchmark datasets.
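The two-stage pipeline the abstract describes — a Semantic Extractor mapping a caption to a guidance vector, and a Motion Generator producing a temporally coherent pose sequence conditioned on it — can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the class names, bag-of-words caption encoding, dimensions, and the simple recurrent update are all assumptions made for clarity.

```python
import numpy as np

class SemanticExtractor:
    """Illustrative sketch: maps a caption to a fixed-size guidance vector
    by averaging word embeddings (the paper's extractor is not specified here)."""
    def __init__(self, vocab, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.emb = {w: rng.standard_normal(dim) for w in vocab}
        self.dim = dim

    def encode(self, caption):
        vecs = [self.emb[w] for w in caption.lower().split() if w in self.emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(self.dim)

class MotionGenerator:
    """Illustrative sketch: autoregressively produces joint positions with a
    plain RNN-style update conditioned on the semantic guidance vector."""
    def __init__(self, dim=8, n_joints=4, seed=1):
        rng = np.random.default_rng(seed)
        self.W_h = rng.standard_normal((dim, dim)) * 0.1        # recurrent weights
        self.W_g = rng.standard_normal((dim, dim)) * 0.1        # guidance weights
        self.W_out = rng.standard_normal((n_joints * 3, dim)) * 0.1  # pose readout
        self.dim, self.n_joints = dim, n_joints

    def generate(self, guidance, steps=5):
        h = np.zeros(self.dim)
        poses = []
        for _ in range(steps):
            # hidden state carries temporal coherence; guidance injects semantics
            h = np.tanh(self.W_h @ h + self.W_g @ guidance)
            poses.append((self.W_out @ h).reshape(self.n_joints, 3))
        return np.stack(poses)  # shape: (steps, n_joints, 3)

vocab = ["a", "person", "walks", "forward"]
extractor = SemanticExtractor(vocab)
generator = MotionGenerator()
motion = generator.generate(extractor.encode("a person walks forward"), steps=5)
print(motion.shape)  # (timesteps, joints, xyz)
```

The key design point mirrored here is the split of responsibilities: the caption is encoded once into a guidance vector, while the generator's recurrent state evolves per timestep, which is what lets the sequence stay temporally coherent rather than being decoded frame-by-frame from text alone.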
Wang, Z., Yao, T., Wei, H., Guan, S., & Ni, B. (2018). Multi-person/group interactive video generation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11165 LNCS, pp. 307–317). Springer Verlag. https://doi.org/10.1007/978-3-030-00767-6_29