ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech

Saeed Ghorbani; Ylva Ferstl; Daniel Holden; Nikolaus F. Troje; Marc André Carbonneau

Journal Article, Open Access

Computer Graphics Forum (2023) 42(1) 206-216

DOI: 10.1111/cgf.14734

Abstract

We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means style can be controlled via only a short example motion clip, even for motion styles unseen during training. Our model uses a variational framework to learn a style embedding, making it easy to modify style through latent space manipulation or blending and scaling of style embeddings. The probabilistic nature of our framework further enables the generation of a variety of outputs given the same input, addressing the stochastic nature of gesture motion. In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles. In a user study, we then show that our model outperforms previous state-of-the-art techniques in naturalness of motion, appropriateness for speech, and style portrayal. Finally, we release a high-quality dataset of full-body gesture motion, including fingers, with speech, spanning 19 different styles. Our code and data are publicly available at https://github.com/ubisoft/ubisoft-laforge-ZeroEGGS.
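
The abstract names three operations on the learned style embedding: sampling it from a variational posterior, blending several embeddings, and scaling one. The following is a minimal, generic sketch of those operations in plain Python, not the ZeroEGGS implementation (the linked repository holds the actual code). It assumes the style encoder outputs a per-dimension mean and log-variance, that blending is a normalized weighted average, and that scaling is elementwise multiplication. The function names and example vectors are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sample_style(mu: Sequence[float], log_var: Sequence[float],
                 rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1) per dimension.

    Assumes (hypothetically) that the style encoder predicts a mean and a
    log-variance for each embedding dimension.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def blend_styles(embeddings: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Weighted average of style embeddings; weights are normalized to sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings)) / total
            for i in range(dim)]


def scale_style(z: Sequence[float], factor: float) -> List[float]:
    """Scale an embedding: factor > 1 exaggerates a style, factor < 1 damps it (assumed)."""
    return [factor * v for v in z]


# Usage with hypothetical 3-dimensional style embeddings.
style_a = [1.0, 0.0, 2.0]
style_b = [-1.0, 2.0, 0.0]
mixed = blend_styles([style_a, style_b], [0.5, 0.5])  # [0.0, 1.0, 1.0]
stronger = scale_style(mixed, 2.0)                     # [0.0, 2.0, 2.0]
z = sample_style(mu=mixed, log_var=[0.0, 0.0, 0.0], rng=random.Random(0))
```

Because `sample_style` draws fresh noise on each call, repeated calls with the same mean and log-variance return different style vectors; this is one generic way a variational model can produce varied outputs for a single input.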

Cite (APA)

Ghorbani, S., Ferstl, Y., Holden, D., Troje, N. F., & Carbonneau, M. A. (2023). ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech. Computer Graphics Forum, 42(1), 206–216. https://doi.org/10.1111/cgf.14734