Capturing and Animation of Body and Clothing from Monocular Video


Abstract

While recent work has shown progress on extracting clothed 3D human avatars from a single image, video, or a set of 3D scans, several limitations remain. Most methods use a holistic representation to jointly model the body and clothing, which means that the clothing and body cannot be separated for applications like virtual try-on. Other methods separately model the body and clothing, but they require training from a large set of 3D clothed human meshes obtained from 3D/4D scanners or physics simulations. Our insight is that the body and clothing have different modeling requirements. While the body is well represented by a mesh-based parametric 3D model, implicit representations and neural radiance fields are better suited to capturing the large variety in shape and appearance present in clothing. Building on this insight, we propose SCARF (Segmented Clothed Avatar Radiance Field), a hybrid model combining a mesh-based body with a neural radiance field. Integrating the mesh into the volumetric rendering in combination with a differentiable rasterizer enables us to optimize SCARF directly from monocular videos, without any 3D supervision. The hybrid modeling enables SCARF to (i) animate the clothed body avatar by changing body poses (including hand articulation and facial expressions), (ii) synthesize novel views of the avatar, and (iii) transfer clothing between avatars in virtual try-on applications. We demonstrate that SCARF reconstructs clothing with higher visual quality than existing methods, that the clothing deforms with changing body pose and body shape, and that clothing can be successfully transferred between avatars of different subjects. The code and models are available at https://github.com/YadiraF/SCARF.
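The abstract describes compositing a mesh-based body with a NeRF-style clothing volume during rendering: rays accumulate radiance from the clothing field until they hit the body surface, whose color fills in the remaining transmittance. Below is a toy, single-ray sketch of that idea using standard NeRF quadrature; the function and variable names are hypothetical and the actual SCARF formulation (see the linked repository) differs in detail.

```python
import numpy as np

def composite_ray(ts, sigmas, colors, mesh_depth, mesh_color):
    """Toy hybrid compositing along one camera ray.

    ts         : (N,) sample depths along the ray
    sigmas     : (N,) volume densities of the clothing field
    colors     : (N, 3) radiance of the clothing field at each sample
    mesh_depth : depth at which the ray hits the rasterized body mesh
    mesh_color : (3,) shaded body color at that surface point
    """
    deltas = np.diff(ts, append=ts[-1] + 1e10)        # sample spacing
    alphas = 1.0 - np.exp(-sigmas * deltas)           # per-sample opacity
    # transmittance before each sample (standard NeRF quadrature)
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas)))[:-1]
    in_front = (ts < mesh_depth).astype(float)        # ray stops at the body
    weights = trans * alphas * in_front
    rgb = (weights[:, None] * colors).sum(axis=0)
    # light not absorbed by the clothing volume comes from the mesh surface
    residual = 1.0 - weights.sum()
    return rgb + residual * mesh_color
```

With zero clothing density the ray returns the body color unchanged; with a dense clothing volume in front of the mesh, the clothing radiance dominates.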

Citation (APA)

Feng, Y., Yang, J., Pollefeys, M., Black, M. J., & Bolkart, T. (2022). Capturing and Animation of Body and Clothing from Monocular Video. In Proceedings - SIGGRAPH Asia 2022 Conference Papers. Association for Computing Machinery, Inc. https://doi.org/10.1145/3550469.3555423


