Facial Expression Generation from Text with FaceCLIP

Abstract

Facial expression generation from pure textual descriptions is widely applied in human-computer interaction, computer-aided design, assisted education, and related fields. However, this task is challenging due to the intricate facial structure and the complex mapping between texts and images. Existing methods face limitations in generating high-resolution images or capturing diverse facial expressions. In this study, we propose a novel generation approach, named FaceCLIP, to tackle these problems. The proposed method utilizes a CLIP-based multi-stage generative adversarial model to produce vivid facial expressions at high resolution. With strong semantic priors from multi-modal textual and visual cues, the proposed method effectively disentangles facial attributes, enabling attribute editing and semantic reasoning. To facilitate text-to-expression generation, we build a new dataset called the FET dataset, which contains facial expression images and corresponding textual descriptions. Experiments on the dataset demonstrate improved image quality and semantic consistency compared with state-of-the-art methods.
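The abstract describes a pipeline in which a text embedding conditions a multi-stage generator that refines images at increasing resolutions. The sketch below illustrates only that structure; the text encoder (a token-hashing stand-in for CLIP), the per-stage arithmetic, the 512-dimensional embedding size, and the 64/128/256 resolution schedule are all assumptions for illustration, not the actual FaceCLIP model.

```python
import numpy as np

EMB_DIM = 512  # assumed embedding size; CLIP text encoders commonly output 512-d vectors


def encode_text(description: str) -> np.ndarray:
    """Stand-in for a CLIP text encoder: hash tokens into a fixed-size unit vector."""
    vec = np.zeros(EMB_DIM)
    for token in description.lower().split():
        vec[hash(token) % EMB_DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def generator_stage(prev_img: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """One refinement stage: upsample 2x and mix in the text condition.

    Placeholder arithmetic only -- a real stage would be a conditioned
    convolutional generator trained adversarially.
    """
    up = prev_img.repeat(2, axis=0).repeat(2, axis=1)  # nearest-neighbour 2x upsample
    return np.tanh(up + text_emb.mean())  # crude scalar conditioning, kept in [-1, 1]


def generate(description: str, stages: int = 3) -> list[np.ndarray]:
    """Run the multi-stage pipeline: 32x32 seed image -> 64 -> 128 -> 256."""
    text_emb = encode_text(description)
    # Seed image derived from the embedding (hypothetical initial stage).
    img = np.tanh(np.outer(text_emb[:32], text_emb[:32]))[..., None].repeat(3, axis=2)
    outputs = []
    for _ in range(stages):
        img = generator_stage(img, text_emb)
        outputs.append(img)
    return outputs


imgs = generate("a smiling young woman with raised eyebrows")
print([im.shape for im in imgs])  # one RGB array per stage, doubling in resolution
```

The point of the structure is that every stage sees the same text embedding, so coarse layout and fine expression detail are both steered by the description; the paper's actual stages additionally use adversarial and CLIP-based semantic losses.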

Citation (APA)

Fu, W. W., Gong, W. J., Yu, C. Y., Wang, W., & Gonzàlez, J. (2025). Facial Expression Generation from Text with FaceCLIP. Journal of Computer Science and Technology, 40(2), 359–377. https://doi.org/10.1007/s11390-024-3661-z
