Abstract
Facial expression generation from purely textual descriptions has wide applications in human-computer interaction, computer-aided design, and assisted education. The task is challenging, however, owing to the intricate structure of the human face and the complex mapping between text and images. Existing methods struggle to generate high-resolution images or to capture diverse facial expressions. In this study, we propose a novel generation approach, named FaceCLIP, to tackle these problems. FaceCLIP uses a CLIP-based multi-stage generative adversarial model to produce vivid, high-resolution facial expressions. Guided by strong semantic priors from multi-modal textual and visual cues, it effectively disentangles facial attributes, enabling attribute editing and semantic reasoning. To facilitate text-to-expression generation, we also build a new dataset, the FET dataset, which contains facial expression images paired with textual descriptions. Experiments on this dataset demonstrate improved image quality and semantic consistency over state-of-the-art methods.
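The abstract does not spell out FaceCLIP's architecture, but the pattern it names, a CLIP-conditioned multi-stage GAN, can be illustrated. Below is a minimal sketch assuming a frozen CLIP-style text encoder (represented by a placeholder module) whose embedding conditions a base generator and each upsampling refinement stage. All class names, layer sizes, stage counts, and the 512-dimensional embedding are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical sketch only: FaceCLIP's actual architecture is not given in
# the abstract. This shows the general pattern of a text-conditioned
# multi-stage GAN generator. All names and sizes below are illustrative.

class TextEncoder(nn.Module):
    # Stand-in for a frozen CLIP-style text encoder (assumed 512-d output).
    def __init__(self, vocab_size=49408, text_dim=512):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, text_dim)

    def forward(self, token_ids):              # token_ids: (B, L)
        return self.embed(token_ids)           # -> (B, 512)

class Stage(nn.Module):
    # One refinement stage: double the resolution and re-inject the text cue.
    def __init__(self, in_ch, out_ch, text_dim=512):
        super().__init__()
        self.proj = nn.Linear(text_dim, in_ch)
        self.block = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat, txt):
        cond = self.proj(txt)[:, :, None, None]  # broadcast over H and W
        return self.block(feat + cond)

class MultiStageGenerator(nn.Module):
    # Noise + text embedding -> 8x8 base features -> three stages -> 64x64 RGB.
    def __init__(self, z_dim=128, text_dim=512):
        super().__init__()
        self.fc = nn.Linear(z_dim + text_dim, 256 * 8 * 8)
        self.stages = nn.ModuleList(
            [Stage(256, 128), Stage(128, 64), Stage(64, 32)]
        )
        self.to_rgb = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, z, txt):
        feat = self.fc(torch.cat([z, txt], dim=1)).view(-1, 256, 8, 8)
        for stage in self.stages:
            feat = stage(feat, txt)              # text conditions every stage
        return torch.tanh(self.to_rgb(feat))     # (B, 3, 64, 64) in [-1, 1]

tokens = torch.randint(0, 49408, (2, 16))        # toy token ids
txt = TextEncoder()(tokens)
img = MultiStageGenerator()(torch.randn(2, 128), txt)
print(img.shape)                                 # torch.Size([2, 3, 64, 64])
```

Re-injecting the same text embedding at every stage, rather than only at the input, is a common way to keep higher-resolution outputs semantically tied to the description; whether FaceCLIP conditions its stages this way is not stated in the abstract.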
Citation
Fu, W. W., Gong, W. J., Yu, C. Y., Wang, W., & Gonzàlez, J. (2025). Facial Expression Generation from Text with FaceCLIP. Journal of Computer Science and Technology, 40(2), 359–377. https://doi.org/10.1007/s11390-024-3661-z