Abstract
In this paper, we explore the generation of face images conditioned on a textual description, as well as the capabilities of the models in editing a machine-generated image on the basis of additional text prompts. We leverage open-source, state-of-the-art face image generators (StyleGAN models) and couple them with the open-source multimodal embedding model CLIP in an optimisation loop, following the StyleCLIP method, to set up our experimental system. We use automatic metrics and human ratings to evaluate the results and, in addition, gain insight into how strongly the automatic metrics correlate with human ratings. We found compelling evidence that both the text-to-image and editing models based on StyleGAN2 stand out as the better options. In addition, the automatic evaluation metrics are only weakly correlated with human ratings.
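The StyleCLIP-style optimisation loop described above can be sketched in miniature. In the sketch below, `generate` and `clip_loss` are hypothetical stand-ins for StyleGAN's generator and for a loss derived from CLIP's text-image similarity (the real system backpropagates through both networks; here a finite-difference gradient keeps the example self-contained and runnable):

```python
# Minimal sketch, assuming stand-ins for StyleGAN and CLIP:
# the latent itself plays the role of the "image", and the loss is the
# squared distance to a target embedding representing the text prompt.

def generate(latent):
    # Stand-in for the StyleGAN generator.
    return latent

def clip_loss(image, target):
    # Stand-in for 1 - CLIP cosine similarity between the generated
    # image and the text prompt.
    return sum((i - t) ** 2 for i, t in zip(image, target))

def optimise_latent(latent, target, lr=0.1, steps=200, eps=1e-4):
    # Gradient descent on the latent code; finite differences replace
    # autograd so the sketch has no external dependencies.
    latent = list(latent)
    for _ in range(steps):
        base = clip_loss(generate(latent), target)
        grads = []
        for i in range(len(latent)):
            bumped = list(latent)
            bumped[i] += eps
            grads.append((clip_loss(generate(bumped), target) - base) / eps)
        latent = [w - lr * g for w, g in zip(latent, grads)]
    return latent

target = [0.5, -0.2, 0.8]  # hypothetical "text embedding"
result = optimise_latent([0.0, 0.0, 0.0], target)
```

After enough steps the optimised latent approaches the target embedding, which mirrors how StyleCLIP drives a StyleGAN latent towards the CLIP embedding of the prompt.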
Fejjari, A., Abela, A., Tanti, M., & Muscat, A. (2025). Evaluation of StyleGAN-CLIP Models in Text-to-Image Generation of Faces. Applied Sciences (Switzerland), 15(15). https://doi.org/10.3390/app15158692