Evaluation of StyleGAN-CLIP Models in Text-to-Image Generation of Faces

Citations: 4
Mendeley readers: 17

Abstract

In this paper, we explore the generation of face images conditioned on a textual description, as well as the models' ability to edit a machine-generated image on the basis of additional text prompts. We leverage open-source state-of-the-art face image generators (StyleGAN models) and couple them with the open-source multimodal embedding space CLIP in an optimisation loop, following the StyleCLIP method, to set up our experimental system. We use automatic metrics and human ratings to evaluate the results and, in addition, gain insight into how strongly the automatic metrics correlate with human ratings. We found compelling evidence that both the text-to-image and the image-editing models based on StyleGAN2 stand out as the better options. In addition, the automatic evaluation metrics are only weakly correlated with human ratings.
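The coupling described above can be illustrated with a minimal sketch of CLIP-guided latent optimisation in the spirit of StyleCLIP's latent-optimisation variant: gradient descent on a latent vector so that the generated image's CLIP embedding moves toward the text prompt's embedding. The real system uses a pretrained StyleGAN generator and OpenAI's CLIP; here `ToyGenerator` and `ToyCLIP` are hypothetical stand-ins so the control flow is self-contained and runnable.

```python
# Sketch only: ToyGenerator/ToyCLIP stand in for pretrained StyleGAN and CLIP.
import torch
import torch.nn as nn


class ToyGenerator(nn.Module):
    """Stand-in for a pretrained StyleGAN: maps a latent w to an 'image'."""

    def __init__(self, latent_dim=64, image_dim=256):
        super().__init__()
        self.net = nn.Linear(latent_dim, image_dim)

    def forward(self, w):
        return torch.tanh(self.net(w))


class ToyCLIP(nn.Module):
    """Stand-in for CLIP's image encoder: embeds images into a shared space."""

    def __init__(self, image_dim=256, embed_dim=32):
        super().__init__()
        self.image_proj = nn.Linear(image_dim, embed_dim)

    def image_features(self, img):
        feats = self.image_proj(img)
        return feats / feats.norm(dim=-1, keepdim=True)


def optimise_latent(generator, clip_model, text_embedding, steps=100, lr=0.05):
    """Gradient-descend on the latent so the generated image matches the text.

    The loss is 1 - cosine similarity between the image embedding and a
    (pre-encoded, unit-norm) text embedding, as in CLIP-guided optimisation.
    """
    w = torch.zeros(1, 64, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        img = generator(w)
        img_feat = clip_model.image_features(img)
        loss = 1.0 - (img_feat * text_embedding).sum(dim=-1).mean()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return w.detach(), losses


torch.manual_seed(0)
gen, clip_model = ToyGenerator(), ToyCLIP()
text_emb = torch.randn(1, 32)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
w_opt, losses = optimise_latent(gen, clip_model, text_emb)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In the paper's setting, editing works the same way: the optimisation starts from the latent of an already-generated image rather than from a fresh latent, and the text embedding comes from the additional editing prompt.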

Citation (APA)
Fejjari, A., Abela, A., Tanti, M., & Muscat, A. (2025). Evaluation of StyleGAN-CLIP Models in Text-to-Image Generation of Faces. Applied Sciences (Switzerland), 15(15). https://doi.org/10.3390/app15158692
