Abstract
In this paper, we explore the generation of face images conditioned on a textual description, as well as the capabilities of the models in editing a machine-generated image on the basis of additional text prompts. We leverage open-source, state-of-the-art face image generators (StyleGAN models) and couple them with the open-source multimodal embedding model CLIP in an optimisation loop, following the StyleCLIP method, to set up our experimental system. We use automatic metrics and human ratings to evaluate the results and, in addition, gain insight into how strongly the automatic metrics correlate with human ratings. We found compelling evidence that both the text-to-image and editing models based on StyleGAN2 stand out as the better options. In addition, the automatic evaluation metrics are only weakly correlated with human ratings.
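The StyleCLIP-style optimisation loop described above can be sketched in miniature. In the sketch below, `generate` and `clip_loss` are hypothetical stand-ins for StyleGAN's generator and for a loss derived from CLIP's text-image similarity (the real system backpropagates through both networks; here a finite-difference gradient keeps the example self-contained and runnable):

```python
# Minimal sketch, assuming stand-ins for StyleGAN and CLIP:
# the latent itself plays the role of the "image", and the loss is the
# squared distance to a target embedding representing the text prompt.

def generate(latent):
    # Stand-in for the StyleGAN generator.
    return latent

def clip_loss(image, target):
    # Stand-in for 1 - CLIP cosine similarity between the generated
    # image and the text prompt.
    return sum((i - t) ** 2 for i, t in zip(image, target))

def optimise_latent(latent, target, lr=0.1, steps=200, eps=1e-4):
    # Gradient descent on the latent code; finite differences replace
    # autograd so the sketch has no external dependencies.
    latent = list(latent)
    for _ in range(steps):
        base = clip_loss(generate(latent), target)
        grads = []
        for i in range(len(latent)):
            bumped = list(latent)
            bumped[i] += eps
            grads.append((clip_loss(generate(bumped), target) - base) / eps)
        latent = [w - lr * g for w, g in zip(latent, grads)]
    return latent

target = [0.5, -0.2, 0.8]  # hypothetical "text embedding"
result = optimise_latent([0.0, 0.0, 0.0], target)
```

After enough steps the optimised latent approaches the target embedding, which mirrors how StyleCLIP drives a StyleGAN latent towards the CLIP embedding of the prompt.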
Fejjari, A., Abela, A., Tanti, M., & Muscat, A. (2025). Evaluation of StyleGAN-CLIP Models in Text-to-Image Generation of Faces. Applied Sciences (Switzerland), 15(15). https://doi.org/10.3390/app15158692