TolerantGAN: Text-Guided Image Manipulation Tolerant to Real-World Image

Abstract

Although text-guided image manipulation approaches have demonstrated highly accurate editing of image appearance in virtual or simple scenarios, their real-world application faces significant challenges. The primary cause is the misalignment between the distributions of the training data and real-world data, which makes text-guided image manipulation unstable. In this work, we propose a novel framework called TolerantGAN and tackle the new task of real-world text-guided image manipulation independent of the training data. To achieve this, we introduce two key components: a border smoothly connection module (BSCM) and a manipulation direction-based attention module (MDAM). BSCM smooths the misalignment between the training and real-world data distributions, while MDAM extracts only the regions highly relevant to the manipulation and assists in reconstructing objects unobserved in the training data. For in-the-wild input images of various classes, TolerantGAN robustly outperforms state-of-the-art methods.
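The abstract describes the two modules only at a high level. As an illustration, below is a minimal, hypothetical PyTorch sketch of what a boundary-smoothing blend and a text-direction-based attention map could look like; the class names, tensor shapes, Gaussian feathering, and sigmoid gating are all assumptions made for this sketch, not the authors' implementation.

    # Hypothetical sketch of the two ideas named in the abstract (not the
    # paper's code). Assumes PyTorch; all design choices are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BorderSmoothConnection(nn.Module):
        """Blend an edited region back into the source image through a
        Gaussian-feathered mask so the border transition stays smooth
        (a rough stand-in for the BSCM idea)."""
        def __init__(self, kernel_size: int = 11, sigma: float = 3.0):
            super().__init__()
            coords = torch.arange(kernel_size) - kernel_size // 2
            g = torch.exp(-coords.float() ** 2 / (2 * sigma ** 2))
            self.register_buffer("kernel", g / g.sum())  # (k,)
            self.pad = kernel_size // 2

        def forward(self, edited, source, mask):
            # edited, source: (B, 3, H, W); mask: (B, 1, H, W) float in [0, 1].
            # Separable Gaussian blur: horizontal pass, then vertical pass.
            kx = self.kernel.view(1, 1, 1, -1)
            ky = self.kernel.view(1, 1, -1, 1)
            soft = F.conv2d(mask, kx, padding=(0, self.pad))
            soft = F.conv2d(soft, ky, padding=(self.pad, 0))
            return soft * edited + (1 - soft) * source

    class DirectionAttention(nn.Module):
        """Gate image features by their agreement with a manipulation
        direction derived from source/target text embeddings (a rough
        stand-in for the MDAM idea)."""
        def __init__(self, feat_dim: int, text_dim: int):
            super().__init__()
            self.proj = nn.Linear(text_dim, feat_dim)

        def forward(self, feats, src_text, tgt_text):
            # feats: (B, C, H, W); src_text, tgt_text: (B, text_dim).
            direction = self.proj(tgt_text - src_text)            # (B, C)
            attn = torch.einsum("bchw,bc->bhw", feats, direction)
            attn = torch.sigmoid(attn).unsqueeze(1)               # (B, 1, H, W)
            return feats * attn  # keep only manipulation-relevant regions

Under these assumptions, the feathered mask lets the edited region meet the untouched background without a visible seam, while the attention map suppresses feature locations that do not align with the text-derived manipulation direction; for example, bscm(edited, source, mask) returns a composited (B, 3, H, W) image.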

Cite

APA

Watanabe, Y., Togo, R., Maeda, K., Ogawa, T., & Haseyama, M. (2024). TolerantGAN: Text-Guided Image Manipulation Tolerant to Real-World Image. IEEE Open Journal of Signal Processing, 5, 150–159. https://doi.org/10.1109/OJSP.2023.3343335
