Although text-guided image manipulation approaches have achieved highly accurate appearance editing for images in virtual or simple scenarios, their real-world application faces significant challenges. The primary cause is the misalignment between the distributions of training data and real-world data, which makes text-guided image manipulation unstable. In this work, we propose a novel framework called TolerantGAN and tackle the new task of real-world text-guided image manipulation independent of the training data. To achieve this, we introduce two key components: a border smoothly connection module (BSCM) and a manipulation direction-based attention module (MDAM). BSCM smooths the misalignment between the distributions of training and real-world data. MDAM extracts only the regions highly relevant to the manipulation and assists in reconstructing objects not observed in the training data. For in-the-wild input images of various classes, TolerantGAN robustly outperforms state-of-the-art methods.
Watanabe, Y., Togo, R., Maeda, K., Ogawa, T., & Haseyama, M. (2024). TolerantGAN: Text-Guided Image Manipulation Tolerant to Real-World Image. IEEE Open Journal of Signal Processing, 5, 150–159. https://doi.org/10.1109/OJSP.2023.3343335