TolerantGAN: Text-Guided Image Manipulation Tolerant to Real-World Image

Yuto Watanabe; Ren Togo; Keisuke Maeda; Takahiro Ogawa; Miki Haseyama

Journal Article (Open Access)

IEEE Open Journal of Signal Processing, vol. 5, pp. 150-159, 2024

DOI: 10.1109/OJSP.2023.3343335

Abstract

Although text-guided image manipulation approaches have demonstrated highly accurate performance in editing the appearance of images in virtual or simple scenarios, their real-world application faces significant challenges. The primary cause is the misalignment between the distributions of training and real-world data, which makes text-guided image manipulation unstable. In this work, we propose a novel framework called TolerantGAN to tackle the new task of real-world text-guided image manipulation independent of the training data. To achieve this, we introduce two key components: a border smoothly connection module (BSCM) and a manipulation direction-based attention module (MDAM). BSCM smooths the misalignment between the distributions of training and real-world data. MDAM extracts only the regions highly relevant to the manipulation and assists in reconstructing objects that were not observed in the training data. On in-the-wild input images of various classes, TolerantGAN robustly outperforms state-of-the-art methods.
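The abstract says only that MDAM focuses on the regions most relevant to the manipulation; it does not give a formulation. The following is a minimal, hypothetical sketch of one way a direction-based attention could work. It assumes that image-region features and text descriptions live in a shared embedding space (as in CLIP-style models) and that the "manipulation direction" is the normalized difference between the target and source text embeddings. Every name here (`manipulation_direction`, `direction_attention`, `blend`), the temperature value, and the toy vectors are illustrative assumptions, not the authors' MDAM or BSCM.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Return v scaled to unit length (unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def manipulation_direction(src_text: Vector, tgt_text: Vector) -> Vector:
    """Hypothetical: unit vector from the source description to the target one."""
    return _normalize([t - s for s, t in zip(src_text, tgt_text)])


def direction_attention(region_feats: List[Vector], direction: Vector,
                        temperature: float = 0.1) -> List[float]:
    """Hypothetical: softmax over cosine similarity of each region to the direction."""
    scores = [sum(a * b for a, b in zip(_normalize(f), direction)) / temperature
              for f in region_feats]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend(original: List[float], edited: List[float], attn: List[float]) -> List[float]:
    """Use edited values where attention is high, original values where it is low."""
    peak = max(attn)
    weights = [a / peak for a in attn]  # the most relevant region gets weight 1
    return [w * e + (1 - w) * o for o, e, w in zip(original, edited, weights)]


# Toy example with made-up 3-D embeddings (not real model outputs).
src = [1.0, 0.0, 0.0]                    # stands in for e.g. "a red car"
tgt = [0.0, 1.0, 0.0]                    # stands in for e.g. "a blue car"
d = manipulation_direction(src, tgt)
regions = [[0.0, 1.0, 0.0],              # region aligned with the edit
           [0.0, 0.0, 1.0]]              # unrelated background region
attn = direction_attention(regions, d)
out = blend(original=[0.0, 0.0], edited=[1.0, 1.0], attn=attn)
```

Rescaling the attention so that the most relevant region receives weight 1 keeps the edit at full strength where it matters, while weakly related regions stay close to the original input.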

Citation (APA)

Watanabe, Y., Togo, R., Maeda, K., Ogawa, T., & Haseyama, M. (2024). TolerantGAN: Text-Guided Image Manipulation Tolerant to Real-World Image. IEEE Open Journal of Signal Processing, 5, 150–159. https://doi.org/10.1109/OJSP.2023.3343335
