Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance

Abstract

We introduce the problem of referring object manipulation (ROM), which aims to generate photo-realistic image edits from two textual descriptions: 1) a text referring to an object in the input image and 2) a text describing how to manipulate the referred object. A successful ROM model would enable users to manipulate images with natural language alone, removing the need to learn sophisticated image editing software. We present one of the first approaches to this challenging multi-modal problem by combining a referring image segmentation method with a text-guided diffusion model. Specifically, we propose a conditional classifier-free guidance scheme to better guide the diffusion process along the direction from the referring expression to the target prompt. In addition, we provide a new localized ranking method and further improvements to make the generated edits more robust. Experimental results show that the proposed framework can serve as a simple but strong baseline for referring object manipulation. Comparisons with several baseline text-guided diffusion models also demonstrate the effectiveness of our conditional classifier-free guidance technique.
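To illustrate the conditional classifier-free guidance idea described in the abstract, the sketch below replaces the usual unconditional (null-text) branch of classifier-free guidance with a noise prediction conditioned on the referring expression, so each denoising step extrapolates from the referring text toward the target prompt. This is a minimal sketch under assumed interfaces: `eps_model`, `emb_ref`, `emb_tgt`, and `guidance_scale` are hypothetical names, not the authors' implementation, and the paper's actual formulation may differ in detail.

```python
import torch

def conditional_cfg_noise(eps_model, x_t, t, emb_ref, emb_tgt, guidance_scale=7.5):
    # Noise prediction conditioned on the referring expression (the text
    # that picks out the object in the input image).
    eps_ref = eps_model(x_t, t, emb_ref)
    # Noise prediction conditioned on the target manipulation prompt.
    eps_tgt = eps_model(x_t, t, emb_tgt)
    # Extrapolate from the referring-text prediction toward the target-prompt
    # prediction, analogous to standard classifier-free guidance but with the
    # referring expression taking the place of the null/unconditional text.
    return eps_ref + guidance_scale * (eps_tgt - eps_ref)

if __name__ == "__main__":
    # Dummy denoiser standing in for a real text-conditioned diffusion U-Net.
    def dummy_eps_model(x_t, t, text_emb):
        return x_t * 0.1 + text_emb.mean() * 0.01

    x_t = torch.randn(1, 4, 64, 64)    # noisy latent at step t
    emb_ref = torch.randn(1, 77, 768)  # embedding of the referring text
    emb_tgt = torch.randn(1, 77, 768)  # embedding of the target prompt
    noise = conditional_cfg_noise(dummy_eps_model, x_t, t=500,
                                  emb_ref=emb_ref, emb_tgt=emb_tgt)
    print(noise.shape)  # torch.Size([1, 4, 64, 64])
```

In the full pipeline described in the abstract, the referring image segmentation mask would additionally restrict this guided update to the region of the referred object, keeping the rest of the image unchanged.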

Cite (APA)

Choi, M. (2022). Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13696 LNCS, pp. 627–643). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20059-5_36
