Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance

Abstract

We introduce the problem of referring object manipulation (ROM), which aims to generate photo-realistic image edits from two textual descriptions: 1) a text referring to an object in the input image and 2) a text describing how to manipulate the referred object. A successful ROM model would enable users to manipulate images with natural language alone, removing the need to learn sophisticated image editing software. We present one of the first approaches to this challenging multi-modal problem by combining a referring image segmentation method with a text-guided diffusion model. Specifically, we propose a conditional classifier-free guidance scheme to better guide the diffusion process along the direction from the referring expression to the target prompt. In addition, we provide a new localized ranking method and further improvements to make the generated edits more robust. Experimental results show that the proposed framework can serve as a simple but strong baseline for referring object manipulation. Comparisons with several baseline text-guided diffusion models also demonstrate the effectiveness of our conditional classifier-free guidance technique.
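To illustrate the conditional classifier-free guidance idea described in the abstract, the sketch below replaces the usual unconditional (null-text) branch of classifier-free guidance with a noise prediction conditioned on the referring expression, so each denoising step extrapolates from the referring text toward the target prompt. This is a minimal sketch under assumed interfaces: `eps_model`, `emb_ref`, `emb_tgt`, and `guidance_scale` are hypothetical names, not the authors' implementation, and the paper's actual formulation may differ in detail.

```python
import torch

def conditional_cfg_noise(eps_model, x_t, t, emb_ref, emb_tgt, guidance_scale=7.5):
    # Noise prediction conditioned on the referring expression (the text
    # that picks out the object in the input image).
    eps_ref = eps_model(x_t, t, emb_ref)
    # Noise prediction conditioned on the target manipulation prompt.
    eps_tgt = eps_model(x_t, t, emb_tgt)
    # Extrapolate from the referring-text prediction toward the target-prompt
    # prediction, analogous to standard classifier-free guidance but with the
    # referring expression taking the place of the null/unconditional text.
    return eps_ref + guidance_scale * (eps_tgt - eps_ref)

if __name__ == "__main__":
    # Dummy denoiser standing in for a real text-conditioned diffusion U-Net.
    def dummy_eps_model(x_t, t, text_emb):
        return x_t * 0.1 + text_emb.mean() * 0.01

    x_t = torch.randn(1, 4, 64, 64)    # noisy latent at step t
    emb_ref = torch.randn(1, 77, 768)  # embedding of the referring text
    emb_tgt = torch.randn(1, 77, 768)  # embedding of the target prompt
    noise = conditional_cfg_noise(dummy_eps_model, x_t, t=500,
                                  emb_ref=emb_ref, emb_tgt=emb_tgt)
    print(noise.shape)  # torch.Size([1, 4, 64, 64])
```

In the full pipeline described in the abstract, the referring image segmentation mask would additionally restrict this guided update to the region of the referred object, keeping the rest of the image unchanged.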

Cite (APA)

Choi, M. (2022). Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13696 LNCS, pp. 627–643). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20059-5_36
