Directed Diffusion: Direct Control of Object Placement through Attention Guidance

Abstract

Text-guided diffusion models such as DALL-E 2, Imagen, eDiff-I, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images are of very high quality. However, these models often struggle to compose scenes containing several key objects, such as characters in specified positional relationships. The missing capability to "direct" the placement of characters and objects both within and across images is crucial in storytelling, as recognized in the literature on film and animation theory. In this work, we take a particularly straightforward approach to providing the needed direction. Drawing on the observation that the cross-attention maps for prompt words reflect the spatial layout of the objects denoted by those words, we introduce an optimization objective that produces "activation" at desired positions in these cross-attention maps. The resulting approach is a step toward generalizing the applicability of text-guided diffusion models beyond single images to collections of related images, as in storybooks. Directed Diffusion provides easy high-level positional control over multiple objects, while making use of an existing pre-trained model and maintaining a coherent blend between the positioned objects and the background. Moreover, it requires only a few lines of code to implement.
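The core idea, steering where a prompt word's cross-attention "activates", can be illustrated with a small NumPy sketch. It rests on our own assumptions, not on the paper's formulation: cross-attention probabilities are computed as softmax(QK^T / sqrt(d)) over prompt tokens, the desired position is a rectangular box, and the objective is the fraction of a token's attention mass that falls outside that box. Since we have no real diffusion model here, we run gradient descent on a hypothetical per-pixel bias added to one token's attention logits; the actual method works inside a pre-trained diffusion model and may update a different quantity. The names `box_mask`, `placement_loss`, and `guide_token` are illustrative only.

```python
import numpy as np


def softmax_rows(logits):
    """Row-wise softmax: each pixel's attention is distributed over the prompt tokens."""
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_attention(q, k):
    """Cross-attention probabilities softmax(Q K^T / sqrt(d)).

    q: (num_pixels, d) image-side queries; k: (num_tokens, d) text-side keys.
    Returns an array of shape (num_pixels, num_tokens).
    """
    return softmax_rows(q @ k.T / np.sqrt(q.shape[1]))


def box_mask(h, w, top, left, bottom, right):
    """Flattened (h*w,) boolean mask, True inside rows [top, bottom) and cols [left, right)."""
    m = np.zeros((h, w), dtype=bool)
    m[top:bottom, left:right] = True
    return m.reshape(-1)


def placement_loss(attn_map, mask):
    """Assumed objective: 1 minus the fraction of a token's attention mass inside the box."""
    return 1.0 - attn_map[mask].sum() / attn_map.sum()


def guide_token(logits, token, mask, steps=200, lr=5.0):
    """Gradient descent on a hypothetical per-pixel bias added to one token's logits.

    Returns the re-normalized attention probabilities after optimization.
    """
    bias = np.zeros(logits.shape[0])
    for _ in range(steps):
        shifted = logits.copy()
        shifted[:, token] += bias
        a = softmax_rows(shifted)[:, token]
        m_in, m_tot = a[mask].sum(), a.sum()
        # dL/da_p = (M_in - 1[p in box] * M_tot) / M_tot^2
        dl_da = (m_in - mask * m_tot) / m_tot**2
        # Chain rule through the softmax: da_p / dbias_p = a_p * (1 - a_p)
        bias -= lr * dl_da * a * (1.0 - a)
    shifted = logits.copy()
    shifted[:, token] += bias
    return softmax_rows(shifted)


# Toy example: random queries/keys for an 8x8 latent and a 4-token prompt.
rng = np.random.default_rng(0)
H, W, D, T = 8, 8, 16, 4
q = rng.normal(size=(H * W, D))
k = rng.normal(size=(T, D))
logits = q @ k.T / np.sqrt(D)
mask = box_mask(H, W, 0, 0, 4, 4)  # target: top-left quadrant

before = placement_loss(cross_attention(q, k)[:, 1], mask)
after = placement_loss(guide_token(logits, 1, mask)[:, 1], mask)
print(f"attention mass outside box: {before:.3f} -> {after:.3f}")
```

Raising the bias inside the box and lowering it outside concentrates token 1's attention in the target region while each pixel's distribution over tokens still sums to one. This mirrors, in spirit only, the abstract's idea of producing activation at desired positions in the cross-attention maps.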

Citation (APA)

Ma, W. D. K., Lahiri, A., Lewis, J. P., Leung, T., & Kleijn, W. B. (2024). Directed Diffusion: Direct Control of Object Placement through Attention Guidance. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, pp. 4098–4106). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i5.28204