Controlling Style and Semantics in Weakly-Supervised Image Generation

Dario Pavllo; Aurelien Lucchi; Thomas Hofmann

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12351 LNCS, pp. 482–499 (2020)

DOI: 10.1007/978-3-030-58539-6_29

Abstract

We propose a weakly-supervised approach for conditional image generation of complex scenes where a user has fine control over objects appearing in the scene. We exploit sparse semantic maps to control object shapes and classes, as well as textual descriptions or attributes to control both local and global style. In order to condition our model on textual descriptions, we introduce a semantic attention module whose computational cost is independent of the image resolution. To further augment the controllability of the scene, we propose a two-step generation scheme that decomposes background and foreground. The label maps used to train our model are produced by a large-vocabulary object detector, which enables access to unlabeled data and provides structured instance information. In such a setting, we report better FID scores compared to fully-supervised settings where the model is trained on ground-truth semantic maps. We also showcase the ability of our model to manipulate a scene on complex datasets such as COCO and Visual Genome.

Citation (APA)

Pavllo, D., Lucchi, A., & Hofmann, T. (2020). Controlling Style and Semantics in Weakly-Supervised Image Generation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12351 LNCS, pp. 482–499). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58539-6_29
