Compared with the widely studied Human-Object Interaction DETection (HOI-DET), to the best of our knowledge no effort has been devoted to its inverse problem: generating an HOI scene image from a given relationship triplet. We term this new task "Human-Object Interaction Image Generation" (HOI-IG). HOI-IG is a research-worthy task with promising applications in online shopping, film production, and interactive entertainment. In this work, we introduce an Interact-GAN to solve this challenging task. Our method is composed of two stages: (1) manipulating the posture of a given human image conditioned on a predicate, and (2) merging the transformed human image and an object image into one realistic scene image while satisfying their expected relative position and ratio. In addition, to address the large spatial misalignment that arises when fusing the content of two images into a reasonable spatial layout, we propose a Relation-based Spatial Transformer Network (RSTN) that adaptively processes the images conditioned on their interaction. Extensive experiments on two challenging datasets demonstrate the effectiveness and superiority of our approach. We encourage the image generation community to pay more attention to the new Human-Object Interaction Image Generation problem. To facilitate future research, our project will be released at: http://colalab.org/projects/InteractGAN.
Gao, C., Liu, S., Zhu, D., Liu, Q., Cao, J., He, H., … Yan, S. (2020). InteractGAN: Learning to Generate Human-Object Interaction. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 165–173). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3413854