Relationformer: A Unified Framework for Image-to-Graph Generation

Suprosanna Shit; Rajat Koner; Bastian Wittmann; Johannes Paetzold; Ivan Ezhov; Hongwei Li; Jiazhen Pan; Sahand Sharifzadeh; Georgios Kaissis; Volker Tresp; Bjoern Menze

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022), Vol. 13697 LNCS, pp. 422-439

DOI: 10.1007/978-3-031-19836-6_24

7 Citations

39 Readers

Abstract

A comprehensive representation of an image requires understanding objects and their mutual relationships, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally, image-to-graph generation is addressed with a two-stage approach consisting of object detection followed by a separate relation prediction, which prevents simultaneous object-relation interaction. This work proposes a unified one-stage transformer-based framework, namely Relationformer, that jointly predicts objects and their relations. We leverage direct set-based object prediction and incorporate the interaction among the objects to learn an object-relation representation jointly. In addition to existing [obj]-tokens, we propose a novel learnable token, namely the [rln]-token. Together with the [obj]-tokens, the [rln]-token exploits local and global semantic reasoning in an image through a series of mutual associations. In combination with the pair-wise [obj]-tokens, the [rln]-token contributes to a computationally efficient relation prediction. We achieve state-of-the-art performance on multiple, diverse, and multi-domain datasets, which demonstrates our approach’s effectiveness and generalizability. Code is available at https://github.com/suprosanna/relationformer.
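The way pair-wise [obj]-tokens can be combined with a single shared [rln]-token may be easier to see in a small sketch. The code below is an illustration only, not the authors' implementation (see the repository linked above for that). It assumes the relation input for an ordered object pair (i, j) is formed by concatenating the two [obj]-tokens with the [rln]-token. The names make_tokens, pairwise_relation_inputs, and linear_relation_head are hypothetical, random values stand in for learned tokens, and a plain linear layer stands in for whatever relation head the model actually uses.

```python
import random
from itertools import permutations


def make_tokens(num_objects, dim, seed=0):
    """Hypothetical stand-in for decoder output: N [obj]-tokens plus one [rln]-token."""
    rng = random.Random(seed)
    obj_tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_objects)]
    rln_token = [rng.uniform(-1, 1) for _ in range(dim)]
    return obj_tokens, rln_token


def pairwise_relation_inputs(obj_tokens, rln_token):
    """Build one input vector per ordered object pair (i, j) with i != j.

    Assumption: each vector is the concatenation obj_i + obj_j + rln,
    so its length is 3 * dim. The same [rln]-token is reused for every pair.
    """
    inputs = {}
    for i, j in permutations(range(len(obj_tokens)), 2):
        inputs[(i, j)] = obj_tokens[i] + obj_tokens[j] + rln_token
    return inputs


def linear_relation_head(vec, weights, bias):
    """Hypothetical relation head: a single linear layer giving one logit per class."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


# Toy usage: 4 detected objects, token dimension 8, 3 relation classes.
obj_tokens, rln_token = make_tokens(num_objects=4, dim=8)
pair_inputs = pairwise_relation_inputs(obj_tokens, rln_token)

rng = random.Random(1)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(3 * 8)] for _ in range(3)]
bias = [0.0, 0.0, 0.0]
logits = {pair: linear_relation_head(vec, weights, bias) for pair, vec in pair_inputs.items()}

print(len(pair_inputs))  # 12 ordered pairs for 4 objects
```

Under these assumptions, the number of learned relation tokens stays at one regardless of how many objects are detected; only the cheap concatenation and the relation head are evaluated per pair, which is one plausible reading of the efficiency claim in the abstract.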

Cite

Shit, S., Koner, R., Wittmann, B., Paetzold, J., Ezhov, I., Li, H., … Menze, B. (2022). Relationformer: A Unified Framework for Image-to-Graph Generation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13697 LNCS, pp. 422–439). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-19836-6_24
