Visual relations, such as "person holds dog", are effective semantic units for image understanding and serve as a bridge between computer vision and natural language. Recent work has extracted object features from images with the aid of their textual descriptions. However, little work combines multi-modal information to model subject-predicate-object relation triplets for deeper scene understanding. In this paper, we propose a novel visual relation extraction model, the Multi-modal Translation Embedding Based Model, which integrates visual information with a corresponding textual knowledge base. To this end, the model places the objects of an image and their semantic relationships in two distinct low-dimensional spaces, where a relation is modeled as a simple translation vector connecting the entity descriptions in the knowledge graph. Moreover, we propose a visual phrase learning method that captures the interactions between objects in the image to further improve visual relation extraction. Experiments on two real-world datasets show that the proposed model benefits from incorporating language information into the relation embeddings and achieves significant improvements over state-of-the-art methods.
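A minimal sketch of the translation-embedding idea described above, in the style of TransE-like scoring: the subject and object features are projected into a relation space where the predicate acts as a translation vector (f(s) + t_p ≈ f(o)). The layer sizes, predicate count, and training setup below are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TranslationEmbedding(nn.Module):
    """Scores a (subject, predicate, object) triplet by how well the
    predicate vector translates the projected subject feature onto the
    projected object feature. Dimensions are placeholder assumptions."""

    def __init__(self, visual_dim=4096, embed_dim=128, num_predicates=70):
        super().__init__()
        # Project visual region features into the low-dimensional relation space.
        self.proj = nn.Linear(visual_dim, embed_dim)
        # One learned translation vector per predicate class.
        self.predicate = nn.Embedding(num_predicates, embed_dim)

    def forward(self, subj_feat, obj_feat, pred_idx):
        s = self.proj(subj_feat)
        o = self.proj(obj_feat)
        t = self.predicate(pred_idx)
        # Smaller distance means a more plausible triplet.
        return torch.norm(s + t - o, p=2, dim=-1)

# Usage sketch: such a model is typically trained with a margin-based
# ranking loss against corrupted (negative) triplets.
model = TranslationEmbedding()
subj = torch.randn(8, 4096)        # subject region features (e.g., CNN output)
obj = torch.randn(8, 4096)         # object region features
pred = torch.randint(0, 70, (8,))  # predicate class indices
scores = model(subj, obj, pred)
```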
CITATION STYLE
Li, Z., Han, Y., Xu, Y., & Gao, S. (2018). Visual relation extraction via multi-modal translation embedding based model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10937 LNAI, pp. 538–548). Springer Verlag. https://doi.org/10.1007/978-3-319-93034-3_43