Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance

253 citations of this article · 75 Mendeley readers have this article in their library.

Abstract

Multi-modal named entity recognition (MNER) aims to discover named entities in free text and classify them into predefined types with the help of associated images. However, dominant MNER models do not fully exploit the fine-grained semantic correspondences between semantic units of different modalities, which have the potential to refine multi-modal representation learning. To deal with this issue, we propose a unified multi-modal graph fusion (UMGF) approach for MNER. Specifically, we first represent the input sentence and image using a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). Then, we stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, we obtain an attention-based multi-modal representation for each word and perform entity labeling with a CRF decoder. Experiments on two benchmark datasets demonstrate the superiority of our MNER model.
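The core operation the abstract describes — nodes for words and visual objects exchanging information over a shared graph — can be illustrated with a masked-attention message-passing step. This is a minimal sketch in plain NumPy, assuming a simple single-head attention parameterisation; the function name, dimensions, and adjacency construction are illustrative and not the paper's actual implementation, which stacks several such layers and feeds the result to a CRF decoder.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_fusion_layer(nodes, adj, W_q, W_k, W_v):
    """One graph-based multi-modal fusion step (illustrative).

    Each node (word or visual object) attends only to its neighbours
    in the unified multi-modal graph and aggregates their values.
    nodes: (N, d) node features; adj: (N, N) 0/1 adjacency matrix.
    """
    q, k, v = nodes @ W_q, nodes @ W_k, nodes @ W_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(adj > 0, scores, -1e9)   # mask non-edges
    attn = softmax(scores, axis=-1)            # rows sum to 1 over neighbours
    return attn @ v

# toy unified graph: 4 word nodes + 2 visual-object nodes, d = 8
rng = np.random.default_rng(0)
d = 8
nodes = rng.normal(size=(6, d))
adj = np.ones((6, 6))                          # fully connected, for illustration
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
fused = graph_fusion_layer(nodes, adj, W_q, W_k, W_v)
```

Stacking this layer several times lets word representations absorb information from visual objects (and vice versa) through iterative semantic interactions, before the per-word representations are decoded into entity labels.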

Citation (APA)

Zhang, D., Wei, S., Li, S., Wu, H., Zhu, Q., & Zhou, G. (2021). Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 16, pp. 14347–14355). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i16.17687
