Abstract
Multi-modal named entity recognition (MNER) aims to discover named entities in free text and classify them into pre-defined types with the aid of associated images. However, dominant MNER models do not fully exploit the fine-grained semantic correspondences between semantic units of different modalities, which have the potential to refine multi-modal representation learning. To address this issue, we propose a unified multi-modal graph fusion (UMGF) approach for MNER. Specifically, we first represent the input sentence and image using a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). Then, we stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, we obtain an attention-based multi-modal representation for each word and perform entity labeling with a CRF decoder. Experiments on two benchmark datasets demonstrate the superiority of our MNER model.
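To make the pipeline concrete, here is a minimal PyTorch sketch of the three stages the abstract describes: a unified graph over word and visual-object nodes, stacked graph-based fusion layers, and per-word tagging. Every concrete choice here (the embedding dimension, multi-head attention as the graph fusion operator, the toy fully connected adjacency, and a greedy argmax standing in for the paper's CRF decoder) is an assumption for illustration, not the authors' actual implementation.

```python
import torch
import torch.nn as nn


class GraphFusionLayer(nn.Module):
    """One graph-based multi-modal fusion layer: every node (word or visual
    object) is updated by attending to its neighbors in the unified graph."""

    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj: (N, N) bool, True where two nodes share an edge. The boolean
        # attention mask disallows pairs that are NOT connected in the graph,
        # so semantic interaction is restricted to graph neighbors.
        fused, _ = self.attn(nodes, nodes, nodes, attn_mask=~adj)
        return self.norm(nodes + fused)  # residual connection + layer norm


class UMGFSketch(nn.Module):
    def __init__(self, dim: int = 64, num_layers: int = 3, num_tags: int = 9):
        super().__init__()
        self.layers = nn.ModuleList(
            GraphFusionLayer(dim) for _ in range(num_layers)
        )
        self.emit = nn.Linear(dim, num_tags)  # per-word emission scores

    def forward(self, words, objects, adj):
        # Unified multi-modal graph: word nodes and visual-object nodes form
        # one node set; adj encodes word-word, word-object, object-object edges.
        nodes = torch.cat([words, objects], dim=1)
        for layer in self.layers:  # iterative semantic interaction
            nodes = layer(nodes, adj)
        word_nodes = nodes[:, : words.size(1)]  # keep only word positions
        # Greedy argmax over emissions stands in for the paper's CRF decoder.
        return self.emit(word_nodes).argmax(dim=-1)


# Toy usage: a 5-word sentence with 2 detected visual objects.
B, W, V, D = 1, 5, 2, 64
words, objects = torch.randn(B, W, D), torch.randn(B, V, D)
adj = torch.ones(W + V, W + V, dtype=torch.bool)  # fully connected toy graph
tags = UMGFSketch(dim=D)(words, objects, adj)
print(tags.shape)  # torch.Size([1, 5]): one tag id per word
```

In the full model, the adjacency would come from the paper's graph construction over words and detected visual objects rather than a fully connected placeholder, and a CRF layer would score whole tag sequences instead of each word independently.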
Citation
Zhang, D., Wei, S., Li, S., Wu, H., Zhu, Q., & Zhou, G. (2021). Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 16, pp. 14347–14355). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i16.17687