MAF: A general matching and alignment framework for multimodal named entity recognition

Abstract

In this paper, we study multimodal named entity recognition (MNER) in social media posts. Existing works mainly use a cross-modal attention mechanism to combine text representations with image representations. However, they suffer from two weaknesses: (1) current methods rest on the strong assumption that each text and its accompanying image are matched, so that the image can always help identify named entities in the text. This assumption does not always hold in real scenarios, and relying on it may degrade the recognition performance of the MNER model; (2) current methods fail to construct a consistent representation that bridges the semantic gap between the two modalities, which prevents the model from establishing a good connection between the text and the image. To address these issues, we propose a general matching and alignment framework (MAF) for multimodal named entity recognition in social media posts. Specifically, to solve the first issue, we propose a novel cross-modal matching (CM) module that calculates a similarity score between the text and the image and uses this score to determine the proportion of visual information to retain. To solve the second issue, we propose a novel cross-modal alignment (CA) module that makes the representations of the two modalities more consistent. We conduct extensive experiments, ablation studies, and case studies to demonstrate the effectiveness and efficiency of our method. The source code of this paper can be found at https://github.com/xubodhu/MAF.
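The abstract does not say how the CM score is computed or which objective the CA module optimizes. As a minimal pure-Python sketch, assuming pooled text and image vectors of equal dimension are already available from some encoders, the code below substitutes a sigmoid gate over cosine similarity for the CM score and an InfoNCE-style contrastive loss for the CA objective. The functions `match_gate`, `gate_visual`, and `info_nce` and their parameters (`weight`, `bias`, `temperature`) are hypothetical illustrations, not taken from the MAF source code.

```python
# Hypothetical sketch only: NOT the MAF implementation. It illustrates
# (a) gating visual features by a text-image matching score (CM-like) and
# (b) a contrastive loss that pulls matched text/image pairs together (CA-like).
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def match_gate(text_vec: Vector, image_vec: Vector,
               weight: float = 10.0, bias: float = -5.0) -> float:
    """Assumed CM-style score in (0, 1).

    `weight` and `bias` stand in for parameters a real model would learn; they
    are fixed here so orthogonal vectors get a gate near 0 and vectors pointing
    in the same direction get a gate near 1.
    """
    z = weight * cosine(text_vec, image_vec) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gate_visual(image_feats: List[Vector], gate: float) -> List[Vector]:
    """Scale every visual feature by the gate, keeping only that proportion of it."""
    return [[gate * x for x in feat] for feat in image_feats]


def info_nce(text_vecs: List[Vector], image_vecs: List[Vector],
             temperature: float = 0.1) -> float:
    """Text-to-image InfoNCE loss over a batch (an assumed stand-in for CA).

    Image i is the positive for text i; all other images in the batch are negatives.
    """
    n = len(text_vecs)
    total = 0.0
    for i in range(n):
        logits = [cosine(text_vecs[i], image_vecs[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(logit - m) for logit in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n


if __name__ == "__main__":
    texts = [[1.0, 0.0], [0.0, 1.0]]
    images = [[0.9, 0.1], [0.1, 0.9]]
    g = match_gate(texts[0], images[0])
    print("gate:", round(g, 3))
    print("gated visual:", gate_visual([images[0]], g))
    print("aligned loss:", round(info_nce(texts, images), 4))
    print("shuffled loss:", round(info_nce(texts, images[::-1]), 4))
```

The soft gate lets an unrelated image contribute little without a hard keep/drop decision, while the contrastive loss is lower when each text is closest to its own image than when the pairs are shuffled, which is one common way to make two modalities' representations more consistent.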

Citation (APA)

Xu, B., Huang, S., Sha, C., & Wang, H. (2022). MAF: A general matching and alignment framework for multimodal named entity recognition. In WSDM 2022 - Proceedings of the 15th ACM International Conference on Web Search and Data Mining (pp. 1215–1223). Association for Computing Machinery, Inc. https://doi.org/10.1145/3488560.3498475

