Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER

Abstract

The challenge posed by multimodal named entity recognition (MNER) is mainly two-fold: (1) bridging the semantic gap between text and image and (2) matching each entity with its associated object in the image. Existing methods fail to capture the implicit entity-object relations because the corresponding annotations are lacking. In this paper, we propose a bidirectional generative alignment method named BGA-MNER to tackle these issues. BGA-MNER consists of image2text and text2image generation with respect to the entity-salient content in the two modalities. It jointly optimizes the bidirectional reconstruction objectives, which aligns the implicit entity-object relations under such direct and powerful constraints. Furthermore, image-text pairs usually contain unmatched components that act as noise for generation. A stage-refined context sampler is therefore proposed to extract the matched cross-modal content for generation. Extensive experiments on two benchmarks demonstrate that our method achieves state-of-the-art performance without image input during inference.

Cite

Chen, F., Liu, J., Ji, K., Ren, W., Wang, J., & Chen, J. (2023). Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER. In MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia (pp. 4555–4563). Association for Computing Machinery, Inc. https://doi.org/10.1145/3581783.3612095
