Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER

Feng Chen; Jiajia Liu; Kaixiang Ji; Wang Ren; Jian Wang; Jingdong Chen

Conference Proceedings (Open Access)


MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia (2023) 4555-4563

DOI: 10.1145/3581783.3612095


Abstract

The challenge posed by multimodal named entity recognition (MNER) is mainly two-fold: (1) bridging the semantic gap between text and image and (2) matching each entity with its associated object in the image. Existing methods fail to capture the implicit entity-object relations because corresponding annotations are lacking. In this paper, we propose a bidirectional generative alignment method named BGA-MNER to tackle these issues. BGA-MNER consists of image2text and text2image generation with respect to the entity-salient content in the two modalities. It jointly optimizes the bidirectional reconstruction objectives, which aligns the implicit entity-object relations under direct and powerful constraints. Furthermore, image-text pairs usually contain unmatched components that are noisy for generation, so a stage-refined context sampler is proposed to extract the matched cross-modal content for generation. Extensive experiments on two benchmarks demonstrate that our method achieves state-of-the-art performance without image input during inference.
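
The joint optimization of the two reconstruction objectives can be illustrated with a minimal sketch. The abstract does not give the loss forms, so everything here is an assumption made for illustration: the mean-squared-error reconstruction terms, the function names `mse` and `joint_loss`, and the weights `w_i2t` and `w_t2i` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(
    ner_loss: float,
    text_feat: Sequence[float],
    image_feat: Sequence[float],
    text_from_image: Sequence[float],
    image_from_text: Sequence[float],
    w_i2t: float = 1.0,  # hypothetical weight for the image2text term
    w_t2i: float = 1.0,  # hypothetical weight for the text2image term
) -> float:
    """Combine a tagging loss with two reconstruction losses.

    Assumption: each generation direction is scored by MSE against the
    features of the target modality. The paper may use a different form.
    """
    loss_i2t = mse(text_from_image, text_feat)   # image -> text reconstruction
    loss_t2i = mse(image_from_text, image_feat)  # text -> image reconstruction
    return ner_loss + w_i2t * loss_i2t + w_t2i * loss_t2i


# Toy example with two-dimensional features.
total = joint_loss(
    ner_loss=0.5,
    text_feat=[1.0, 2.0],
    image_feat=[0.0, 0.0],
    text_from_image=[1.0, 4.0],
    image_from_text=[1.0, 1.0],
)
print(total)
```

In a sketch like this, both reconstruction terms are needed only during training, which is consistent with the abstract's claim that no image input is required at inference time.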


Citation (APA)

Chen, F., Liu, J., Ji, K., Ren, W., Wang, J., & Chen, J. (2023). Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER. In MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia (pp. 4555–4563). Association for Computing Machinery, Inc. https://doi.org/10.1145/3581783.3612095
