Semi-supervised Multimodal Coreference Resolution in Image Narrations


Abstract

In this paper, we study multimodal coreference resolution, specifically the setting where a longer descriptive text, i.e., a narration, is paired with an image. This poses significant challenges due to the fine-grained image-text alignment it requires, the inherent ambiguity of narrative language, and the unavailability of large annotated training sets. To tackle these challenges, we present a data-efficient semi-supervised approach that utilizes image-narration pairs to resolve coreferences and perform narrative grounding in a multimodal context. Our approach incorporates losses for both labeled and unlabeled data within a cross-modal framework. Our evaluation shows that the proposed approach outperforms strong baselines, both quantitatively and qualitatively, on the tasks of coreference resolution and narrative grounding.
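The abstract's central technical idea is an objective that combines a loss over labeled image-narration pairs with a loss over unlabeled ones inside a cross-modal model. As a rough illustration only (this is not the paper's implementation; every module name, tensor shape, and the confidence-thresholded pseudo-labeling scheme below are assumptions), a minimal PyTorch sketch of such a combined objective might look like this:

```python
# Minimal sketch of a semi-supervised cross-modal coreference objective.
# Hypothetical throughout: the actual encoders, pair scorer, and
# semi-supervised losses in the paper may differ substantially.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalScorer(nn.Module):
    """Scores candidate mention pairs from fused text and image features."""

    def __init__(self, text_dim=256, image_dim=256, hidden=256):
        super().__init__()
        self.fuse = nn.Linear(text_dim + image_dim, hidden)
        self.pair_head = nn.Linear(2 * hidden, 1)  # coreferent vs. not

    def forward(self, text_feats, image_feats, pair_idx):
        # text_feats, image_feats: (num_mentions, dim); pair_idx: (num_pairs, 2)
        fused = torch.relu(self.fuse(torch.cat([text_feats, image_feats], dim=-1)))
        pairs = torch.cat([fused[pair_idx[:, 0]], fused[pair_idx[:, 1]]], dim=-1)
        return self.pair_head(pairs).squeeze(-1)  # one logit per pair


def semi_supervised_loss(model, labeled, unlabeled, threshold=0.9):
    """Supervised BCE on labeled pairs plus confidence-filtered
    pseudo-label BCE on unlabeled pairs (an assumed scheme)."""
    t, v, idx, y = labeled
    sup = F.binary_cross_entropy_with_logits(model(t, v, idx), y)

    tu, vu, idxu = unlabeled
    with torch.no_grad():
        probs = torch.sigmoid(model(tu, vu, idxu))
        mask = (probs > threshold) | (probs < 1 - threshold)  # confident only
        pseudo = (probs > 0.5).float()
    if mask.any():
        unsup = F.binary_cross_entropy_with_logits(
            model(tu, vu, idxu)[mask], pseudo[mask])
    else:
        unsup = torch.zeros((), device=probs.device)
    return sup + unsup


# Toy usage: random features stand in for real encoder outputs.
model = CrossModalScorer()
labeled = (torch.randn(6, 256), torch.randn(6, 256),
           torch.tensor([[0, 1], [2, 3]]), torch.tensor([1.0, 0.0]))
unlabeled = (torch.randn(6, 256), torch.randn(6, 256),
             torch.tensor([[0, 4], [1, 5]]))
loss = semi_supervised_loss(model, labeled, unlabeled)
loss.backward()
```

The point the sketch illustrates is that one pair scorer can be trained on ground-truth coreference labels where they exist and on its own confident predictions where they do not, which is one standard way to make a small annotated set go further.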

Citation (APA)

Goel, A., Fernando, B., Keller, F., & Bilen, H. (2023). Semi-supervised multimodal coreference resolution in image narrations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) (pp. 11067–11081). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.682
