A Hierarchical Network for Multimodal Document-Level Relation Extraction

Citations: 0
Readers (Mendeley): 6

Abstract

Document-level relation extraction aims to extract entity relations that span multiple sentences. This task faces two critical issues: long dependency and mention selection. Prior works address these problems from a purely textual perspective; however, they are hard to handle based on text information alone. In this paper, we leverage video information to provide additional evidence for understanding long dependencies and to offer a wider perspective for identifying relevant mentions, giving rise to a new task named Multimodal Document-level Relation Extraction (MDocRE). To tackle this new task, we construct a human-annotated dataset of documents paired with relevant videos, which, to the best of our knowledge, is the first document-level relation extraction dataset equipped with video clips. We also propose a hierarchical framework to learn interactions between different dependency levels, together with a textual-guided transformer architecture that incorporates both the textual and video modalities. In addition, we utilize a mention gate module to address the mention-selection problem in both modalities. Experiments on our proposed dataset show that 1) incorporating video information greatly improves model performance; 2) our hierarchical framework achieves state-of-the-art results compared with both unimodal and multimodal baselines; 3) by collaborating with video information, our model better mitigates the long-dependency and mention-selection problems.
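The abstract does not specify the exact formulation of the mention gate module. As a rough illustration only, a generic gated fusion of textual and video mention features (all names, shapes, and the sigmoid-gate form here are assumptions for exposition, not the paper's actual module) could be sketched as:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mention_gate(text_feat, video_feat, weights, bias):
    """Generic gated fusion sketch (assumed form, not the paper's module):
    a per-dimension sigmoid gate decides how much of the textual vs. video
    mention feature to keep in the fused representation."""
    combined = text_feat + video_feat  # list concatenation of the two vectors
    fused = []
    for i in range(len(text_feat)):
        # Linear projection of the concatenated features, then a sigmoid gate.
        g = sigmoid(sum(w * x for w, x in zip(weights[i], combined)) + bias[i])
        # Convex combination: the gate interpolates between the modalities.
        fused.append(g * text_feat[i] + (1.0 - g) * video_feat[i])
    return fused

random.seed(0)
d = 4  # toy feature dimension
t = [random.gauss(0, 1) for _ in range(d)]          # textual mention feature
v = [random.gauss(0, 1) for _ in range(d)]          # video mention feature
W = [[random.gauss(0, 1) for _ in range(2 * d)] for _ in range(d)]
b = [0.0] * d
fused = mention_gate(t, v, W, b)
```

Because the gate lies strictly in (0, 1), each fused dimension is a weighted blend of the corresponding textual and video values, which lets the model suppress irrelevant mentions in either modality.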


Citation (APA)

Kong, L., Wang, J., Ma, Z., Zhou, Q., Zhang, J., He, L., & Chen, J. (2024). A Hierarchical Network for Multimodal Document-Level Relation Extraction. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, pp. 18408–18416). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i16.29801

Readers' Seniority

PhD / Postgrad / Masters / Doc: 2 (100%)

Readers' Discipline

Agricultural and Biological Sciences: 1 (50%)

Computer Science: 1 (50%)
