CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild


Abstract

Existing relation extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents. In practice, however, a large quantity of relational facts in knowledge bases can only be inferred across documents. In this work, we present the problem of cross-document RE, taking an initial step towards knowledge acquisition in the wild. To facilitate this research, we construct CodRED, the first human-annotated cross-document RE dataset. Compared to existing RE datasets, CodRED presents two key challenges: given two entities, (1) it requires finding the relevant documents that can provide clues for identifying their relations; and (2) it requires reasoning over multiple documents to extract the relational facts. We conduct comprehensive experiments showing that CodRED is challenging for existing RE methods, including strong BERT-based models. We make CodRED and the code for our baselines publicly available at https://github.com/thunlp/CodRED.
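The two challenges above suggest a retrieve-then-reason pipeline. The following is a minimal, hypothetical sketch of that task format; the function names, toy corpus, and string-matching "reasoner" are illustrative assumptions, not CodRED's actual data schema or the paper's model.

```python
# Hypothetical sketch of the two-step cross-document RE task described
# in the abstract. All names and the toy rule below are illustrative;
# a real system would use a retriever plus a learned (e.g. BERT-based)
# multi-document reader.

def retrieve_evidence(head, tail, corpus):
    """Step 1: find documents that mention either entity and may
    provide clues linking the pair."""
    return [doc for doc in corpus if head in doc or tail in doc]

def extract_relation(head, tail, evidence):
    """Step 2: reason jointly over the retrieved documents; a toy
    pattern match stands in for a learned relation classifier."""
    text = " ".join(evidence)
    if head in text and tail in text and "born in" in text:
        return "place_of_birth"
    return "no_relation"

corpus = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
]
docs = retrieve_evidence("Marie Curie", "Warsaw", corpus)
print(extract_relation("Marie Curie", "Warsaw", docs))  # → place_of_birth
```

Note that neither document alone need state the relation explicitly; the point of the cross-document setting is that evidence for a single fact may be scattered across the retrieved set.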

Citation (APA)

Yao, Y., Du, J., Lin, Y., Li, P., Liu, Z., Zhou, J., & Sun, M. (2021). CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 4452–4472). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-main.366
