Abstract
Existing relation extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents. However, a large quantity of relational facts in knowledge bases can only be inferred across documents in practice. In this work, we present the problem of cross-document RE, making an initial step towards knowledge acquisition in the wild. To facilitate the research, we construct the first human-annotated cross-document RE dataset CodRED. Compared to existing RE datasets, CodRED presents two key challenges: Given two entities, (1) it requires finding the relevant documents that can provide clues for identifying their relations; (2) it requires reasoning over multiple documents to extract the relational facts. We conduct comprehensive experiments to show that CodRED is challenging to existing RE methods including strong BERT-based models. We make CodRED and the code for our baselines publicly available at https://github.com/thunlp/CodRED.
Citation
Yao, Y., Du, J., Lin, Y., Li, P., Liu, Z., Zhou, J., & Sun, M. (2021). CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 4452–4472). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-main.366