While research on scientific claim verification has led to the development of powerful systems that appear to approach human performance, these approaches have yet to be tested in a realistic setting against large corpora of scientific literature. Moving to this open-domain evaluation setting, however, poses unique challenges; in particular, it is infeasible to exhaustively annotate all evidence documents. In this work, we present SCIFACT-OPEN, a new test collection designed to evaluate the performance of scientific claim verification systems on a corpus of 500K research abstracts. Drawing upon pooling techniques from information retrieval, we collect evidence for scientific claims by pooling and annotating the top predictions of four state-of-the-art scientific claim verification models. We find that systems developed on smaller corpora struggle to generalize to SCIFACT-OPEN, exhibiting performance drops of at least 15 F1. In addition, analysis of the evidence in SCIFACT-OPEN reveals interesting phenomena likely to appear when claim verification systems are deployed in practice, e.g., cases where the evidence supports only a special case of the claim. Our dataset is available at https://github.com/dwadden/scifact-open.
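The pooling approach mentioned above can be sketched in a few lines: take the top-k predictions from each participating system and annotate their union, rather than exhaustively annotating the corpus. This is a hypothetical illustration of the general IR technique, not the paper's actual implementation; the function name and the k value are assumptions.

```python
def pool_predictions(ranked_runs, k=50):
    """IR-style pooling sketch: union the top-k ranked doc ids
    from each system's run to form the set sent for annotation.

    ranked_runs: list of ranked doc-id lists, one per system.
    """
    pool = set()
    for run in ranked_runs:
        pool.update(run[:k])  # only the top-k from each run enter the pool
    return pool

# Toy example with two systems' rankings:
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d4", "d5"],
]
print(sorted(pool_predictions(runs, k=2)))  # -> ['d1', 'd2', 'd4']
```

Documents outside every system's top-k are assumed non-relevant, which is the standard trade-off that makes pooled evaluation feasible on a 500K-abstract corpus.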
Citation
Wadden, D., Lo, K., Kuehl, B., Cohan, A., Beltagy, I., Wang, L. L., & Hajishirzi, H. (2022). SCIFACT-OPEN: Towards open-domain scientific claim verification. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 4748–4763). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.347