ParsFEVER: A Dataset for Farsi Fact Extraction and Verification

Abstract

Training and evaluation of automatic fact extraction and verification techniques require large amounts of annotated data, which might not be available for low-resource languages. This paper presents ParsFEVER, the first publicly available Farsi dataset for fact extraction and verification. We adopt the construction procedure of the standard English dataset for the task, i.e., FEVER, and improve it for the case of low-resource languages. Specifically, claims are extracted from sentences that are carefully selected to be more informative. The dataset comprises nearly 23K manually annotated claims. Over 65% of the claims in ParsFEVER are many-hop (require evidence from multiple sources), making the dataset a challenging benchmark (only 13% of the claims in FEVER are many-hop). Also, despite having a smaller training set (around one-ninth of that in FEVER), a model trained on ParsFEVER attains similar downstream performance, indicating the quality of the dataset. We release the dataset and the annotation guidelines at https://github.com/Zarharan/ParsFEVER.

Cite

Zarharan, M., Ghaderan, M., Pourdabiri, A., Sayedi, Z., Minaei-Bidgoli, B., Eetemadi, S., & Pilehvar, M. T. (2021). ParsFEVER: A Dataset for Farsi Fact Extraction and Verification. In *SEM 2021 - 10th Conference on Lexical and Computational Semantics, Proceedings of the Conference (pp. 99–104). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.starsem-1.9
