Emergent: A novel data-set for stance classification

William Ferreira; Andreas Vlachos

Conference Proceedings, Open Access

2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (2016) 1163-1168

DOI: 10.18653/v1/n16-1138

Citations: 328

Readers: 340

Abstract

We present Emergent, a novel data-set derived from a digital journalism project for rumour debunking. The data-set contains 300 rumoured claims and 2,595 associated news articles, collected and labelled by journalists with an estimation of their veracity (true, false or unverified). Each associated article is summarized into a headline and labelled to indicate whether its stance is for, against, or observing the claim, where observing indicates that the article merely repeats the claim. Thus, Emergent provides a real-world data source for a variety of natural language processing tasks in the context of fact-checking. In addition to presenting the data-set, we address the task of determining the stance of the article headline with respect to the claim. For this purpose we use a logistic regression classifier and develop features that examine the headline and its agreement with the claim. The accuracy achieved was 73%, which is 26% higher than that achieved by the Excitement Open Platform (Magnini et al., 2014).
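To make the idea of "features that examine the headline and its agreement with the claim" concrete, here is a minimal sketch of what such a feature extractor might look like. It is not the authors' feature set: the cue-word lists, the function names `tokenize` and `headline_features`, the Jaccard overlap measure, and the example claim and headline are all assumptions made up for illustration. The resulting feature dictionary is the kind of input one could pass to a logistic regression classifier that predicts for, against, or observing.

```python
import re

# Hypothetical cue lists for illustration only; not taken from the paper.
REFUTING_CUES = {"hoax", "fake", "false", "denies", "deny", "debunked", "not"}
HEDGING_CUES = {"reportedly", "allegedly", "claims", "claim", "rumour", "rumor"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def headline_features(claim, headline):
    """Return a dict of simple features for one (claim, headline) pair."""
    claim_tokens = set(tokenize(claim))
    headline_tokens = set(tokenize(headline))
    union = claim_tokens | headline_tokens
    # Jaccard overlap as a crude proxy for claim/headline agreement.
    overlap = len(claim_tokens & headline_tokens) / len(union) if union else 0.0
    return {
        "overlap": overlap,
        # A question mark may signal that the headline only reports the claim.
        "has_question_mark": float("?" in headline),
        # Counts of distinct cue words that may signal refutation or hedging.
        "refuting_cues": float(len(headline_tokens & REFUTING_CUES)),
        "hedging_cues": float(len(headline_tokens & HEDGING_CUES)),
    }


# Made-up example pair, not drawn from the data-set.
claim = "Apple will release a gold watch"
feats = headline_features(claim, "Report: Apple is not releasing a gold watch")
```

Surface features like these are deliberately simple; a real system would likely need richer signals (for example, alignment between claim and headline words) to separate "against" from "observing" reliably.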

Citation (APA)

Ferreira, W., & Vlachos, A. (2016). Emergent: A novel data-set for stance classification. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (pp. 1163–1168). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n16-1138
