Crowdsourcing a Wikipedia vandalism corpus

Abstract

We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon's Mechanical Turk. The corpus comprises 32 452 edits on 28 468 Wikipedia articles, among which 2 391 vandalism edits have been identified. A total of 753 human annotators cast 193 022 votes on the edits, so that each edit was reviewed by at least 3 annotators; the level of agreement achieved was then analyzed in order to label each edit as "regular" or "vandalism." The corpus is available free of charge. © 2010 ACM.
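
The abstract says that each edit received at least 3 votes and that annotator agreement determined the final label, but it does not state the exact rule. As a rough illustration only, the sketch below assumes a simple two-thirds agreement threshold; the function name `label_edit`, its parameters, and the threshold value are hypothetical and are not taken from the paper.

```python
from collections import Counter
from fractions import Fraction


def label_edit(votes, min_votes=3, min_agreement=Fraction(2, 3)):
    """Aggregate annotator votes for one edit into a single label.

    votes: list of strings, each "regular" or "vandalism".
    Returns the winning label, or None if the edit is still undecided.

    ASSUMPTION: the two-thirds threshold is illustrative; the abstract
    only says that the level of agreement was analyzed.
    """
    # The abstract states that every edit was reviewed by at least 3 annotators.
    if len(votes) < min_votes:
        return None

    # Find the most frequent label and how many annotators chose it.
    label, count = Counter(votes).most_common(1)[0]

    # Exact rational comparison avoids floating-point edge cases at 2/3.
    if Fraction(count, len(votes)) >= min_agreement:
        return label
    return None  # not enough agreement; such an edit would need more votes


if __name__ == "__main__":
    print(label_edit(["vandalism", "vandalism", "regular"]))
    print(label_edit(["regular", "vandalism", "regular", "vandalism"]))
```

Under these assumptions, an edit that falls below the threshold stays unlabeled rather than being forced into a class, which matches the idea of collecting further votes until agreement is reached.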

Cite

Potthast, M. (2010). Crowdsourcing a Wikipedia vandalism corpus. In SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 789–790). https://doi.org/10.1145/1835449.1835617
