We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon's Mechanical Turk. The corpus comprises 32 452 edits on 28 468 Wikipedia articles, among which 2 391 vandalism edits have been identified. A total of 753 human annotators cast 193 022 votes on the edits, so that each edit was reviewed by at least 3 annotators; the level of agreement achieved was then analyzed to label each edit as "regular" or "vandalism." The corpus is available free of charge. © 2010 ACM.
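The abstract does not spell out how annotator votes were turned into labels, so the following is only a minimal sketch of one plausible agreement-based rule, not the procedure used for PAN-WVC-10. The function name `label_edit`, the two-thirds agreement threshold, and the choice to leave low-agreement edits unlabeled are all assumptions made for illustration; only the minimum of 3 votes per edit comes from the text.

```python
from collections import Counter
from fractions import Fraction

def label_edit(votes, min_votes=3, min_agreement=Fraction(2, 3)):
    """Return the agreed label for one edit, or None if undecided.

    votes: list of strings, each "regular" or "vandalism".
    min_votes: minimum number of votes required (3, as in the abstract).
    min_agreement: hypothetical share of votes the top label must reach.
    """
    if len(votes) < min_votes:
        return None  # not enough reviews yet
    label, count = Counter(votes).most_common(1)[0]
    # Exact rational comparison avoids floating-point edge cases.
    if Fraction(count, len(votes)) >= min_agreement:
        return label
    return None  # agreement too low; the edit would need more votes

if __name__ == "__main__":
    print(label_edit(["vandalism", "vandalism", "regular"]))
    print(label_edit(["regular", "vandalism", "regular", "vandalism"]))
```

Under these assumptions, an edit with two "vandalism" votes out of three is labeled "vandalism", while an evenly split edit stays unlabeled until further votes arrive.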
CITATION STYLE
Potthast, M. (2010). Crowdsourcing a Wikipedia vandalism corpus. In SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 789–790). https://doi.org/10.1145/1835449.1835617