Tolerance of effectiveness measures to relevance judging errors

Abstract

Crowdsourcing relevance judgments for test collection construction is attractive because it can be more affordable than hiring high-quality assessors. A problem faced by all crowdsourced judgments, even judgments formed from the consensus of multiple workers, is that they will differ from the judgments produced by high-quality assessors. For two TREC test collections, we simulated errors in sets of judgments and then measured the effect of these errors on effectiveness measures. We found that some measures appear to be more tolerant of errors than others. We also found that achieving high rank correlation in the ranking of retrieval systems requires conservative judgments for average precision (AP) and nDCG, while precision at rank 10 requires neutral judging behavior. Conservative judging avoids mistakenly judging non-relevant documents as relevant at the cost of judging some relevant documents as non-relevant. In addition, we found that while conservative judging behavior maximizes rank correlation for AP and nDCG, minimizing the error in the measures' values requires more liberal behavior. Depending on the nature of a set of crowdsourced judgments, the judgments may be better suited to some effectiveness measures than others, and the use of some effectiveness measures will require higher levels of judgment quality than others. © 2014 Springer International Publishing Switzerland.
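
The kind of simulation the abstract describes might be sketched roughly as follows. This is an illustrative sketch only, not the authors' code. The function names, the two-parameter error model (`tpr`, `fpr`), binary relevance, and the choice of Kendall's tau as the rank correlation are all assumptions made here; the abstract does not specify them. In this model, conservative judging corresponds to a low `fpr` paired with a reduced `tpr`.

```python
import random
from itertools import combinations


def simulate_judgments(true_qrels, tpr, fpr, rng):
    """Return noisy binary judgments (hypothetical error model).

    A truly relevant document is judged relevant with probability tpr;
    a truly non-relevant document is judged relevant with probability fpr.
    """
    noisy = {}
    for doc, rel in true_qrels.items():
        p = tpr if rel else fpr
        noisy[doc] = 1 if rng.random() < p else 0
    return noisy


def precision_at_k(ranking, qrels, k=10):
    """Fraction of the top k ranked documents judged relevant."""
    return sum(qrels.get(d, 0) for d in ranking[:k]) / k


def average_precision(ranking, qrels):
    """Average precision of one ranking under binary judgments."""
    num_rel = sum(qrels.values())
    if num_rel == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if qrels.get(doc, 0):
            hits += 1
            total += hits / rank
    return total / num_rel


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length score lists (needs >= 2 items)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) / 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Toy data (made up): 50 documents, 5 random "systems", one topic.
    rng = random.Random(0)
    docs = [f"d{i}" for i in range(50)]
    truth = {d: int(rng.random() < 0.2) for d in docs}
    systems = [rng.sample(docs, len(docs)) for _ in range(5)]

    # Conservative judging in this sketch: few false positives, more misses.
    noisy = simulate_judgments(truth, tpr=0.6, fpr=0.02, rng=rng)

    true_ap = [average_precision(r, truth) for r in systems]
    noisy_ap = [average_precision(r, noisy) for r in systems]
    print("tau (AP):", kendall_tau(true_ap, noisy_ap))
```

Sweeping `tpr` and `fpr` over a grid and repeating the comparison for each measure would show which error profiles preserve the system ranking and which preserve the measure values, which are the two questions the abstract distinguishes.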

Cite

Li, L., & Smucker, M. D. (2014). Tolerance of effectiveness measures to relevance judging errors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8416 LNCS, pp. 148–159). Springer Verlag. https://doi.org/10.1007/978-3-319-06028-6_13
