Bootstrapped nDCG Estimation in the Presence of Unjudged Documents

Abstract

Retrieval studies often reuse TREC collections after the corresponding tracks have passed. Yet, a fair evaluation of new systems that retrieve documents outside the original judgment pool is not straightforward. Two common ways of dealing with unjudged documents are to remove them from a ranking (condensed lists), or to treat them as non- or highly relevant (naïve lower and upper bounds). However, condensed list-based measures often overestimate the effectiveness of a system, and naïve bounds are often very “loose”—especially for nDCG when some top-ranked documents are unjudged. As a new alternative, we employ bootstrapping to generate a distribution of nDCG scores by sampling judgments for the unjudged documents using run-based and/or pool-based priors. Our evaluation on four TREC collections with real and simulated cases of unjudged documents shows that bootstrapped nDCG scores yield more accurate predictions than condensed lists, and that they are able to strongly tighten upper bounds at a negligible loss of accuracy.
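The bootstrapping idea described in the abstract can be sketched in a few lines: relevance grades for unjudged documents are repeatedly sampled from a prior, and an nDCG score is computed per sample, yielding a score distribution instead of a single point estimate. The sketch below is illustrative only; the function names, the uniform prior, and the use of the sampled ranking itself for the ideal DCG are simplifying assumptions, not the authors' exact method (which derives run-based and/or pool-based priors).

```python
import math
import random

def dcg(rels):
    # Discounted cumulative gain with a log2 rank discount.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

def bootstrapped_ndcg(ranking, prior, n_samples=1000, max_grade=3, seed=0):
    """Sample an nDCG score distribution for a ranking with unjudged docs.

    ranking: relevance grades top-to-bottom; None marks an unjudged document.
    prior:   sampling weights over grades 0..max_grade (hypothetical; the
             paper estimates such priors from runs and/or the judgment pool).
    """
    rng = random.Random(seed)
    grades = list(range(max_grade + 1))
    scores = []
    for _ in range(n_samples):
        # Draw a grade from the prior for every unjudged position.
        sampled = [r if r is not None else rng.choices(grades, weights=prior)[0]
                   for r in ranking]
        # Simplification: ideal DCG from the sampled grades themselves.
        idcg = dcg(sorted(sampled, reverse=True))
        scores.append(dcg(sampled) / idcg if idcg > 0 else 0.0)
    return scores

# Example: two unjudged documents, uniform prior over grades 0..3.
dist = bootstrapped_ndcg([3, None, 1, None, 0], prior=[1, 1, 1, 1])
```

The mean of `dist` serves as the point prediction, while its quantiles give tightened lower and upper bounds compared to the naïve assumption that every unjudged document is non-relevant or highly relevant.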

Citation (APA)

Fröbe, M., Gienapp, L., Potthast, M., & Hagen, M. (2023). Bootstrapped nDCG Estimation in the Presence of Unjudged Documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13980 LNCS, pp. 313–329). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-28244-7_20
