Retrieval studies often reuse TREC collections after the corresponding tracks have passed. Yet, a fair evaluation of new systems that retrieve documents outside the original judgment pool is not straightforward. Two common ways of dealing with unjudged documents are to remove them from a ranking (condensed lists), or to treat them as non- or highly relevant (naïve lower and upper bounds). However, condensed list-based measures often overestimate the effectiveness of a system, and naïve bounds are often very “loose”—especially for nDCG when some top-ranked documents are unjudged. As a new alternative, we employ bootstrapping to generate a distribution of nDCG scores by sampling judgments for the unjudged documents using run-based and/or pool-based priors. Our evaluation on four TREC collections with real and simulated cases of unjudged documents shows that bootstrapped nDCG scores yield more accurate predictions than condensed lists, and that they are able to strongly tighten upper bounds at a negligible loss of accuracy.
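The sampling idea described above can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the example prior, and the choice of a fixed ideal gain vector are all illustrative assumptions. Labels for unjudged documents are drawn from a (here pool-based) prior over relevance grades, and nDCG is recomputed for each bootstrap sample to obtain a distribution of scores:

```python
import math
import random

def dcg(rels):
    # Standard DCG with a log2 discount: gain at rank i (1-indexed) is rel_i / log2(i + 1).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))

def ndcg(rels, ideal_rels):
    # Normalize by the DCG of the ideal ordering of the reference gains.
    ideal = dcg(sorted(ideal_rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

def bootstrap_ndcg(ranking, prior, ideal_rels, n_samples=1000, seed=0):
    """Bootstrap an nDCG distribution for a ranking with unjudged documents.

    ranking    : relevance labels in rank order; None marks an unjudged document.
    prior      : dict mapping relevance grade -> probability (e.g. pool-based).
    ideal_rels : reference gains used for the ideal DCG (an assumption here;
                 the paper's handling of the ideal ranking is more involved).
    """
    rng = random.Random(seed)
    grades, weights = zip(*prior.items())
    scores = []
    for _ in range(n_samples):
        # Draw a grade from the prior for every unjudged position.
        sampled = [rel if rel is not None else rng.choices(grades, weights=weights)[0]
                   for rel in ranking]
        scores.append(ndcg(sampled, ideal_rels))
    return scores
```

Summary statistics of the returned distribution (e.g. its mean or quantiles) then play the role of the point estimate and the tightened bounds, in contrast to a single condensed-list score or the naive extremes of treating all unjudged documents as non-relevant or highly relevant.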
Citation:
Fröbe, M., Gienapp, L., Potthast, M., & Hagen, M. (2023). Bootstrapped nDCG Estimation in the Presence of Unjudged Documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13980 LNCS, pp. 313–329). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-28244-7_20