Offline Evaluation by Maximum Similarity to an Ideal Ranking

Abstract

NDCG and similar measures remain standard for the offline evaluation of search, recommendation, question answering and similar systems. These measures require definitions for two or more relevance levels, which human assessors then apply to judge individual documents. Due to this dependence on a definition of relevance, it can be difficult to extend these measures to account for factors beyond relevance. Rather than propose extensions to these measures, we instead propose a radical simplification to replace them. For each query, we define a set of ideal rankings and compute the maximum rank similarity between members of this set and an actual ranking generated by a system. This maximum similarity to an ideal ranking becomes our effectiveness measure, replacing NDCG and similar measures. We propose rank biased overlap (RBO) to compute this rank similarity, since it was specifically created to address the requirements of rank similarity between search results. As examples, we explore ideal rankings that account for document length, diversity, and correctness.
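The proposed measure can be sketched in a few lines: compute rank-biased overlap between the system's ranking and each member of the ideal set, and take the maximum. The sketch below uses a simple truncated form of RBO (the paper builds on the full definition from Webber et al.); the function names and the persistence parameter `p=0.9` are illustrative assumptions, not the authors' implementation.

```python
def rbo(run, ideal, p=0.9):
    """Truncated rank-biased overlap between two rankings.

    At each depth d, measure the fraction of documents the two
    rankings share in their top d, and sum these agreements with
    geometrically decaying weight p**(d-1), scaled by (1 - p).
    """
    depth = min(len(run), len(ideal))
    seen_run, seen_ideal = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_run.add(run[d - 1])
        seen_ideal.add(ideal[d - 1])
        agreement = len(seen_run & seen_ideal) / d
        score += (p ** (d - 1)) * agreement
    return (1 - p) * score


def max_similarity(run, ideal_rankings, p=0.9):
    """Effectiveness of a run: maximum RBO over the set of ideal rankings."""
    return max(rbo(run, ideal, p) for ideal in ideal_rankings)
```

For example, with two ideal rankings that differ only in how ties are ordered, `max_similarity` credits the system for matching whichever ideal it comes closest to, with no relevance levels or gain values needed.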

Citation (APA)

Clarke, C. L. A., Smucker, M. D., & Vtyurina, A. (2020). Offline Evaluation by Maximum Similarity to an Ideal Ranking. In International Conference on Information and Knowledge Management, Proceedings (pp. 225–234). Association for Computing Machinery. https://doi.org/10.1145/3340531.3411915
