NDCG and similar measures remain standard for the offline evaluation of search, recommendation, question answering, and similar systems. These measures require definitions for two or more relevance levels, which human assessors then apply to judge individual documents. Due to this dependence on a definition of relevance, it can be difficult to extend these measures to account for factors beyond relevance. Rather than propose extensions to these measures, we instead propose a radical simplification to replace them. For each query, we define a set of ideal rankings and compute the maximum rank similarity between members of this set and an actual ranking generated by a system. This maximum similarity to an ideal ranking becomes our effectiveness measure, replacing NDCG and similar measures. We propose rank-biased overlap (RBO) to compute this rank similarity, since it was specifically designed to measure the similarity of search result rankings. As examples, we explore ideal rankings that account for document length, diversity, and correctness.
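To make the proposal concrete, the sketch below shows one way the measure could be computed. It uses a truncated, prefix-based form of RBO (Webber et al., 2010), which weights the overlap between two ranking prefixes by a geometrically decaying persistence parameter p; without the extrapolation term of the full measure, this truncated sum is a lower bound on RBO. The function names rbo and max_rbo_to_ideal, the default p = 0.9, and the example document identifiers are illustrative assumptions, not artifacts of the paper.

```python
def rbo(run, ideal, p=0.9, depth=None):
    """Truncated rank-biased overlap between two rankings (lists of doc ids).

    Prefix-based sketch: the agreement at each depth d is the proportion of
    documents shared by the two depth-d prefixes, and agreements are combined
    with weights (1 - p) * p**(d - 1). For identical rankings the truncated
    score is 1 - p**depth, approaching 1 as the evaluation depth grows.
    """
    k = depth or max(len(run), len(ideal))
    seen_run, seen_ideal = set(), set()
    score = 0.0
    for d in range(1, k + 1):
        if d <= len(run):
            seen_run.add(run[d - 1])
        if d <= len(ideal):
            seen_ideal.add(ideal[d - 1])
        overlap = len(seen_run & seen_ideal)
        score += (p ** (d - 1)) * (overlap / d)
    return (1 - p) * score


def max_rbo_to_ideal(run, ideal_rankings, p=0.9, depth=None):
    """Effectiveness of a run: its maximum RBO to any member of the ideal set."""
    return max(rbo(run, ideal, p=p, depth=depth) for ideal in ideal_rankings)


# Hypothetical example: one system ranking scored against a set containing
# two equally ideal orderings; the better-matching ideal determines the score.
run = ["d3", "d1", "d5", "d2", "d4"]
ideals = [["d1", "d2", "d3", "d4", "d5"],
          ["d2", "d1", "d3", "d4", "d5"]]
print(max_rbo_to_ideal(run, ideals, p=0.9))
```

Taking the maximum over the ideal set is what frees the measure from a single relevance definition: any ideal ranking, whether built from relevance levels, document length, diversity, or correctness, can stand in without changing the computation.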
CITATION
Clarke, C. L. A., Smucker, M. D., & Vtyurina, A. (2020). Offline evaluation by maximum similarity to an ideal ranking. In Proceedings of the International Conference on Information and Knowledge Management (pp. 225–234). Association for Computing Machinery. https://doi.org/10.1145/3340531.3411915