Evaluating Relevance Judgments with Pairwise Discriminative Power

Zhumin Chu; Jiaxin Mao; Fan Zhang; Yiqun Liu; Tetsuya Sakai; Min Zhang; Shaoping Ma

Conference Proceedings, Open Access

International Conference on Information and Knowledge Management, Proceedings (2021) 261-270

DOI: 10.1145/3459637.3482428

Abstract

Relevance judgments play an essential role in the evaluation of information retrieval systems. As many different relevance judgment settings have been proposed in recent years, an evaluation metric for comparing relevance judgments across annotation settings has become a necessity. Traditional metrics, such as Krippendorff's α and φ, focus mainly on inter-assessor consistency as a measure of judgment quality. They run into a "reliable but useless" problem when used to compare different annotation settings (e.g., binary judgment vs. 4-grade judgment). Meanwhile, other popular metrics such as discriminative power (DP) are not designed to compare relevance judgments across annotation settings and therefore suffer from limitations, such as requiring result ranking lists from different systems. How to design an evaluation metric that compares relevance judgments under different grade settings thus needs further investigation. In this work, we propose a novel metric named pairwise discriminative power (PDP) to evaluate the quality of relevance judgment collections. By leveraging a small number of document-level preference tests, PDP estimates the ability of relevance judgments to discriminate between ranking lists of varying quality. With comprehensive experiments on both synthetic and real-world datasets, we show that PDP maintains a high degree of consistency with annotation quality in various grade settings. Compared with existing metrics (e.g., Krippendorff's α, φ, and DP), it provides reliable evaluation results with affordable additional annotation effort.

Cite

Chu, Z., Mao, J., Zhang, F., Liu, Y., Sakai, T., Zhang, M., & Ma, S. (2021). Evaluating Relevance Judgments with Pairwise Discriminative Power. In International Conference on Information and Knowledge Management, Proceedings (pp. 261–270). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482428
