Test collections and evaluation metrics based on graded relevance

Abstract

In modern, large information retrieval (IR) environments, the number of documents relevant to a request may easily exceed the number of documents a user is willing to examine. It is therefore desirable to rank highly relevant documents first in search results. Developing retrieval methods for this purpose requires evaluating them accordingly. However, most IR method evaluations are based on rather liberal, binary relevance assessments, so differences between sloppy and excellent IR methods may not be observed in evaluation. An alternative is to employ graded relevance assessments in evaluation. The present paper discusses graded relevance, test collections providing graded assessments, and evaluation metrics based on graded relevance assessments. We also examine the effects of using graded relevance assessments in retrieval evaluation and some evaluation results based on graded relevance. We find that graded relevance provides new insight into IR phenomena and affects the relative merits of IR methods.
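
One well-known family of metrics designed for graded relevance assessments is cumulated gain, including discounted cumulative gain (DCG) and its normalized form (nDCG). The sketch below is a minimal illustration of one common nDCG formulation, not code taken from the paper; the four-point grade scale and the log2 rank discount are assumptions of this example.

import math
from typing import Sequence

def dcg(gains: Sequence[float], k: int) -> float:
    # Discounted cumulative gain over the top-k results.
    # `gains` lists the graded relevance of each retrieved document in rank
    # order, e.g. on a 0-3 scale (0 = non-relevant, 3 = highly relevant).
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg(gains: Sequence[float], k: int) -> float:
    # Normalized DCG: actual DCG divided by the DCG of an ideal ranking
    # that orders the same documents by descending relevance grade.
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0

# A ranking that places the highly relevant document first scores higher
# than one that buries it, even though both retrieve the same document set;
# a binary metric over the same set could not separate them as clearly.
print(ndcg([3, 2, 0, 1], k=4))  # about 0.99
print(ndcg([0, 1, 2, 3], k=4))  # about 0.61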

Citation (APA)

Järvelin, K. (2013). Test collections and evaluation metrics based on graded relevance. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7536 LNCS, pp. 280–294). https://doi.org/10.1007/978-3-642-40087-2_27
