As deep learning-based models are increasingly used for information retrieval, a major challenge is to ensure the availability of test collections for measuring their quality. Test collections are usually generated by pooling the results of various retrieval systems, but until recently this did not include deep learning systems. This raises a concern for reusable evaluation: since deep learning-based models draw on external resources (e.g., word embeddings) and richer representations than traditional methods, they may retrieve relevant documents of a kind that was never identified during the original pooling. If so, test collections constructed from traditional methods alone could yield biased and unfair evaluation results for deep learning systems. This paper uses simulated pooling to test the fairness and reusability of test collections, showing that pooling based only on traditional systems can lead to biased evaluation of deep learning systems, especially when shallow pools (e.g., depth-10) are used.
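The reusability question above can be made concrete with a leave-one-out pooling simulation, a standard way to probe whether a pool built without a given system still evaluates that system fairly. The sketch below is a minimal illustration under assumed inputs (hypothetical runs and qrels), not the authors' actual experimental code:

```python
# Hedged sketch of a leave-one-out depth-k pooling simulation.
# All run and qrel names below are hypothetical toy data.

from itertools import islice

def depth_k_pool(runs, k):
    """Union of the top-k documents from each contributing run."""
    pool = set()
    for ranking in runs:
        pool.update(islice(ranking, k))
    return pool

def precision_at_n(ranking, qrels, pool, n=10):
    """Precision@n, treating documents outside the pool as non-relevant."""
    top = list(islice(ranking, n))
    hits = sum(1 for d in top if d in pool and qrels.get(d, 0) > 0)
    return hits / n

def leave_one_out_gap(all_runs, held_out, qrels, k=10, n=10):
    """Measured effectiveness of `held_out` with vs. without it in the pool.

    A large gap suggests the collection is not reusable at this pool
    depth: the held-out system retrieves relevant documents that the
    other systems never contributed to the pool.
    """
    full_pool = depth_k_pool(all_runs + [held_out], k)
    loo_pool = depth_k_pool(all_runs, k)
    return (precision_at_n(held_out, qrels, full_pool, n),
            precision_at_n(held_out, qrels, loo_pool, n))

# Toy example: d4 and d5 are relevant, but only the held-out system
# ranks them highly, so a shallow pool built without it misses them
# and the held-out system's measured score drops.
traditional_runs = [["d1", "d2", "d3"], ["d2", "d1", "d6"]]
new_system = ["d4", "d5", "d1"]
qrels = {"d1": 1, "d4": 1, "d5": 1}
print(leave_one_out_gap(traditional_runs, new_system, qrels, k=3, n=3))
# -> (1.0, 0.333...): the gap is the pooling bias against the new system.
```

In this toy setup the held-out system scores 1.0 when it contributes to the pool but roughly 0.33 when evaluated against a pool built from the traditional runs alone, which is exactly the kind of bias against unpooled system types that the paper measures.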
Yilmaz, E., Craswell, N., Mitra, B., & Campos, D. (2020). On the Reliability of Test Collections for Evaluating Systems of Different Types. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020) (pp. 2101–2104). Association for Computing Machinery. https://doi.org/10.1145/3397271.3401317