Shared text collections continue to be vital infrastructure for IR research. The COVID-19 pandemic offered an opportunity to create a test collection that captured the rapidly changing information space during a pandemic, and the TREC-COVID effort was created to build such a collection using the TREC framework. This paper examines the quality of the resulting TREC-COVID test collections and, in doing so, offers a critique of the state of the art in building reusable IR test collections. The largest of the collections, called 'TREC-COVID Complete', is found to be on par with previous TREC ad hoc collections, with existing quality tests uncovering no apparent problems. Yet the lack of any way to definitively demonstrate the collection's quality, together with its violation of previously used quality heuristics, suggests that much work remains to be done to understand the factors affecting collection quality.
CITATION STYLE
Voorhees, E. M., & Roberts, K. (2021). On the Quality of the TREC-COVID IR Test Collections. In SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2422–2428). Association for Computing Machinery, Inc. https://doi.org/10.1145/3404835.3463244