Anyone offering content in a digital library is naturally interested in assessing its performance: how well does my system meet the users' information needs? Standard evaluation benchmarks have been developed in information retrieval that can be used to test retrieval effectiveness. However, these generic benchmarks focus on a single document genre, language, media type, and searcher stereotype that is radically different from the unique content and user community of a particular digital library. This paper proposes to derive a domain-specific test collection from readily available interaction data in search log files, capturing the domain specificity of digital libraries. As a case study, we use an archival institution's complete search log, spanning multiple years, and derive a large-scale test collection from it. We also manually derive a set of topics judged by human experts, based on a set of e-mail reference questions and responses from archivists, and use this set for validation. Our main finding is that a reliable and domain-specific test collection can be derived from search log files. © 2010 Springer-Verlag Berlin Heidelberg.
CITATION STYLE
Zhang, J., & Kamps, J. (2010). A search log-based approach to evaluation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6273 LNCS, pp. 248–260). https://doi.org/10.1007/978-3-642-15464-5_26