Better effectiveness metrics for SERPs, cards, and rankings

Abstract

Offline metrics for IR evaluation are often derived from a user model that seeks to capture the interaction between the user and the ranking. Such models conflate the user’s interaction with a ranking of documents and the user’s interaction with the search results page. A desirable property of any effectiveness metric is that the scores it generates over a set of rankings correlate well with the “satisfaction” or “goodness” scores attributed to those same rankings by a population of searchers. Using data from a large-scale web search engine, we find that offline effectiveness metrics do not correlate well with a behavioural measure of satisfaction that can be inferred from user activity logs. We then examine three mechanisms to improve the correlation: tuning the model parameters; improving the label coverage, so that more kinds of item are labelled and hence included in the evaluation; and modifying the underlying user models that describe the metrics. In combination, these three mechanisms transform a wide range of common metrics into “card-aware” variants which allow for the gain from cards (or snippets), varying probabilities of clickthrough, and good abandonment.
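The correlation check described in the abstract can be illustrated with a minimal sketch. It assumes rank-biased precision (RBP) as a familiar example of a metric derived from a user model; the abstract does not name the metrics studied, and the rankings, satisfaction scores, and persistence value `p` below are hypothetical, not data or settings from the paper.

```python
from math import sqrt


def rbp(gains, p=0.8):
    """Rank-biased precision for a list of per-rank gains in [0, 1].

    The user model: after viewing rank i, the user continues to rank
    i + 1 with probability p, so rank i (0-based) has weight (1 - p) * p**i.
    """
    return (1 - p) * sum(g * p ** i for i, g in enumerate(gains))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: one gain vector per ranking, plus a made-up
# satisfaction score for each ranking (for example, inferred from logs).
rankings = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
satisfaction = [0.9, 0.4, 0.7, 0.2]

# Score every ranking with the metric, then correlate with satisfaction.
scores = [rbp(r) for r in rankings]
correlation = pearson(scores, satisfaction)
print(f"Pearson r between RBP and satisfaction: {correlation:.3f}")
```

Tuning the model parameters, the first mechanism listed in the abstract, would correspond in this sketch to varying `p` and re-checking the correlation; the other two mechanisms would change the gain vectors and the weighting function respectively.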

Cite

Thomas, P., Moffat, A., Bailey, P., Scholer, F., & Craswell, N. (2018). Better effectiveness metrics for SERPs, cards, and rankings. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3291992.3292002
