Abstract
Evaluation metrics play an important role in the batch evaluation of IR systems. Built on a user model that describes how users interact with the ranked list, an evaluation metric links the relevance scores of a list of documents to an estimate of system effectiveness and user satisfaction. The validity of an evaluation metric therefore has two facets: whether the underlying user model can accurately predict user behavior, and whether the metric correlates well with user satisfaction. While a tremendous amount of work has been devoted to designing, evaluating, and comparing evaluation metrics, few studies have explored the consistency between these two facets. Specifically, we want to investigate whether metrics that are well calibrated with user behavior data can perform equally well in estimating user satisfaction. To shed light on this research question, we compare how well various metrics within the C/W/L framework estimate user satisfaction when they are optimized to fit observed user behavior. Experimental results on both a self-collected and a publicly available user search behavior dataset show that metrics optimized to fit users' click behavior can perform as well as those calibrated with user satisfaction feedback. We also investigate the reliability of the calibration process to find out how much data is required for parameter tuning. Our findings provide empirical support for the consistency between user behavior modeling and satisfaction measurement, as well as guidance for tuning the parameters of evaluation metrics.
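As a concrete illustration of the setting described above (not taken from the paper), the sketch below shows how a C/W/L metric turns per-rank continuation probabilities C(i) into examination weights W(i) and an effectiveness score, using RBP, whose continuation probability is a constant theta, as the instance being calibrated. The moment-based fit_theta_to_clicks helper and the use of deepest-click ranks as a stopping proxy are illustrative assumptions, not the fitting procedure used in the paper.

```python
import numpy as np

def cwl_weights(C):
    # Under the C/W/L framework, a user examines rank i only if she
    # continued past ranks 1..i-1, so the unnormalised weight of rank i
    # is prod_{j<i} C(j); normalising gives the examination distribution W.
    view = np.concatenate(([1.0], np.cumprod(C[:-1])))
    return view / view.sum()

def erg(relevance, C):
    # Expected rate of gain: sum_i W(i) * r_i, the C/W/L metric value.
    return float(np.dot(cwl_weights(C), np.asarray(relevance, dtype=float)))

def rbp(relevance, theta):
    # RBP as a C/W/L instance: constant continuation probability theta
    # (truncated at the list length, so the tail mass is renormalised).
    C = np.full(len(relevance), theta)
    return erg(relevance, C)

def fit_theta_to_clicks(deepest_click_ranks):
    # Hypothetical moment-based calibration: under RBP the expected
    # last-examined rank is 1 / (1 - theta), so match it to the mean
    # observed deepest-click rank (a crude proxy for the stopping rank).
    mean_depth = np.mean(deepest_click_ranks)
    return max(0.0, 1.0 - 1.0 / mean_depth)

# Example: calibrate theta on logged deepest clicks, then score a run.
theta = fit_theta_to_clicks([1, 3, 2, 5, 4])   # mean depth 3 -> theta = 2/3
print(rbp([1.0, 0.0, 0.5, 0.0, 1.0], theta))
```

Whether a theta fitted this way (from clicks alone) yields satisfaction estimates as good as a theta fitted directly against satisfaction labels is exactly the consistency question the paper studies.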
Citation
Zhang, F., Mao, J., Liu, Y., Xie, X., Ma, W., Zhang, M., & Ma, S. (2020). Models Versus Satisfaction: Towards a Better Understanding of Evaluation Metrics. In SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 379–388). Association for Computing Machinery, Inc. https://doi.org/10.1145/3397271.3401162