Take a Fresh Look at Recommender Systems from an Evaluation Standpoint

Abstract

Recommendation has become a prominent area of research in the field of Information Retrieval (IR), and evaluation has long been a core research topic in this community. Motivated by a few counter-intuitive observations reported in recent studies, this perspectives paper takes a fresh look at recommender systems from an evaluation standpoint. Rather than examining metrics like recall, hit rate, or NDCG, or perspectives like novelty and diversity, the key focus here is on how these metrics are calculated when evaluating a recommender algorithm. Specifically, the commonly used train/test data splits and their consequences are re-examined. We begin by examining common data splitting methods, such as random split or leave-one-out, and discuss why the popularity baseline is poorly defined under such splits. We then explore two implications of neglecting a global timeline during evaluation: data leakage and oversimplification of user preference modeling. Afterwards, we present new perspectives on recommender systems, including techniques for evaluating algorithm performance that more accurately reflect real-world scenarios, and possible approaches to considering decision contexts in user preference modeling.
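To make the data-splitting concern concrete, below is a minimal sketch (not code from the paper) contrasting a per-user leave-one-out split with a split along a single global timeline. The toy interaction log, function names, and cutoff value are assumptions made purely for illustration. Under leave-one-out, some training interactions occur after another user's test interaction, which is the kind of data leakage the abstract refers to; a split at one global cutoff time avoids this by construction.

```python
# Illustrative sketch only: toy log, helper names, and cutoff are assumptions.
from collections import defaultdict

# (user, item, timestamp) interactions on one shared, global timeline.
LOG = [
    ("u1", "i1", 1), ("u1", "i2", 5), ("u1", "i3", 9),
    ("u2", "i2", 2), ("u2", "i4", 3),
    ("u3", "i1", 6), ("u3", "i5", 10),
]

def leave_one_out(log):
    """Hold out each user's most recent interaction; the global timeline is ignored."""
    by_user = defaultdict(list)
    for user, item, ts in log:
        by_user[user].append((user, item, ts))
    train, test = [], []
    for events in by_user.values():
        events.sort(key=lambda e: e[2])
        train.extend(events[:-1])
        test.append(events[-1])
    return train, test

def global_temporal_split(log, cutoff):
    """Train on all interactions before the cutoff time, test on the rest."""
    train = [e for e in log if e[2] < cutoff]
    test = [e for e in log if e[2] >= cutoff]
    return train, test

if __name__ == "__main__":
    loo_train, loo_test = leave_one_out(LOG)
    earliest_test = min(ts for _, _, ts in loo_test)
    # Data leakage: training interactions that happen after some test interaction,
    # i.e. the model effectively "sees the future" relative to that test point.
    leaked = [e for e in loo_train if e[2] > earliest_test]
    print("leave-one-out test set:", loo_test)
    print("training events later than the earliest test event:", leaked)

    tr, te = global_temporal_split(LOG, cutoff=6)
    print("temporal split -> train:", tr, "test:", te)
```

Running the sketch shows two training interactions (at timestamps 5 and 6) occurring after the earliest test interaction (timestamp 3) under leave-one-out, whereas the global temporal split keeps all training data strictly before all test data.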

Citation (APA)

Sun, A. (2023). Take a Fresh Look at Recommender Systems from an Evaluation Standpoint. In SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2629–2638). Association for Computing Machinery, Inc. https://doi.org/10.1145/3539618.3591931
