
A Single Strong Disagreement Ruins a Recommender: Improving Recommendation Accuracy with a Simple Statistic

by Jennifer Golbeck
Human-Computer Interaction

Abstract

Research on the use of social trust relationships for collaborative filtering has shown that trust-based recommendations can outperform traditional methods in certain cases. This, in turn, led to insights that tie trust to certain more subtle types of similarity between users that are not captured in the overall similarity measures normally used for making recommendations. In this study, we investigate the use of these trust-inspired nuanced similarity measures directly for making recommendations. After describing previous research that identified these similarity statistics, we present an experiment run on two data sets: FilmTrust and MovieLens. Our results show that using a simple measure - the single largest difference between users - as a weight produces significantly more accurate results than a traditional collaborative filtering algorithm and in some cases also outperforms a model-based approach.


Jennifer Golbeck
Human-Computer Interaction Lab
University of Maryland, College Park, MD
jgolbeck@umd.edu

Author Keywords
recommender systems, collaborative filtering, profile similarity, trust

ACM Classification Keywords
H.3.4 Information Storage and Retrieval: Systems and Software - Performance Evaluation (efficiency and effectiveness)

INTRODUCTION

Recommender systems rely on computing similarity, be it between people or items, to make recommendations. In this research, we take results from the literature on computing with social trust and attempt to improve the quality of recommendations by using more nuanced similarity measures. In particular, we show that the largest difference between users and a user's rating habits can improve the accuracy of predictive ratings.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CHI 2009, April 4 - 9, 2009, Boston, Massachusetts, USA. Copyright 2009 ACM 978-1-60558-246-7/09/04...$5.00.

Over the past five years, trust derived from social networks has received much attention as a method for computing recommendations [9, 19, 21, 34]. Several experiments have shown how trust can be used for this purpose and have identified cases where it outperforms more traditional collaborative filtering algorithms.

This previous work has addressed the relationship between trust and similarity. Certainly, we expect that if Allison highly trusts Ben, Ben is likely to be more similar to Allison than someone she does not trust. This has been confirmed in experiments [32, 33].

More recent work has shown that trust captures something more nuanced than similarity alone [1]. For example, consider the situation where Allison has rated a set of movies, and we choose a subset of ten that includes Allison's favorite movie, her least favorite movie, and eight other films that she has seen and rated but about which she has no strong opinion. If Ben and Catherine now rate those ten films, Allison can judge from their ratings how much she trusts each of them about movies. Consider the ratings in Table 1.

Table 1. Example movie ratings on a 1-5 scale from three hypothetical users

    #   Movie                 Allison  Ben  Catherine
    1   Wizard of Oz             5      1       5
    2   Gigli                    1      5       1
    3   Star Wars                4      4       2
    4   Vertigo                  4      4       5
    5   High Noon                3      3       4
    6   Over the Hedge           4      4       2
    7   Goodfellas               2      2       4
    8   Forrest Gump             3      3       5
    9   Clockwork Orange         2      2       3
    10  Singin' in the Rain      4      4       3
Ben gives the lowest possible rating to Allison's favorite movie and the highest rating to Allison's least favorite movie, but they agree perfectly on the other eight. In this situation, similarity is very high. Contrast this with Catherine's ratings. She agrees perfectly on Allison's favorite and least favorite films, but there is variation in their ratings of the other eight movies such that, overall, their similarity is lower than in the first case. Whom should Allison trust more about movies? Previous research has shown that Allison tends to trust Catherine more, since they agree on the favorite and least favorite movies [1].

This observation motivated our previous study [1], which showed empirically that trust is related to the following factors between users:

  • Overall similarity, as is used in traditional user-user recommender systems.
  • Similarity on items to which the user has given an extreme rating, as in the example above. In this example, Allison has two movies with extreme ratings. Agreeing with her on those leads to higher trust.
  • The largest difference between the users. This involves finding the one item where the users have the largest disagreement. For example, in Table 1, the largest difference between Allison and Ben is 4 (on Wizard of Oz and Gigli), and the largest difference between Allison and Catherine is 2 (on Star Wars and several other movies). The larger this difference is, the lower the resulting trust.
  • The individual's propensity to trust. Some users are more trusting than others.

This leads to the following question, which frames the hypothesis of this paper: if trust can be used effectively to make recommendations, and we have identified a set of nuanced similarity measures that reflect trust, can we use those measures directly to make accurate recommendations? Essentially, we are replacing a social expression of trust with an approximation drawn from similarity on the underlying data.

To test this, we draw on the results of our previous work to compute these similarity measures between all pairs of users in two data sets - FilmTrust and MovieLens - and then compute predictive recommendations, which are compared to the users' known ratings.
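The contrast drawn from the Table 1 example can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and mean absolute difference is used here as an assumed stand-in for "overall similarity", which this excerpt does not define precisely.

```python
# Ratings from Table 1 (1-5 scale), listed in row order.
allison   = [5, 1, 4, 4, 3, 4, 2, 3, 2, 4]
ben       = [1, 5, 4, 4, 3, 4, 2, 3, 2, 4]
catherine = [5, 1, 2, 5, 4, 2, 4, 5, 3, 3]

def mean_abs_diff(a, b):
    """Average per-item disagreement (assumed proxy for overall similarity)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def max_diff(a, b):
    """Single largest disagreement on any co-rated item."""
    return max(abs(x - y) for x, y in zip(a, b))

for name, other in [("Ben", ben), ("Catherine", catherine)]:
    print(name, mean_abs_diff(allison, other), max_diff(allison, other))
# Ben 0.8 4
# Catherine 1.2 2
```

On average Ben is closer to Allison (0.8 versus 1.2), yet his single largest disagreement is twice Catherine's (4 versus 2), which is the pattern the largest-difference factor is meant to capture.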
We show that, in both data sets, the very simple statistic measuring the single largest difference between users outperforms correlation-based collaborative filtering techniques and may also perform better than a well-known model-based approach.

We begin by presenting a summary of the results of our previous study that identified the nuanced similarity measures that relate to trust. Then, we describe the experimental methodology and data sets, and present our results. Because it is somewhat surprising that such a simple measure performs so well, we analyze the differences in performance between the maximum difference and standard correlation measures to gain insights into why it performs better. Finally, we conclude with a discussion of how these measures can be incorporated into working recommender systems and the future work required.

BACKGROUND

With the explosion of social networking websites, a wealth of information about people's relationships has become publicly available. This has, in turn, led to the development of methods for estimating relationships and using them to improve the functionality of applications. Social trust has been of particular interest to the research community, and the most common application using trust is recommender systems [9, 21, 19, 31]. These results have shown that trust can provide significant benefits over traditional user-based collaborative filtering algorithms in certain cases. Those results suggest that trust captures something more than similarity alone. In this section, we describe the results of a previous study we conducted that identified several nuanced similarity measures that can be used to estimate social trust relationships. These results led directly to the hypothesis for the new experiments conducted in this article, where the nuanced similarity measures are used directly to make recommendations.
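To make the idea of using a nuanced similarity measure directly as a weight more concrete, here is a minimal Python sketch. It is not the paper's algorithm: the exact weighting formula is not given in this excerpt, so the linear mapping from largest difference to weight, the function names, and the extra item "new_film" with its ratings are all assumptions made for illustration.

```python
titles = ["Wizard of Oz", "Gigli", "Star Wars", "Vertigo", "High Noon",
          "Over the Hedge", "Goodfellas", "Forrest Gump",
          "Clockwork Orange", "Singin' in the Rain"]

# Table 1 ratings, keyed by title.
allison   = dict(zip(titles, [5, 1, 4, 4, 3, 4, 2, 3, 2, 4]))
ben       = dict(zip(titles, [1, 5, 4, 4, 3, 4, 2, 3, 2, 4]))
catherine = dict(zip(titles, [5, 1, 2, 5, 4, 2, 4, 5, 3, 3]))

# Hypothetical ratings for a film Allison has not rated (not from the paper).
ben["new_film"] = 2
catherine["new_film"] = 5

def max_diff_weight(a, b, scale_min=1, scale_max=5):
    """Map the largest disagreement on co-rated items to a weight in [0, 1].

    ASSUMPTION: a simple linear mapping, 1 - max_diff / rating_range.
    """
    shared = [item for item in a if item in b]
    if not shared:
        return 0.0
    largest = max(abs(a[item] - b[item]) for item in shared)
    return 1.0 - largest / (scale_max - scale_min)

def predict(target, others, item):
    """Weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for ratings in others:
        if item not in ratings:
            continue
        w = max_diff_weight(target, ratings)
        num += w * ratings[item]
        den += w
    return num / den if den else None  # None when no user carries weight

print(predict(allison, [ben, catherine], "new_film"))  # 5.0
```

Under these assumptions Ben's weight is 0 (his largest disagreement spans the whole scale) and Catherine's is 0.5, so the prediction follows Catherine's rating alone.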
We conducted an experiment to delve further into the question of how trust and similarity relate, and the results are reported in [1]. It took place in the context of a movie rating website. First, subjects were asked to rate all the movies they had seen on a list of nearly 300 diverse films. The set was composed of the top 100 movies from the Internet Movie Database (http://imdb.com) top-grossing, top-rated, and worst-rated films, as well as the top 10 films from each genre.

Those ratings were then used to generate profiles of hypothetical users. Each profile consisted of ten movies where the hypothetical user differed from the subject in controlled ways. In particular, we tested the impact of differences on movies the subject had given extremely high or low ratings (values of 1, 2, 9, or 10 on a 1-10 scale), ratings more than two standard deviations from the average, and movies that fit both categories.

In a profile, movies 1-4 were chosen from one category, and movies 5-10 came from the complementary category. So, for example, a profile would contain four movies that the subject had rated in the extreme and six movies that had non-extreme ratings. Another example profile could have four movies where the subject's ratings were within two standard deviations of the mean and six movies where the subject's ratings were outside that range.

In the main part of this experiment, a predefined set of differences was applied to generate the hypothetical user's ratings. These created profiles that differed from the subject by small, medium, and large amounts. The subjects were asked to rate how much they trusted this hypothetical user based on the ratings. This method of generating ratings controlled for all factors - overall
