
A unified approach to building hybrid recommender systems

by Asela Gunawardana, Christopher Meek
Proceedings of the 2009 ACM RecSys (2009)



A Unified Approach to Building Hybrid Recommender
Systems
Asela Gunawardana
Microsoft Research
One Microsoft Way
Redmond, WA 98052
aselag@microsoft.com
Christopher Meek
Microsoft Research
One Microsoft Way
Redmond, WA 98052
meek@microsoft.com
ABSTRACT
Content-based recommendation systems can provide recom-
mendations for “cold-start” items for which little or no train-
ing data is available, but typically have lower accuracy than
collaborative filtering systems. Conversely, collaborative fil-
tering techniques often provide accurate recommendations,
but fail on cold start items. Hybrid schemes attempt to
combine these different kinds of information to yield better
recommendations across the board.
We describe unified Boltzmann machines, which are prob-
abilistic models that combine collaborative and content in-
formation in a coherent manner. They encode collaborative
and content information as features, and then learn weights
that reflect how well each feature predicts user actions. In
doing so, information of different types is automatically
weighted, without the need for careful engineering of features or
for post-hoc hybridization of distinct recommender systems.
We present empirical results in the movie and shopping
domains showing that unified Boltzmann machines can be
used to combine content and collaborative information to
yield results that are competitive with collaborative techniques
in recommending items that have been seen before,
and also effective at recommending cold-start items.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information
Search and Retrieval - information filtering; G.3 [Probability
and Statistics]: correlation and regression analysis;
I.2.6 [Artificial Intelligence]: Learning - parameter learning
General Terms
Algorithms, Performance
Keywords
recommender systems, collaborative filtering, content-based
filtering, cold start, Boltzmann machines
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
RecSys’09, October 23–25, 2009, New York, New York, USA.
Copyright 2009 ACM 978-1-60558-435-5/09/10 ...$10.00.
1. INTRODUCTION
Recommender systems suggest items of interest to users
based on available information such as previous usage pat-
terns, the usage patterns of other users, and features of the
items themselves [20]. Collaborative filtering techniques pro-
vide recommendations to a user by using the preferences of
other users that have similar preferences to him [5, 17, 28].
For example, the familiar Amazon item-to-item system [17]
recommends items for a user viewing a current item on the
basis of other items purchased by other users that have
viewed the current item. Such systems have the drawback
that they suffer from the item cold start problem—an item
cannot be recommended until it has been rated by a number of
existing users.
This problem can be alleviated by recommender systems
that use information about the content of items. This in-
formation can be meta-data about the item such as actors
appearing in movies for movie recommendation or informa-
tion derived from the item such as counts of words in docu-
ments in the case of document recommendation [21]. Unfor-
tunately, purely content-based approaches often do not per-
form as well as collaborative filtering approaches when cold-
start is not an issue [8]. Hybrid systems [6] that combine
collaborative and content information are therefore used.
We extend our previous work on tied Boltzmann machines
[10], yielding a model that can naturally combine content
and collaborative features, which we term unified Boltzmann
machines. Both are improvements upon Boltzmann ma-
chines, which model the joint distribution of a set of binary
variables through their pairwise interactions. In our con-
text, the binary variables indicate whether or not a user has
acted on each item of interest. Thus, we use Boltzmann
machines to learn weights that reflect the importance of dif-
ferent pairwise interactions such as “people who buy item A
also buy item B” in explaining usage data. Tied Boltzmann
machines are variants of Boltzmann machines that explain
such usage data using content features of the form “people
who buy one dairy item also buy other dairy items.” Because
they do not use features that explicitly model interactions
between individual items, they provide the same predictions
for items that share the same content information, poten-
tially reducing accuracy. Unified Boltzmann machines make
use of both kinds of information. They learn weights that
reflect how much each feature contributes in explaining us-
age data, and in doing so they automatically learn to bal-
ance and combine the different information sources to make
better predictions. We present empirical results comparing
the performance of untied, tied, and unified Boltzmann ma-
chines in both cold start and non-cold-start scenarios in the
MovieLens and Ta-Feng shopping [13] data sets. We show
that the unified Boltzmann machine provides a single unified
approach that naturally combines content and collaborative
features to yield competitive results in both cold-start and
non-cold-start scenarios.
2. PREDICTING USER ACTIONS
In this paper, we examine the problem of predicting the
user’s future actions on items given the user’s past actions
on items. We will restrict attention to binary actions such
as listening to a song, watching a movie, or buying a book.
Extending our models to more general actions such as rating
a movie with an integer between 1 and 5 or spending $3.99
on rice is left for future work.
Our goal is to predict the probability that a user will act
on an item i given whether or not he has acted on all other
items. In other words, we wish to obtain a good estimate
of $p(a_i \mid a_{-i})$, where $a$ is a column vector of binary variables
$a_i$ that specify whether or not a user has acted on an item
$i$, and $a_{-i}$ is shorthand for $\{a_{i'} \mid i' \neq i\}$. We learn models
of the interactions between actions on items from the action
vectors $a^{(u)}$ of a population of users $u = 1, \dots, M$. For
example, the models will learn that people who buy hot dogs
may also buy hot dog buns, or that people who watch the
movie Sleepless In Seattle may also watch the movie You’ve
Got Mail. Given such models, as yet unused items $i$ can be
recommended if they have high $p(a_i^{(u)} \mid a_{-i}^{(u)})$.
2.1 Modeling Item Interactions with
Boltzmann Machines
Instead of estimating separate conditional distributions
$p(a_i^{(u)} \mid a_{-i}^{(u)})$ for all items $i$, we model the joint distribution
of a user’s action vector $a$ by a Boltzmann machine:
$$p(a; \lambda) = \frac{1}{z(\lambda)} \exp\Big( \sum_i \lambda_i a_i + \sum_{i<j} \lambda_{ij} a_i a_j \Big) \qquad (1)$$
The parameter vector $\lambda$ has components $\lambda_i$ and $\lambda_{ij}$ corresponding
to all items $i$ and item pairs $(i, j)$. We use $\sum_{i<j}$ to
denote the sum over unordered pairs $(i, j)$. Since $\lambda_{ij}$ is only
defined for $i < j$, we abuse notation somewhat and use $\lambda_{ij}$
and $\lambda_{ji}$ interchangeably to denote the weight associated with
the unordered pair $(i, j)$ without introducing any ambiguity.
The partition function $z(\lambda)$ is a normalizer that ensures that
$p(a; \lambda)$ is properly normalized over all configurations of the
action vector $a$. The pairwise weights $\lambda_{ij}$ capture pairwise
collaborative effects: a high value of $\lambda_{ij}$ indicates that users
with $a_i = 1$ also tend to have $a_j = 1$. The per-item weights
$\lambda_i$ capture popularity effects: a high value of $\lambda_i$ indicates a
higher likelihood of $a_i = 1$ irrespective of the user’s other
actions.
Example 1. Let us restrict attention to an inventory of
three movies: Sleepless in Seattle, You’ve Got Mail, and
When Harry Met Sally. The variables $a_{SS}$, $a_{YGM}$, and $a_{WHMS}$
will represent whether the user watched each of these movies
respectively. The Boltzmann machine relating these variables
will have a parameter vector $\lambda$ consisting of 3 unary
weights $\lambda_{SS}, \dots, \lambda_{WHMS}$ and 3 pairwise weights
$\lambda_{SS,YGM}, \dots, \lambda_{YGM,WHMS}$. A high value of $\lambda_{SS}$ would indicate that
$a_{SS}$ tends to be 1, i.e., that many users watch Sleepless
in Seattle. A high value of $\lambda_{SS,YGM}$ would indicate that
$a_{SS}$ and $a_{YGM}$ are correlated, i.e., that users who watch
Sleepless in Seattle also tend to watch You’ve Got Mail.
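
As an illustration of equation (1) on the three-movie inventory of Example 1, the following minimal Python sketch enumerates every configuration of the action vector and normalizes by the partition function. The weight values and the names `unary`, `pairwise`, `score`, and `joint_distribution` are hypothetical, chosen only for this illustration; they are not taken from the paper.

```python
import itertools
import math

# Hypothetical weights for illustration only (not values from the paper).
items = ["SS", "YGM", "WHMS"]
unary = {"SS": -1.0, "YGM": -1.5, "WHMS": -0.5}
pairwise = {("SS", "YGM"): 2.0, ("SS", "WHMS"): 0.5, ("YGM", "WHMS"): 0.3}


def score(a):
    """Exponent of equation (1) for an action dict a mapping item -> 0 or 1."""
    s = sum(unary[i] * a[i] for i in items)
    s += sum(w * a[i] * a[j] for (i, j), w in pairwise.items())
    return s


def joint_distribution():
    """Enumerate all 2^N configurations and normalize by z(lambda)."""
    configs = [dict(zip(items, bits))
               for bits in itertools.product([0, 1], repeat=len(items))]
    weights = [math.exp(score(a)) for a in configs]
    z = sum(weights)  # partition function, computed by brute force
    return [(a, w / z) for a, w in zip(configs, weights)]


dist = joint_distribution()
for a, p in dist:
    print(a, round(p, 4))
```

With three items the enumeration covers only 8 configurations, but the number of configurations doubles with every additional item, so this brute-force normalization is feasible only for toy inventories.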
Note that the partition function $z(\lambda)$ is expensive to compute
[15]. It requires summing over all $2^N$ configurations of
the action vector. However, our desired conditional probability
of a user acting on a single item given his actions on
all other items is easy to compute, and is given by
$$p(a_i = 1 \mid a_{-i}; \lambda) = \frac{\exp\big(\lambda_i + \sum_{j \neq i \mid a_j = 1} \lambda_{ij}\big)}{1 + \exp\big(\lambda_i + \sum_{j \neq i \mid a_j = 1} \lambda_{ij}\big)}. \qquad (2)$$
This conditional distribution can be seen to be a logistic
regression of $a_i$ on $a_{-i}$. Because the logistic regressions for
predicting each $a_i$ come from the single joint model given
by the Boltzmann machine (1), the weight for regressing $a_i$
on $a_j$ and the weight for regressing $a_j$ on $a_i$ are constrained
to be equal for every item pair $(i, j)$.
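
The conditional in equation (2) can be sketched in a few lines of Python. The partition function never appears, because it cancels when the joint probability with $a_i = 1$ is divided by the sum of the joint probabilities with $a_i = 1$ and $a_i = 0$. The sketch below assumes items indexed 0 to N-1 and uses hypothetical weights; the function names `conditional`, `ratio_of_joints`, and `rank_unused` are illustrative and not from the paper.

```python
import math


def conditional(i, a, unary, pairwise):
    """Equation (2): p(a_i = 1 | a_-i) written as a logistic function."""
    t = unary[i]
    for j, a_j in enumerate(a):
        if j != i and a_j == 1:
            t += pairwise[(min(i, j), max(i, j))]  # lambda_ij == lambda_ji
    return 1.0 / (1.0 + math.exp(-t))


def ratio_of_joints(i, a, unary, pairwise):
    """Same quantity from unnormalized joints of equation (1); z cancels."""
    def weight(cfg):
        s = sum(unary[k] * cfg[k] for k in range(len(cfg)))
        s += sum(w * cfg[p] * cfg[q] for (p, q), w in pairwise.items())
        return math.exp(s)
    on, off = list(a), list(a)
    on[i], off[i] = 1, 0
    return weight(on) / (weight(on) + weight(off))


def rank_unused(a, unary, pairwise):
    """Rank items the user has not acted on by conditional probability."""
    scored = [(i, conditional(i, a, unary, pairwise))
              for i, a_i in enumerate(a) if a_i == 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights for three items (illustration only).
unary = [-1.0, -1.5, -0.5]
pairwise = {(0, 1): 2.0, (0, 2): 0.5, (1, 2): 0.3}
user = [1, 0, 0]  # the user has acted on item 0 only
ranked = rank_unused(user, unary, pairwise)
print(ranked)
```

Because a single symmetric weight is stored per unordered pair, the lookup `pairwise[(min(i, j), max(i, j))]` reflects the constraint that regressing item i on item j and item j on item i share one weight.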
2.2 Content-Based Parameter Tying
One weakness of the model discussed in section 2.1 is that
reliable estimation of the model parameters $\lambda$, in particular
the pairwise weights $\lambda_{ij}$, requires sufficiently many observations.
Since real data tends to be sparse, with many
items having low probability of occurrence, pairwise weights
involving these items are difficult to estimate in practice.
Thus, the Boltzmann machine model described above may
not make good predictions for such items. In the extreme
case where an item is not seen during training, we have the
item cold start problem, where none of the weights associ-
ated with that item can be reliably estimated.
We alleviate this difficulty through the use of tied Boltz-
mann machines. Tied Boltzmann machines are Boltzmann
machines where the parameters have been tied so that they
are no longer estimated independently. When the parameters
are tied, the data used to estimate them can be pooled,
allowing more reliable estimation.
In order to retain the ability to model the interactions
between items, we use content information to guide the parameter
tying. We assume that the content associated with
item $i$ is represented by a feature vector $f^{(i)} \in \mathbb{R}^D$ composed
of $D$ components. For example, features could be
counts or TF-IDF weights of words in documents, or binary
flags indicating whether specific actors appeared in a movie.
Features with different semantics could be combined in a
single vector. For example, some feature components could
correspond to actors in a movie, while others could corre-
spond to genres, while still others could take on numerical
values such as movie length in minutes. We constrain λ to
satisfy
$$\lambda_i = \mu^T f^{(i)} \qquad (3)$$
$$\lambda_{ij} = f^{(i)T} \eta f^{(j)} \qquad (4)$$
where $\mu \in \mathbb{R}^D$ and $\eta \in \mathbb{R}^{D \times D}$ is symmetric. In this paper,
we will only consider diagonal $\eta$, in order to reduce the
number of parameters that need be estimated, and use $\eta_k$
to denote the $k$th diagonal component of $\eta$.
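
The tying constraints (3) and (4) with diagonal $\eta$ can be sketched as follows. This is a minimal illustration under assumed inputs: the feature vectors, the values of `mu` and `eta_diag`, and the function name `tied_weights` are hypothetical and not taken from the paper.

```python
def tied_weights(features, mu, eta_diag):
    """Compute unary and pairwise weights from content features.

    Implements lambda_i = mu^T f(i) and, for diagonal eta,
    lambda_ij = sum_k eta_k * f_k(i) * f_k(j).
    """
    n = len(features)
    unary = [sum(m * f for m, f in zip(mu, features[i])) for i in range(n)]
    pairwise = {
        (i, j): sum(e * fi * fj
                    for e, fi, fj in zip(eta_diag, features[i], features[j]))
        for i in range(n) for j in range(i + 1, n)
    }
    return unary, pairwise


# Hypothetical binary content features for four items, with D = 2.
features = [[1, 1], [1, 1], [1, 0], [0, 1]]
mu = [0.2, -0.4]        # assumed values, for illustration only
eta_diag = [1.5, 0.7]   # diagonal entries eta_1, eta_2 (assumed)

unary, pairwise = tied_weights(features, mu, eta_diag)
print(unary)
print(pairwise)
```

In this sketch items 0 and 1 share a feature vector, so they receive identical unary weights and identical pairwise weights with every other item. The tied model has 2D free parameters (the entries of `mu` and `eta_diag`) regardless of the number of items, which is what allows data to be pooled across items.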
Example 2. Let us tie the parameters of Example 1 using
a 2-dimensional feature vector with the components $f_{MR}$
and $f_{TH}$ corresponding to Meg Ryan and Tom Hanks appearing
in a movie. The new parameters that correspond to