Personalized search
Communications of the ACM (2002)
- ISSN: 0001-0782
- DOI: 10.1145/567498.567526
Abstract
Presents the “Outride” system for personalizing search queries from within the browser and compares it with conventional search methods.
Communications of the ACM, September 2002, Vol. 45, No. 9, p. 50
Contextual computing refers to the enhancement of a user’s interactions by understanding the user, the context, and the applications and information being used, typically across a wide set of user goals. Contextual computing is not just about modeling user preferences and behavior or embedding computation everywhere; it is about actively adapting the computational environment, for each and every user, at each point of computation.
With respect to personalized search, the contextual computing approach focuses on understanding the information consumption patterns of each user, the various information foraging strategies [3] and applications they employ, and the nature of the information itself. Focusing on the user enables a shift from what we call “consensus relevancy,” where the relevancy computed for the entire population is presumed relevant for each user, toward personal relevancy, where relevancy is computed for each individual within the context of their interactions. The benefits of personalized search can be significant, appreciably decreasing the time it takes people, novices and experts alike, to find information.
Here, we review the evolution of the field of information retrieval (IR) [4], setting the stage for examining how search can be personalized, with particular emphasis on the Web. We then describe the Outride system and review a set of experiments.
The field of IR has evolved from analyzing the letters and words that make up the content of documents, to integrating intrinsic document properties like citations and hyperlinks, to incorporating usage data. Content-based approaches such as statistical and natural language techniques provide results that contain a specific set of words or meaning, but cannot differentiate which documents in a collection are the ones really worth reading.
This need gave rise to a set of methods we refer to as “author relevancy” techniques. By computing what the most respected authors deem important, citation and hyperlink approaches provide an implicit measure of importance. However, these techniques can create an authoring bias, where the meaning and resources valued by a group of authors determine the results for the entire user population. Imagine for a moment that the Java programming language were called something different. A query for the term “java” on the Web would then produce a different set of results, likely about coffee, which is probably closer to most users’ expectations. Additionally, a ranking bias can occur when, for a given topic, the authoring community values a different set of resources than the general population does. A typical example is link promotion, where a small set of authors creates a set of highly interconnected sites in an attempt to make them appear to be the most relevant resources on a particular topic.
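The article does not name a specific hyperlink algorithm. As a hedged illustration of link-based importance and of link promotion, the sketch below uses PageRank (one well-known link-analysis measure, swapped in here for illustration) on a hypothetical toy graph. The page names, damping factor, and iteration count are assumptions chosen only to make the effect visible.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over an adjacency list {page: [pages it links to]}.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleportation: every page gets a baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy Web: two independent blogs endorse a coffee page, while
# three "farm" sites run by the same authors link to each other and to a
# page they want to promote.
web = {
    "blog1": ["coffee"],
    "blog2": ["coffee"],
    "coffee": [],
    "farm1": ["farm2", "farm3", "promoted"],
    "farm2": ["farm1", "farm3", "promoted"],
    "farm3": ["farm1", "farm2", "promoted"],
    "promoted": ["farm1", "farm2", "farm3"],
}

ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:10s} {score:.3f}")
```

With these inputs the interlinked cluster keeps most of the rank circulating among its own members, so `promoted` scores well above `coffee` even though only `coffee` is endorsed by independent authors. This is the kind of authoring-driven ranking bias the paragraph describes.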
A contextual computing approach may prove a breakthrough in personalized search efficiency.
Illustration: Peter and Maria Hoey
Usage-based IR methods add to the previous research by leveraging the actions of users to compute relevancy. A usage rank is computed from the frequency, recency, and/or duration of interaction by users. This provides a direct measure of what is relevant to the users of the information system at any point in time. Typically, the usage rank for a page is combined with content- and link-based ranking methods. Although changes in relevancy over time are impossible to infer from link and content approaches, usage techniques readily compute them. These temporal changes include ephemeral events, such as the record-breaking usage driven by interest in the comet Shoemaker-Levy; emerging trends, such as the growth in usage of MP3s; seasonal favorites, like the popularity of flowers around Valentine’s Day; and faddish events, such as the rise and fall of the NCSA Mosaic Web browser.
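The article describes usage rank only in general terms. The following is a minimal sketch, assuming one plausible formulation: each logged visit adds log-damped dwell time (duration), discounted by an exponential decay with a 30-day half-life (recency), and summed per page (frequency). The result is then blended linearly with content and link scores. The log format, half-life, weights, and the names `Visit`, `usage_rank`, `normalize`, and `combined_rank` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Visit:
    """One logged interaction with a page (hypothetical log format)."""
    page: str
    days_ago: float   # recency: how long ago the interaction happened
    seconds: float    # duration: dwell time on the page


def usage_rank(visits: list[Visit], half_life_days: float = 30.0) -> dict[str, float]:
    """Score pages by frequency, recency, and duration of interaction.

    Every visit adds to its page's score (frequency); the contribution is
    log-damped dwell time (duration) halved every `half_life_days` (recency).
    """
    scores: dict[str, float] = {}
    for v in visits:
        recency = 0.5 ** (v.days_ago / half_life_days)
        duration = math.log1p(v.seconds)
        scores[v.page] = scores.get(v.page, 0.0) + recency * duration
    return scores


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into [0, 1] by dividing by the maximum."""
    top = max(scores.values(), default=0.0)
    return {k: v / top for k, v in scores.items()} if top > 0 else dict(scores)


def combined_rank(content: float, link: float, usage: float,
                  weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Linear blend of content, link, and usage scores, each assumed in [0, 1]."""
    w_content, w_link, w_usage = weights
    return w_content * content + w_link * link + w_usage * usage


# An old spike of interest versus a recent, growing trend (hypothetical data).
visits = [
    Visit("comet-shoemaker-levy", days_ago=200, seconds=300),
    Visit("comet-shoemaker-levy", days_ago=201, seconds=240),
    Visit("mp3-players", days_ago=2, seconds=120),
    Visit("mp3-players", days_ago=1, seconds=180),
]
usage = normalize(usage_rank(visits))
print(usage)
print(combined_rank(content=0.8, link=0.6, usage=usage["mp3-players"]))
```

Because the recency term decays, the once-popular comet page fades relative to the recent MP3 page. This is the temporal sensitivity that content and link signals alone cannot provide.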
Interestingly, the retrieval process can be infused with different granularities of usage data (individual, group/social, and census), enabling a system to fall back to a coarser level of usage data in the face of uncertainty. The latter two forms create a kind of social relevancy, where the notion of importance is defined by the usage of a community of users. Very few usage-based systems have been developed; the most notable is Direct Hit’s collaborative-filtering-inspired approach, which monitors which search results people select.
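Falling back from individual to group to census usage data can be sketched as a simple backoff. This sketch assumes that "uncertainty" means too few recorded interactions at a level. The threshold of 20 events, the count-based score, and the names `backoff_score` and `Level` are hypothetical choices, not a method given in the article.

```python
Level = tuple[str, dict[str, int]]  # (level name, {page: interaction count})


def backoff_score(page: str, levels: list[Level],
                  min_events: int = 20) -> tuple[str, float]:
    """Return (level used, share of that level's interactions going to `page`).

    `levels` is ordered from finest (individual) to coarsest (census). A level
    is trusted only if it has at least `min_events` interactions in total;
    otherwise we fall back to the next, coarser level.
    """
    for name, counts in levels:
        total = sum(counts.values())
        if total >= min_events:
            return name, counts.get(page, 0) / total
    # No level is dense enough: use the coarsest one as a last resort.
    name, counts = levels[-1]
    total = sum(counts.values())
    return name, (counts.get(page, 0) / total if total else 0.0)


levels = [
    ("individual", {"java-tutorial": 3}),                  # sparse personal history
    ("group", {"java-tutorial": 40, "java-coffee": 10}),
    ("census", {"java-tutorial": 300, "java-coffee": 700}),
]
print(backoff_score("java-coffee", levels))  # → ('group', 0.2)
```

A richer system might blend levels with weights that depend on the amount of evidence instead of switching between them outright. The hard cutoff is used here only because it makes the fall-back behavior easy to see.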
What’s curious about these approaches is that relevance is measured as a function of the entire population of users. One can view this as an attempt to optimize the consensus relevancy for any given topic. For any query, relevancy is computed identically for all users, without acknowledging that relevance is relative to each user. Further, none of these approaches can differentiate based on who is searching or on that person’s current context, interests, and/or prior knowledge. What’s needed is a way to account for the fact that different people find different things relevant and that people’s interests and knowledge change over time. In short, what’s needed is a way to compute personal relevancy.
The Outride Approach
We posit that at least two different computational techniques need to be combined to personalize search: contextualization and individualization. By contextualization, we mean the interrelated conditions that occur within an activity. Individualization means the totality of characteristics that distinguishes an individual. Contextualization includes factors like the nature of the information available, the information currently being examined, the applications in use, the time of use, and so on. Individualization encompasses elements like the user’s goals, prior and tacit knowledge, and past information-seeking behaviors, among others. These elements are used to build a user model that computes personal relevancy, as we will describe. It is this focus on the user and their context within the application of search that makes personalized search a compelling area to explore within the framework of contextual computing.
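The excerpt here does not specify the form of the user model. One hypothetical way to combine the two ingredients is shown below. A bag-of-words profile built from documents the user read before stands in for individualization. Similarity to the text the user is currently working with stands in for contextualization. Both are blended with the engine's consensus score. The representation, weights, and names (`UserModel`, `personal_score`, `rerank`) are assumptions, not Outride's implementation.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter[str]:
    """Bag-of-words term counts (deliberately simple representation)."""
    return Counter(text.lower().split())


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


class UserModel:
    """Hypothetical user model: individualization plus contextualization."""

    def __init__(self, history: list[str]):
        # Individualization: long-term interests built from documents read before.
        self.profile: Counter[str] = Counter()
        for doc in history:
            self.profile.update(vectorize(doc))

    def personal_score(self, text: str, consensus: float, context: str,
                       weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
        """Blend the engine's consensus score with profile and context similarity."""
        w_consensus, w_profile, w_context = weights
        doc = vectorize(text)
        return (w_consensus * consensus
                + w_profile * cosine(doc, self.profile)          # who is searching
                + w_context * cosine(doc, vectorize(context)))   # what they are doing now

    def rerank(self, results: list[tuple[str, float]], context: str) -> list[str]:
        """Reorder (text, consensus score) results by personal score, best first."""
        ranked = sorted(results,
                        key=lambda r: self.personal_score(r[0], r[1], context),
                        reverse=True)
        return [text for text, _ in ranked]


# The same "java" results and consensus scores are shown to two different users.
results = [("java programming language tutorial", 0.9),
           ("java coffee roasting guide", 0.8)]
programmer = UserModel(["python programming tutorial", "compiler design language"])
coffee_lover = UserModel(["espresso coffee beans", "coffee roasting at home"])

print(programmer.rerank(results, context="learning a new programming language"))
# → ['java programming language tutorial', 'java coffee roasting guide']
print(coffee_lover.rerank(results, context="buying coffee beans online"))
# → ['java coffee roasting guide', 'java programming language tutorial']
```

Under consensus relevancy both users would see the same order. With the profile and context terms added, the coffee lover's ranking flips while the programmer's does not. This mirrors the shift from consensus to personal relevancy that the article argues for.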
It is worth mentioning upfront that since the following techniques alter the search experience, careful integration of these features into the user interface is required. In particular, the interface needs to provide a way to explain what the system is doing to personalize
By James Pitkow, Hinrich Schütze, Todd Cass, Rob Cooley, Don Turnbull, Andy Edmonds, Eytan Adar, and Thomas Breuel
The magnitude of the difference between the Outride system and the other engines is compelling, especially given that most search engines are less than 10% better than one another.