Interest-based personalized search

by Zhongming Ma, Gautam Pant, Olivia R Liu Sheng
ACM Transactions on Information Systems (2007)

Abstract

Web search engines typically provide search results without considering user interests or context. We propose a personalized search approach that can easily extend a conventional search engine on the client side. Our mapping framework automatically maps a set of known user interests onto a group of categories in the Open Directory Project (ODP) and takes advantage of manually edited data available in ODP for training text classifiers that correspond to, and therefore categorize and personalize search results according to, user interests. In two sets of controlled experiments, we compare our personalized categorization system (PCAT) with a list interface system (LIST) that mimics a typical search engine and with a nonpersonalized categorization system (CAT). In both experiments, we analyze system performance on the basis of the type of task and query length. We find that PCAT is preferable to LIST for information gathering types of tasks and for searches with short queries, and PCAT outperforms CAT in both information gathering and finding types of tasks, and for searches associated with free-form queries. From the subjects' answers to a questionnaire, we find that PCAT is perceived as a system that can find relevant Web pages more quickly and easily than LIST and CAT.

ZHONGMING MA, GAUTAM PANT, and OLIVIA R. LIU SHENG
The University of Utah
Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content
Analysis and Indexing—Dictionaries, Linguistic processing; H.3.3 [Information Storage and
Retrieval]: Information Search and Retrieval—Search process; H.3.4 [Information Storage and
Retrieval]: Systems and Software—Performance evaluation (efficiency and effectiveness); H.5.2
[Information Interfaces and Presentation]: User Interfaces—Graphical user interfaces (GUI)
General Terms: Algorithms, Performance
Additional Key Words and Phrases: Personalized search, user interest, user interface, World Wide
Web, information retrieval, Open Directory
ACM Reference Format:
Ma, Z., Pant, G., and Liu Sheng, O. R. 2007. Interest-based personalized search. ACM Trans.
Inform. Syst. 25, 1, Article 5 (February 2007), 38 pages. DOI = 10.1145/1198296.1198301
http://doi.acm.org/10.1145/1198296.1198301.
This research was supported by the Global Knowledge Management Center and the School of
Accounting and Information Systems at the University of Utah, and the eBusiness Research
Center at Pennsylvania State University.
Authors’ address: Zhongming Ma, Gautam Pant, and Olivia R. Liu Sheng, School of Accounting
and Information Systems, The University of Utah, UT 84112; email: {zhongming.ma,gautam.pant,
olivia.sheng}@business.utah.edu.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is
granted without fee provided that copies are not made or distributed for profit or direct commercial
advantage and that copies show this notice on the first page or initial screen of a display along
with the full citation. Copyrights for components of this work owned by others than ACM must be
honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers,
to redistribute to lists, or to use any component of this work in other works requires prior specific
permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn
Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org.

© 2007 ACM 1046-8188/2007/02-ART5 $5.00 DOI 10.1145/1198296.1198301
http://doi.acm.org/10.1145/1198296.1198301
ACM Transactions on Information Systems, Vol. 25, No. 1, Article 5, Publication date: February 2007.
1. INTRODUCTION
The Web provides an extremely large and dynamic source of information, and
the continuous creation and updating of Web pages magnifies information
overload on the Web. Both casual and noncasual users (e.g., knowledge workers)
often use search engines to find a needle in this constantly growing haystack.
Sellen et al. [2002], who define a knowledge worker as someone “whose paid
work involves significant time spent in gathering, finding, analyzing, creating,
producing or archiving information”, report that 59% of the tasks performed
on the Web by a sample of knowledge workers fall into the categories of
information gathering and finding, which require an active use of Web search
engines.
Most existing Web search engines return a list of search results based on
a user’s query but ignore the user’s specific interests and/or search context.
Therefore, the identical query from different users or in different contexts will
generate the same set of results displayed in the same way for all users, a
so-called one-size-fits-all [Lawrence 2000] approach. Furthermore, the number of
search results returned by a search engine is often so large that the results must
be partitioned into multiple result pages. In addition, individual differences
in information needs, polysemy (multiple meanings of the same word), and
synonymy (multiple words with the same meaning) pose problems [Deerwester
et al. 1990] in that a user may have to go through many irrelevant results or try
several queries before finding the desired information. Problems encountered in
searching are exacerbated further when the search engine users employ short
queries [Jansen et al. 1998]. However, personalization techniques that put a
search in the context of the user’s interests may alleviate some of these issues.
In this study, which focuses on knowledge workers’ search for information
online in a workplace setting, we assume that some information about the
knowledge workers, such as their professional interests and skills, is known to
the employing organization and can be extracted automatically with an information
extraction (IE) tool or with database queries. The organization can then
use such information as an input to a system based on our proposed approach
and provide knowledge workers with a personalized search tool that will reduce
their search time and boost their productivity.
For a given query, a personalized search can provide different results for
different users or organize the same results differently for each user. It can be
implemented on either the server side (search engine) or the client side
(organization’s intranet or user’s computer). Personalized search implemented on
the server side is computationally expensive when millions of users are using
the search engine, and it also raises privacy concerns when information
about users is stored on the server. A personalized search on the client side
can be achieved by query expansion and/or result processing [Pitkow et al.
2002]. By adding extra query terms associated with user interests or search
context, the query expansion approach can retrieve different sets of results.
Result processing includes result filtering, such as removal of some results,
and reorganizing, such as reranking, clustering, and categorizing the
results.
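
The two client-side strategies described above can be illustrated with a minimal sketch. It is not the PCAT system: instead of text classifiers trained on ODP data, it assumes a simple keyword-overlap score, and the names Result, expand_query, interest_score, filter_results, rerank, and categorize are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One search result as returned by a conventional engine (assumed shape)."""
    title: str
    snippet: str


def expand_query(query: str, interest_terms: list[str]) -> str:
    """Query expansion: append interest terms that the query does not already contain."""
    present = set(query.lower().split())
    extra = [t for t in interest_terms if t.lower() not in present]
    return " ".join([query, *extra])


def interest_score(result: Result, interest_terms: list[str]) -> int:
    """Assumed relevance proxy: number of interest terms found in title or snippet."""
    words = set(f"{result.title} {result.snippet}".lower().split())
    return sum(1 for t in interest_terms if t.lower() in words)


def filter_results(results: list[Result], interest_terms: list[str],
                   min_score: int = 1) -> list[Result]:
    """Result filtering: drop results that match fewer than min_score interest terms."""
    return [r for r in results if interest_score(r, interest_terms) >= min_score]


def rerank(results: list[Result], interest_terms: list[str]) -> list[Result]:
    """Reranking: best match first; the sort is stable, so ties keep engine order."""
    return sorted(results, key=lambda r: interest_score(r, interest_terms), reverse=True)


def categorize(results: list[Result],
               interests: dict[str, list[str]]) -> dict[str, list[Result]]:
    """Categorizing: place each result under its best-matching interest, else 'Other'."""
    groups: dict[str, list[Result]] = {name: [] for name in interests}
    groups["Other"] = []
    for r in results:
        best = max(interests, key=lambda n: interest_score(r, interests[n]), default=None)
        if best is not None and interest_score(r, interests[best]) > 0:
            groups[best].append(r)
        else:
            groups["Other"].append(r)
    return groups
```

In this sketch, query expansion changes what the engine is asked, while filtering, reranking, and categorizing operate only on results already returned, which is why both can run on the client without storing user information on the server.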