A Web Usage Mining Framework for Mining Evolving User Profiles in Dynamic Web Sites
- ISSN: 10414347
- DOI: 10.1109/TKDE.2007.190667
Olfa Nasraoui, Member, IEEE, Maha Soliman, Member, IEEE, Esin Saka, Member, IEEE,
Antonio Badia, Member, IEEE, and Richard Germain
Abstract—In this paper, we present a complete framework and findings in mining Web usage patterns from Web log files of a real Web site that has all the challenging aspects of real-life Web usage mining, including evolving user profiles and external data describing an ontology of the Web content. Even though the Web site under study is part of a nonprofit organization that does not “sell” any products, it was crucial to understand “who” the users were, “what” they looked at, and “how their interests changed with time,” all of which are important questions in Customer Relationship Management (CRM). Hence, we present an approach for discovering and tracking evolving user profiles. We also describe how the discovered user profiles can be enriched with explicit information need that is inferred from search queries extracted from Web log data. Profiles are also enriched with other domain-specific information facets that give a panoramic view of the discovered mass usage modes. An objective validation strategy is also used to assess the quality of the mined profiles, in particular their adaptability in the face of evolving user behavior.
Index Terms—Mining evolving clickstreams, user profiles, Web usage mining, semantic Web mining.
1 INTRODUCTION
Customer Relationship Management (CRM) can use data from within and outside an organization to understand its customers on an individual basis or on a group basis, for example by forming customer profiles. An improved understanding of the customers' habits, needs, and interests can allow a business to profit by, for instance, cross-selling, that is, selling items related to the ones that the customer wants to purchase. Hence, reliable knowledge about the customers' preferences and needs forms the basis for effective CRM. As businesses move online, keeping the loyalty of existing customers and attracting new ones becomes even more important, since a competitor's Web site may be only one click away. The fast pace and large amounts of data available in these online settings have recently made it imperative to use automated data mining or knowledge discovery techniques to discover Web user profiles. These different modes of usage, the so-called mass user profiles, can be discovered using Web usage mining techniques that automatically extract frequent access patterns from the history of previous user clickstreams stored in Web log files. These profiles can later be harnessed to personalize the Web site for the user or to support targeted marketing.
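Before any access patterns can be mined, the raw clickstream in a Web log has to be split into user sessions. The following is a minimal Python sketch of that step under assumptions that are not taken from this paper: each log entry has already been parsed into a (client identifier, timestamp, URL) tuple, a client is identified by a single string such as an IP address, and a 30-minute inactivity timeout separates one client's requests into distinct sessions. The function name `sessionize`, the timeout value, and the example client IDs and URLs are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical inactivity timeout: a common heuristic, not a value from this paper.
SESSION_TIMEOUT = timedelta(minutes=30)

# One pre-parsed log entry: (client identifier, request time, requested URL).
# Parsing raw Web server log lines into this form is assumed to happen elsewhere.
LogRecord = Tuple[str, datetime, str]


def sessionize(records: List[LogRecord],
               timeout: timedelta = SESSION_TIMEOUT) -> List[List[str]]:
    """Split each client's requests into sessions at gaps longer than `timeout`."""
    sessions: List[List[str]] = []
    # For each client: index of its open session and the time of its last request.
    open_sessions: Dict[str, Tuple[int, datetime]] = {}
    for client, ts, url in sorted(records, key=lambda r: (r[1], r[0])):
        current = open_sessions.get(client)
        if current is None or ts - current[1] > timeout:
            # First request from this client, or too long since its last one.
            sessions.append([url])
            open_sessions[client] = (len(sessions) - 1, ts)
        else:
            index = current[0]
            sessions[index].append(url)
            open_sessions[client] = (index, ts)
    return sessions


if __name__ == "__main__":
    t0 = datetime(2005, 1, 1, 9, 0)
    log = [  # hypothetical client IDs and URLs
        ("10.0.0.1", t0, "/news"),
        ("10.0.0.1", t0 + timedelta(minutes=5), "/events"),
        ("10.0.0.2", t0 + timedelta(minutes=1), "/library"),
        ("10.0.0.1", t0 + timedelta(hours=2), "/companies"),
    ]
    print(sessionize(log))  # [['/news', '/events'], ['/library'], ['/companies']]
```

A fixed timeout is only one possible session boundary; identifying clients by IP address alone can merge users behind a shared proxy, so real deployments often combine several cues.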
Although there have been considerable advances in Web usage mining, there have been no detailed studies presenting a fully integrated approach to mining a real Web site with the challenging characteristics of today's Web sites, such as evolving profiles, dynamic content, and the availability of a taxonomy or databases in addition to Web logs.
In this paper, we present a complete framework and a summary of our experience in mining Web usage patterns under real-world challenges such as evolving access patterns, dynamic pages, and external data describing an ontology of the Web content and how it relates to the business actors (in the case of the studied Web site, the companies, contractors, consultants, etc., in the corrosion field). The Web site in this study is a portal that provides access to news, events, resources, company information (such as companies or contractors supplying related products and services), and a library of technical and regulatory documentation related to corrosion and surface treatment. The portal also offers a virtual meeting place for companies or organizations seeking information about other companies or organizations. Without loss of generality, in the rest of this paper, we will refer to all the Web site participants (organizations, contractors, consultants, agencies, corporations, centers, etc.) simply as companies. The Web site in our study is managed by a nonprofit organization that does not sell anything but only provides free information that is ideally complete, accurate, and up to date. Hence, it was crucial to understand the different modes of usage, to know what kind of information visitors seek and read on the Web site, and to track how these interests evolve over time. For this reason, we cluster the user sessions extracted from the Web logs to partition the users into several homogeneous groups with similar activities and then extract a user profile from each cluster as a set of relevant URLs.
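The cluster-then-extract procedure just described can be illustrated with a short Python sketch. It is not the clustering algorithm used in this work, which this section does not specify; it swaps in a simple single-pass leader clustering over sessions represented as sets of URLs and compared by cosine similarity. A profile is then taken to be the URLs visited in at least a fraction `min_weight` of a cluster's sessions, each weighted by that fraction. The names `cosine`, `leader_cluster`, and `extract_profile`, the similarity threshold of 0.5, the weight cutoff, and the example sessions are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set

Session = Set[str]  # a session reduced to the set of URLs it requested


def cosine(a: Session, b: Session) -> float:
    """Cosine similarity between two sessions seen as binary URL vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def leader_cluster(sessions: List[Session],
                   threshold: float = 0.5) -> List[List[Session]]:
    """Single-pass leader clustering: join the first cluster whose leader is similar enough."""
    clusters: List[List[Session]] = []
    for s in sessions:
        for cluster in clusters:
            if cosine(s, cluster[0]) >= threshold:  # cluster[0] is the leader
                cluster.append(s)
                break
        else:
            clusters.append([s])  # no similar leader: s starts a new cluster
    return clusters


def extract_profile(cluster: List[Session],
                    min_weight: float = 0.5) -> Dict[str, float]:
    """Keep URLs visited in at least `min_weight` of the cluster's sessions."""
    counts = Counter(url for s in cluster for url in s)
    n = len(cluster)
    weights = {url: c / n for url, c in counts.items() if c / n >= min_weight}
    # Order by decreasing weight, then alphabetically, for readable output.
    return dict(sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])))


if __name__ == "__main__":
    sessions = [  # hypothetical sessions
        {"/news", "/events"},
        {"/news", "/events", "/library"},
        {"/companies", "/contractors"},
        {"/companies", "/contractors", "/news"},
    ]
    for cluster in leader_cluster(sessions):
        print(extract_profile(cluster, min_weight=0.6))
```

Leader clustering is order-dependent and needs the similarity threshold fixed in advance, and it does not estimate the number of clusters on its own; it is used here only because it keeps the sketch short.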
This procedure is repeated in subsequent new periods of
Web logging (such as biweekly), then the previously
202 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 20, NO. 2, FEBRUARY 2008
O. Nasraoui, M. Soliman, E. Saka, and A. Badia are with the Department of Computer Engineering and Computer Science, Speed School of Engineering, University of Louisville, Louisville, KY 40292. E-mail: {olfa.nasraoui, masoli01, esin.saka, abadia}@louisville.edu.
R. Germain is with the College of Business, University of Louisville, 154 College of Business, Louisville, KY 40292. E-mail: richard.germain@louisville.edu.
Manuscript received 21 Feb. 2006; revised 12 Oct. 2006; accepted 10 Aug. 2007; published online 4 Sept. 2007.
For information on obtaining reprints of this article, please send e-mail to: tkde@computer.org, and reference IEEECS Log Number TKDE-0088-0206.
Digital Object Identifier no. 10.1109/TKDE.2007.190667.
1041-4347/08/$25.00 © 2008 IEEE. Published by the IEEE Computer Society.