Web usage mining using artificial ant colony clustering and linear genetic programming
- ISBN: 0780378040
- DOI: 10.1109/CEC.2003.1299832
Ajith Abraham
Department of Computer Science,
Oklahoma State University, Tulsa, OK 74106, USA
aa@cs.okstate.edu
Vitorino Ramos
CVRM-GeoSystems Centre,
Technical University of Lisbon, Portugal
vitorino.ramos@alfa.ist.utl.pt
Abstract- The rapid growth of e-commerce has confronted
both the business community and customers with a new
situation. Faced with intense competition on the one hand
and the customer’s freedom to choose among several
alternatives on the other, the business community has
realized the necessity of intelligent marketing strategies
and relationship management. Web usage mining attempts to discover
useful knowledge from the secondary data obtained from
the interactions of the users with the Web. Web usage
mining has become very critical for effective Web site
management, creating adaptive Web sites, business and
support services, personalization, network traffic flow
analysis and so on. The study of the behavior of ant colonies
and their self-organizing capabilities is of interest to
knowledge retrieval/management and decision support
systems sciences, because it provides models of
distributed adaptive organization, which are useful to
solve difficult optimization, classification, and distributed
control problems, among others [17][18][16]. In this
paper, we propose an ant clustering algorithm to discover
Web usage patterns (data clusters) and a linear genetic
programming approach to analyze the visitor trends.
Empirical results clearly show that ant colony
clustering performs well when compared to a
self-organizing map (for clustering Web usage patterns),
even though its accuracy is not as good as that of the
evolutionary-fuzzy clustering (i-miner) [1] approach.
1 Introduction
The WWW continues to grow at an amazing rate as an
information gateway and as a medium for conducting
business. Web mining is the extraction of interesting and
useful knowledge and implicit information from artifacts or
activity related to the WWW [12][7]. Web servers record
and accumulate data about user interactions whenever
requests for resources are received. Analyzing the Web
access logs can help understand the user behaviour and
the web structure. From the business and applications
point of view, knowledge obtained from the Web usage
patterns could be directly applied to efficiently manage
activities related to e-business, e-services, e-education
and so on. Accurate Web usage information could help to
attract new customers, retain current customers, improve
cross marketing/sales and the effectiveness of promotional
campaigns, track departing customers and find the most
effective logical structure for the Web space. User
profiles could be built by combining users’ navigation
paths with other data features, such as page viewing time,
hyperlink structure, and page content [9]. What makes the
discovered knowledge interesting has been addressed by
several works. Results that are already known are very often
considered uninteresting, so the key to making
discovered knowledge interesting is its novelty or
unexpectedness.
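As an illustration of the user profiles mentioned above, the following minimal sketch combines each user's navigation path with an estimate of page viewing time. It is not the method of this paper: the `(user_id, timestamp, page)` record format and the `build_profiles` helper are hypothetical, and viewing time is simply assumed to be the gap between consecutive requests by the same user.

```python
from collections import defaultdict


def build_profiles(clicks):
    """Build simple user profiles from click records.

    `clicks` is an iterable of (user_id, timestamp_seconds, page) tuples.
    This record format is an assumption made for the example only.
    """
    by_user = defaultdict(list)
    for user, ts, page in clicks:
        by_user[user].append((ts, page))

    profiles = {}
    for user, events in by_user.items():
        events.sort()  # order each user's requests by timestamp
        path = [page for _, page in events]
        view_time = defaultdict(float)
        # Assumed viewing time of a page: the gap until the same user's
        # next request. The last page has no following request, so its
        # viewing time is unknown and is left out.
        for (t0, page), (t1, _) in zip(events, events[1:]):
            view_time[page] += t1 - t0
        profiles[user] = {"path": path, "view_time": dict(view_time)}
    return profiles


clicks = [
    ("u1", 0, "/home"),
    ("u1", 30, "/products"),
    ("u1", 90, "/cart"),
    ("u2", 10, "/home"),
    ("u2", 15, "/about"),
]
profiles = build_profiles(clicks)
print(profiles["u1"]["path"])
print(profiles["u1"]["view_time"])
```

Other features named in the text, such as hyperlink structure and page content, would need additional data sources and are not covered by this sketch.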
There are several commercial software packages that could provide
Web usage statistics. These statistics could be useful for Web
administrators to get a sense of the actual load on the
server. For small web servers, the usage statistics
provided by conventional Web site trackers may be
adequate to analyze the usage pattern and trends.
However as the size and complexity of the data increases,
the statistics provided by existing Web log file analysis
tools may prove inadequate and more intelligent mining
techniques will be necessary [10].
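To make the notion of basic usage statistics concrete, the sketch below counts hits per page, distinct hosts and bytes served from raw access-log lines. It assumes the logs follow the Common Log Format, which the paper does not specify; the `usage_statistics` helper and the sample lines are hypothetical and serve only as an example of the kind of summary a conventional log analyzer reports.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
# (assumed format; real server logs may differ).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def usage_statistics(lines):
    """Return hits per page, number of distinct hosts and total bytes served."""
    hits = Counter()
    hosts = set()
    total_bytes = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        hits[match.group("path")] += 1
        hosts.add(match.group("host"))
        if match.group("size") != "-":  # "-" means no body was sent
            total_bytes += int(match.group("size"))
    return {
        "hits_per_page": hits,
        "unique_hosts": len(hosts),
        "total_bytes": total_bytes,
    }


sample_log = [
    '192.0.2.1 - - [10/Oct/2003:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2003:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2003:13:57:12 -0700] "GET /products.html HTTP/1.0" 304 -',
    'not a log line',
]
stats = usage_statistics(sample_log)
print(stats["hits_per_page"].most_common())
print(stats["unique_hosts"], stats["total_bytes"])
```

Counts of this kind describe server load but say little about visitor behaviour, which is the gap that the mining techniques discussed in the text aim to fill.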
Figure 1. Web usage mining framework
[Diagram labels: Web server; ISP; customer/client; raw Web log data; data pre-processing and cleaning; pattern discovery; pattern analysis; usage statistics; sequential/association rule mining; Web structure/content information; business services; business intelligence]
A generic Web usage mining framework is depicted in
Figure 1. In Web mining, data could be
collected at the server level, client level or proxy level,
or as data consolidated from these levels. These data could
differ in terms of content and the way they are collected. The usage data
collected at different sources represent the navigation
patterns of different segments of the overall Web traffic,
ranging from single user, single site browsing behaviour to
multi-user, multi-site access patterns. As evident from
Figure 1, the Web server log does not contain
sufficient information for inferring the behaviour at the
client side as it relates to the pages served by the Web
server. Pre-processed and cleaned data could be used for


