Applying clickstream data mining to real-time Web crawler detection and containment using ClickTips platform

Abstract

The uncontrolled spread of Web crawlers has led to undesired situations of server overload and content misuse. Most of these programs still have legitimate and useful goals, but standard detection heuristics have not evolved along with Web crawling technology and are now unable to identify most of today's programs. In this paper, we propose an integrated approach to the problem that ensures the generation of up-to-date decision models, targeting both monitoring and clickstream differentiation. The ClickTips platform sustains the Web crawler detection and containment mechanisms, and its data webhousing system is responsible for clickstream processing and further data mining. Web crawler detection and monitoring help preserve Web server performance and Web site privacy, while differentiated clickstream analysis provides focused reporting and interpretation of navigational patterns. The generation of up-to-date detection models is based on clickstream data mining and targets not only well-known Web crawlers, but also camouflaging and previously unknown programs. Experiments with different real-world Web sites yield encouraging results, showing that the approach is not only feasible but also adequate.
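The abstract does not describe the ClickTips feature set or models, so the following is only a minimal sketch of the kind of session-level preprocessing that clickstream-based crawler detection commonly relies on. Every name here (Request, sessionize, session_features), the 30-minute session timeout, and the chosen features are assumptions made for illustration, not the authors' method. The feature dictionaries produced this way could serve as input rows for a classifier trained on labelled sessions.

```python
from collections import defaultdict
from dataclasses import dataclass

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; an assumed, conventional value
IMAGE_EXT = (".gif", ".jpg", ".jpeg", ".png")


@dataclass
class Request:
    """One hypothetical Web server log entry."""
    ip: str
    user_agent: str
    timestamp: float  # seconds since some epoch
    path: str
    referrer: str     # empty string when the header is absent


def sessionize(requests):
    """Group requests by (ip, user_agent) and split on long idle gaps."""
    by_client = defaultdict(list)
    for r in sorted(requests, key=lambda r: r.timestamp):
        by_client[(r.ip, r.user_agent)].append(r)

    sessions = []
    for reqs in by_client.values():
        current = [reqs[0]]
        for prev, nxt in zip(reqs, reqs[1:]):
            if nxt.timestamp - prev.timestamp > SESSION_TIMEOUT:
                sessions.append(current)
                current = []
            current.append(nxt)
        sessions.append(current)
    return sessions


def session_features(session):
    """Summarise one session as a flat feature dictionary."""
    n = len(session)
    gaps = [b.timestamp - a.timestamp for a, b in zip(session, session[1:])]
    return {
        "requests": n,
        "hit_robots_txt": any(r.path == "/robots.txt" for r in session),
        "no_referrer_ratio": sum(1 for r in session if not r.referrer) / n,
        "image_ratio": sum(
            1 for r in session if r.path.lower().endswith(IMAGE_EXT)
        ) / n,
        "mean_gap_seconds": sum(gaps) / len(gaps) if gaps else 0.0,
    }
```

Features such as a robots.txt request, a high share of requests without a referrer, or a low image ratio are typical of the fixed heuristics the abstract calls outdated when used alone; the sketch only computes them and leaves the decision to a separately trained model.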

Citation (APA)

Lourenço, A., & Belo, O. (2007). Applying clickstream data mining to real-time Web crawler detection and containment using ClickTips platform. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 351–358). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-540-70981-7_39
