A framework for incremental deep Web crawler based on URL classification

6 citations · 15 Mendeley readers

Abstract

As the Web grows rapidly, more and more data become available in the Deep Web, but users must key in keywords through forms to access pages on such sites. Traditional search engines index and retrieve only Surface Web pages reachable through static URL links, because Deep Web pages are hidden behind query forms. Yet the Deep Web not only contains far more information than the Surface Web; that information is also more valuable. Since Deep Web pages change rapidly, keeping already-crawled pages fresh while also discovering newly appearing pages is a challenge. A framework for an incremental Deep Web crawler based on URL classification is proposed. Corresponding to list pages and leaf pages, the URLs associated with a page are divided into two classes: list URLs and leaf URLs. The framework not only crawls the latest Deep Web pages according to the change frequency of list pages, but also recrawls the leaf pages that change often. © 2011 Springer-Verlag.
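The core idea in the abstract can be sketched in a few lines: URLs are classified as list or leaf, and each is revisited according to its observed change frequency, with list pages prioritized so newly linked leaf pages are discovered. The classifier heuristics, class names, and scheduling rule below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the URL-classification idea: split URLs into
# "list" URLs (index/result pages) and "leaf" URLs (detail pages) and
# recrawl each by observed change frequency. All heuristics here are
# assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class CrawlRecord:
    url: str
    kind: str                 # "list" or "leaf"
    change_count: int = 0     # visits on which the page had changed
    visit_count: int = 0

    def change_frequency(self) -> float:
        # Unvisited pages default to the highest frequency so they
        # are fetched at least once.
        return self.change_count / self.visit_count if self.visit_count else 1.0


def classify(url: str) -> str:
    """Toy classifier: URLs with paging/query markers are treated as
    list pages; everything else is a leaf (detail) page."""
    list_markers = ("page=", "list", "search", "results")
    return "list" if any(m in url.lower() for m in list_markers) else "leaf"


def schedule(records: list[CrawlRecord], budget: int) -> list[CrawlRecord]:
    """Pick the `budget` most frequently changing records, visiting
    list pages first so new leaf URLs they link to are found."""
    ranked = sorted(records,
                    key=lambda r: (r.kind != "list", -r.change_frequency()))
    return ranked[:budget]


urls = ["http://example.com/search?page=1", "http://example.com/item/42"]
records = [CrawlRecord(u, classify(u)) for u in urls]
for r in schedule(records, budget=2):
    print(r.kind, r.url)
```

A real crawler would replace the keyword heuristic with a trained classifier and update `change_count` by comparing page digests between visits; the scheduling rule above simply reflects the abstract's claim that list-page change frequency drives the crawl.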

APA

Zhang, Z., Dong, G., Peng, Z., & Yan, Z. (2011). A framework for incremental deep Web crawler based on URL classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6988 LNCS, pp. 302–310). https://doi.org/10.1007/978-3-642-23982-3_37
