A conceptual framework for efficient Web crawling in virtual integration contexts

Inma Hernández; Hassan A. Sleiman; David Ruiz; Rafael Corchuelo

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 6988 LNCS(PART 2) 282-291

DOI: 10.1007/978-3-642-23982-3_35

Abstract

Virtual integration systems require a crawling tool that can navigate the Web and reach relevant pages efficiently. Existing crawling proposals are aware of the efficiency problem, but most of them still need to download pages in order to classify them as relevant or not. In this paper, we present a conceptual framework for designing crawlers supported by a web page classifier that relies solely on URLs to determine page relevance. At each step, such a crawler chooses only the URLs that lead to relevant pages, which reduces the number of unnecessary pages downloaded, saves bandwidth, and makes the crawler efficient and suitable for virtual integration systems. Our preliminary experiments show that such a classifier is able to distinguish between links leading to different kinds of pages without prior intervention from the user. © 2011 Springer-Verlag.
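The idea in the abstract, deciding from the URL alone whether a link is worth downloading, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the classifier's features, so a hand-written regular expression (`LEADS_TO_RELEVANT`) stands in for it here, and the function names, the toy `SITE` map, and the `example.org` URLs are all assumptions made for illustration.

```python
import re
from collections import deque
from typing import Callable, Iterable, List

# Hypothetical stand-in for the URL classifier: the abstract does not say
# which URL features are used, so a fixed pattern plays that role here.
LEADS_TO_RELEVANT = re.compile(r"/(category|product)/")


def leads_to_relevant(url: str) -> bool:
    """Judge a link from its URL alone, without downloading the page."""
    return LEADS_TO_RELEVANT.search(url) is not None


def crawl(seed: str,
          fetch_links: Callable[[str], Iterable[str]],
          classify: Callable[[str], bool] = leads_to_relevant) -> List[str]:
    """Breadth-first crawl that downloads the seed and only accepted URLs."""
    downloaded = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        downloaded.append(url)
        # fetch_links is the only step that would cost bandwidth.
        for link in fetch_links(url):
            if link not in seen and classify(link):
                seen.add(link)
                queue.append(link)
    return downloaded


# Toy site (assumed data): page URL -> links found on that page.
SITE = {
    "http://example.org/": [
        "http://example.org/category/books",
        "http://example.org/about",
        "http://example.org/login",
    ],
    "http://example.org/category/books": [
        "http://example.org/product/1",
        "http://example.org/product/2",
        "http://example.org/help",
    ],
}

pages = crawl("http://example.org/", lambda url: SITE.get(url, []))
print(pages)  # the about, login and help pages are never fetched
```

In this toy run the crawler fetches four pages and skips three, because the rejected links are filtered before any download takes place. A real system would replace the regular expression with a classifier learned from the site's URLs and `fetch_links` with an HTTP client and link extractor.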

Cite

Hernández, I., Sleiman, H. A., Ruiz, D., & Corchuelo, R. (2011). A conceptual framework for efficient Web crawling in virtual integration contexts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6988 LNCS, pp. 282–291). https://doi.org/10.1007/978-3-642-23982-3_35
