Schema driven and topic specific web crawling

Abstract

We propose a new approach to discovering and extracting topic-specific hypertext resources from the WWW. The method, called schema-driven and topical crawling, allows a user to define a schema and extraction rules for a specific domain of interest. It automatically searches for and extracts schema-relevant pages from the Web. Unlike common approaches that traverse web pages alone, our approach lets the crawler traverse a virtual network composed of concept instances and their relationships. To achieve this, we design an architecture that integrates several techniques, including a web extractor, a meta-search engine, and query expansion, and we provide a toolkit that supports it. © Springer-Verlag Berlin Heidelberg 2005.
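The abstract describes a crawler that moves over a virtual network of concept instances and relationships rather than over raw hyperlinks. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' toolkit. The `Concept` schema format, the regex extraction rules, the `SYNONYMS` table used for query expansion, and the in-memory `WEB` dictionary with `meta_search` standing in for a real meta-search engine are all hypothetical.

```python
import re
from collections import deque
from dataclasses import dataclass, field

# Hypothetical schema format: each concept has regex extraction rules for its
# attributes and relations (name -> (target concept, regex)) to other concepts.
@dataclass
class Concept:
    name: str
    rules: dict
    relations: dict = field(default_factory=dict)

SCHEMA = {
    "Researcher": Concept(
        "Researcher",
        rules={"email": r"Email: ([\w.]+@[\w.]+\w)"},
        relations={"affiliation": ("University", r"Affiliation: ([A-Z]\w+ University)")},
    ),
    "University": Concept("University", rules={"city": r"located in (\w+)"}),
}

# Assumption: a toy in-memory "Web" (URL -> page text) replaces real search engines.
WEB = {
    "u1": "Researcher: Alice. Email: alice@x.org. Affiliation: Tsinghua University.",
    "u2": "Tsinghua University is located in Beijing.",
    "u3": "Weather report for Beijing.",
}

# Hypothetical synonym table used for query expansion, keyed by concept name.
SYNONYMS = {"University": ["college", "institute"]}

def expand(query, concept_name):
    """Query expansion: the original query plus one variant per synonym."""
    return [query] + [f"{query} {s}" for s in SYNONYMS.get(concept_name, [])]

def meta_search(queries):
    """Stand-in meta-search: merge (deduplicated) hits of all queries.
    A page matches a query if it contains every query term."""
    hits = []
    for q in queries:
        terms = q.lower().split()
        for url, text in WEB.items():
            if all(t in text.lower() for t in terms) and url not in hits:
                hits.append(url)
    return hits

def crawl(schema, seed, max_instances=10):
    """Breadth-first traversal of the virtual network whose nodes are
    (concept, instance key) pairs and whose edges are extracted relations."""
    frontier, seen, records = deque([seed]), {seed}, {}
    while frontier and len(records) < max_instances:
        concept_name, key = frontier.popleft()
        concept = schema[concept_name]
        record = {}
        for url in meta_search(expand(key, concept_name)):
            text = WEB[url]
            # Apply the concept's extraction rules to each retrieved page.
            for attr, pattern in concept.rules.items():
                m = re.search(pattern, text)
                if m and attr not in record:
                    record[attr] = m.group(1)
            # Relations yield new concept instances to visit next.
            for rel, (target, pattern) in concept.relations.items():
                m = re.search(pattern, text)
                if m:
                    record[rel] = m.group(1)
                    nxt = (target, m.group(1))
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
        records[(concept_name, key)] = record
    return records

if __name__ == "__main__":
    for instance, record in crawl(SCHEMA, ("Researcher", "Alice")).items():
        print(instance, record)
```

Starting from the seed instance `("Researcher", "Alice")`, the sketch extracts Alice's email and affiliation, follows the `affiliation` relation to the `University` instance, and extracts its city. The unrelated page `u3` is never visited because the crawl is steered by the schema's relations rather than by hyperlinks.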

Citation (APA)

Guo, Q., Guo, H., Zhang, Z., Sun, J., & Feng, J. (2005). Schema driven and topic specific web crawling. In Lecture Notes in Computer Science (Vol. 3453, pp. 594–599). Springer Verlag. https://doi.org/10.1007/11408079_55