Schema driven and topic specific web crawling

Qi Guo; Hang Guo; Zhiqiang Zhang; Jing Sun; Jianhua Feng

Conference Proceedings

Lecture Notes in Computer Science (2005) 3453 594-599

DOI: 10.1007/11408079_55

5 Citations

3 Readers

Abstract

We propose a new approach to discovering and extracting topic-specific hypertext resources from the WWW. The method, called schema driven and topical crawling, lets a user define a schema and extraction rules for a specific domain of interest. It then automatically searches for and extracts schema-relevant pages from the web. Unlike common approaches that traverse only web pages, our approach lets the crawler traverse a virtual network composed of concept instances and their relationships. To achieve this, we design an architecture that integrates several techniques, including a web extractor, a meta-search engine, and query expansion, and we provide a toolkit to support it. © Springer-Verlag Berlin Heidelberg 2005.
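The abstract does not give implementation details, so the following is only a minimal sketch of what crawling over a network of concept instances might look like. Everything in it is an assumption made for illustration: the schema (`SCHEMA_RULES`), the regex-based extraction rules, the in-memory `PAGES` corpus that stands in for results from a meta-search engine, and the function names `expand_query`, `meta_search`, `extract`, and `crawl`. It is not the authors' architecture or toolkit.

```python
import re
from collections import deque
from dataclasses import dataclass

# Hypothetical schema: concept name -> extraction rule (regex with one group).
SCHEMA_RULES = {
    "Author": re.compile(r"Author:\s*([A-Z][a-z]+ [A-Z][a-z]+)"),
    "Paper": re.compile(r'Paper:\s*"([^"]+)"'),
}

# Toy in-memory "web"; a real system would query search engines instead.
PAGES = {
    "http://example.org/a": 'Author: Ada Smith\nPaper: "Topical Crawling"',
    "http://example.org/b": 'Paper: "Topical Crawling"\nAuthor: Bob Jones',
    "http://example.org/c": 'Author: Bob Jones\nPaper: "Query Expansion"',
    "http://example.org/d": "Unrelated page about cooking.",
}


@dataclass(frozen=True)
class Instance:
    """A concept instance, e.g. Instance("Author", "Ada Smith")."""
    concept: str
    value: str


def expand_query(instance):
    # Trivial stand-in for query expansion: the bare value plus a variant
    # prefixed with the concept name.
    return [instance.value, f"{instance.concept} {instance.value}"]


def meta_search(query):
    # Stand-in for a meta-search engine: substring match over the toy corpus.
    return sorted(url for url, text in PAGES.items() if query in text)


def extract(text):
    # Apply every extraction rule in the schema to one page.
    return [
        Instance(concept, value)
        for concept, rule in SCHEMA_RULES.items()
        for value in rule.findall(text)
    ]


def crawl(seed, max_instances=50):
    """Breadth-first traversal over concept instances rather than hyperlinks."""
    seen = {seed}
    edges = set()  # (instance, co-occurring instance) pairs
    frontier = deque([seed])
    while frontier and len(seen) < max_instances:
        current = frontier.popleft()
        for query in expand_query(current):
            for url in meta_search(query):
                for found in extract(PAGES[url]):
                    if found != current:
                        edges.add((current, found))
                    if found not in seen:
                        seen.add(found)
                        frontier.append(found)
    return seen, edges


if __name__ == "__main__":
    instances, links = crawl(Instance("Author", "Ada Smith"))
    for inst in sorted(instances, key=lambda i: (i.concept, i.value)):
        print(inst.concept, "->", inst.value)
```

In this sketch the frontier holds concept instances instead of URLs, so the crawler moves from an author to a paper to a co-author without following any hyperlink. The unrelated page is never fetched because no instance-derived query matches it.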

Citation (APA)

Guo, Q., Guo, H., Zhang, Z., Sun, J., & Feng, J. (2005). Schema driven and topic specific web crawling. In Lecture Notes in Computer Science (Vol. 3453, pp. 594–599). Springer Verlag. https://doi.org/10.1007/11408079_55
