We propose a new approach to discovering and extracting topic-specific hypertext resources from the WWW. The method, called schema-driven and topical crawling, lets a user define a schema and extraction rules for a specific domain of interest. It then automatically searches for and extracts schema-relevant pages from the web. Unlike common approaches that traverse only web pages, our approach lets the crawler traverse a virtual network composed of concept instances and the relationships between them. To achieve this goal, we design an architecture that integrates several techniques, including a web extractor, a meta-search engine, and query expansion, and we provide a toolkit to support it. © Springer-Verlag Berlin Heidelberg 2005.
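The idea of crawling over concept instances rather than raw pages can be illustrated with a minimal sketch. It is not the authors' toolkit: the `SCHEMA` layout, the regex extraction rules, the in-memory `PAGES` table standing in for the web and the meta-search engine, and the `fetch` and `crawl` helpers are all assumptions made for illustration.

```python
import re
from collections import deque

# Hypothetical schema: each concept has an extraction rule (a regex) and
# the relationships along which the crawler may move to other concepts.
SCHEMA = {
    "Author": {"rule": re.compile(r"author:(\w+)"), "links_to": ["Paper"]},
    "Paper": {"rule": re.compile(r"paper:(\w+)"), "links_to": ["Author"]},
}

# Toy stand-in for the web and a meta-search engine: instance -> page text.
PAGES = {
    "Author:guo": "paper:crawling paper:schema",
    "Paper:crawling": "author:guo author:feng",
    "Paper:schema": "author:guo",
    "Author:feng": "paper:crawling",
}


def fetch(concept, value):
    """Return the page text for a concept instance (empty if unknown)."""
    return PAGES.get(f"{concept}:{value}", "")


def crawl(seed_concept, seed_value, max_instances=100):
    """Breadth-first traversal of the virtual network of concept instances."""
    seed = (seed_concept, seed_value)
    seen = {seed}
    edges = []
    queue = deque([seed])
    while queue and len(seen) < max_instances:
        concept, value = queue.popleft()
        text = fetch(concept, value)
        # Only follow relationships that the schema declares for this concept.
        for target in SCHEMA[concept]["links_to"]:
            for match in SCHEMA[target]["rule"].findall(text):
                instance = (target, match)
                edges.append(((concept, value), instance))
                if instance not in seen:
                    seen.add(instance)
                    queue.append(instance)
    return seen, edges


instances, edges = crawl("Author", "guo")
```

In this sketch the frontier holds `(concept, value)` pairs instead of URLs, so the schema, not the hyperlink structure, decides where the crawler goes next. A real system would replace `fetch` with search queries and page extraction.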
Guo, Q., Guo, H., Zhang, Z., Sun, J., & Feng, J. (2005). Schema driven and topic specific web crawling. In Lecture Notes in Computer Science (Vol. 3453, pp. 594–599). Springer Verlag. https://doi.org/10.1007/11408079_55