With the web at close to a billion pages and growing at an exponential rate, we are faced with the issue of rating pages in terms of quality and trust. In this situation, what other pages say about a web page can be as important as what the page says about itself. The cumulative knowledge of these types of recommendations (or the lack thereof) can be objective enough to help a user or a robot program decide whether or not to pursue a web document. In addition, these annotations or metadata can be used by a web robot program to derive summary information about web documents that are written in a language that the robot does not understand. We use this idea to drive a web information gathering system that forms the core of a topic-specific search engine.

In this paper, we describe how our system uses metadata about hyperlinks to guide its crawl of the web. It sifts through information related to a particular topic and avoids traversing links that may not be of interest. Thus, the guided crawling system stays focused on the target topic. It builds a rich repository of link information that includes metadata. This repository ultimately serves a search engine.
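The idea of letting link metadata steer a crawl can be illustrated with a small sketch. The abstract does not say how links are scored, so everything below is an assumption made for illustration: the metadata is reduced to anchor text, relevance is a simple keyword overlap, and the web is an in-memory dictionary. The names `link_score`, `guided_crawl`, `demo_graph`, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
import heapq


def link_score(anchor_text, topic_terms):
    """Fraction of topic terms that appear in a link's anchor text (assumed metric)."""
    words = set(anchor_text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def guided_crawl(graph, seed, topic_terms, threshold=0.5):
    """Best-first traversal that follows only links whose metadata looks on-topic.

    `graph` maps a URL to a list of (target_url, anchor_text) pairs and stands
    in for fetching and parsing real pages.
    """
    topic_terms = {t.lower() for t in topic_terms}
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}
    visited = []
    repository = []  # one (source, target, anchor_text, score) record per link

    while frontier:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for target, anchor in graph.get(url, []):
            score = link_score(anchor, topic_terms)
            # Every link is recorded, even the ones the crawler will not follow.
            repository.append((url, target, anchor, score))
            if score >= threshold and target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-score, target))
    return visited, repository


# Hypothetical toy link graph used only to exercise the sketch.
demo_graph = {
    "a": [("b", "XML parser tutorial"), ("c", "holiday photos")],
    "b": [("d", "XML schema tools"), ("a", "home")],
    "c": [("e", "XML parser")],
}
visited, repository = guided_crawl(demo_graph, "a", {"xml", "parser"})
```

In this toy run the crawler never follows the off-topic link to `c`, so it also never reaches `e`, while the repository still keeps a record of every link it saw together with its score.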
Yi, J., Sundaresan, N., & Huang, A. (2001). Using metadata to enhance web information gathering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1997, pp. 38–57). Springer Verlag. https://doi.org/10.1007/3-540-45271-0_3