IglooG: A distributed web crawler based on grid service

Abstract

A web crawler is a program used to download documents from web sites. This paper presents the design of a distributed web crawler on a grid platform. The distributed crawler is based on our previous work, Igloo. Each crawler is deployed as a grid service to improve the scalability of the system. Information services in our system are in charge of distributing URLs to balance the load across the crawlers, and they too are deployed as grid services. The information services are organized as a peer-to-peer overlay network. Based on the crawler's ID and the semantic vector of the crawled page, which is computed by Latent Semantic Indexing, each crawler decides whether to transmit a URL to an information service or to hold it itself. We present an implementation of the distributed crawler based on Igloo and simulate a grid environment to evaluate load balancing across the crawlers and crawl speed. Both the theoretical analysis and the experimental results show that our system is a high-performance and reliable system. © Springer-Verlag Berlin Heidelberg 2005.
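The abstract does not spell out the exact routing rule, but a minimal sketch of the idea, assuming the decision compares the page's LSI vector against a crawler ID derived from its dominant latent topic, might look like the following. All names, the `projection` matrix, and the topic-to-ID hash are illustrative assumptions rather than the paper's actual method.

```python
import hashlib
import numpy as np

def lsi_vector(term_counts, projection):
    """Project a page's term-count vector into the latent semantic (LSI) space.
    `projection` stands in for the truncated-SVD term matrix an LSI step would
    produce; here it is just a fixed matrix for illustration."""
    return projection.T @ term_counts

def responsible_crawler_id(semantic_vec, num_crawlers):
    """Map a semantic vector to a crawler ID by hashing its dominant latent topic.
    The real IglooG mapping is not given in the abstract; this is one plausible
    ID-based assignment."""
    dominant_topic = int(np.argmax(semantic_vec))
    digest = hashlib.sha1(str(dominant_topic).encode()).hexdigest()
    return int(digest, 16) % num_crawlers

def route_url(url, page_term_counts, projection, my_crawler_id, num_crawlers):
    """Decide whether this crawler keeps the URL or forwards it to the
    information-service overlay for reassignment."""
    vec = lsi_vector(page_term_counts, projection)
    target = responsible_crawler_id(vec, num_crawlers)
    if target == my_crawler_id:
        return ("hold", url)           # crawl it locally
    return ("forward", url, target)    # hand off via the information service

# Toy usage: 5 latent topics over a 10-term vocabulary, 8 crawlers in the grid.
rng = np.random.default_rng(0)
projection = rng.standard_normal((10, 5))  # stand-in for the SVD term matrix
page = rng.integers(0, 4, size=10).astype(float)
print(route_url("http://example.org/page", page, projection,
                my_crawler_id=3, num_crawlers=8))
```

In this sketch the hash plays the role of the overlay lookup: pages whose dominant topic maps to another crawler's ID are forwarded through the information service, which is how the paper's design balances load across crawlers.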

Citation (APA)

Liu, F., Ma, F. Y., Ye, Y. M., Li, M. L., & Yu, J. D. (2005). IglooG: A distributed web crawler based on grid service. In Lecture Notes in Computer Science (Vol. 3399, pp. 207–216). Springer Verlag. https://doi.org/10.1007/978-3-540-31849-1_21
