On the Feasibility of Geographically Distributed Web Crawling

Abstract

We identify the issues that are important in the design of a geographically distributed Web crawler. The identified issues are discussed from both a “benefit” and a “challenge” point of view. More specifically, we focus on the effect of the geographical locality of Web sites on crawling performance and, as a practical study, investigate the feasibility of a distributed crawler in terms of network costs. For this purpose, we conduct various experiments to collect network access statistics about the servers in the educational domains of eight different countries (USA, Canada, Chile, Brazil, Spain, Portugal, Turkey, and Greece). We gather the statistics from four different sites located in the USA, Brazil, Spain, and Turkey using echoping. The results favor geographically distributed Web crawling in terms of crawling throughput.

Cite

Cambazoglu, B. B., Junqueira, F., Plachouras, V., & Telloli, L. (2008). On the Feasibility of Geographically Distributed Web Crawling. In ACM International Conference Proceeding Series (Vol. 2008-June). Association for Computing Machinery. https://doi.org/10.4108/ICST.INFOSCALE2008.3550
