High-Performance Web Crawling

  • Najork M
  • Heydon A

Abstract

High-performance web crawlers are an important component of many web services. For example, search services use web crawlers to populate their indices, comparison shopping engines use them to collect product and pricing information from online vendors, and the Internet Archive uses them to record a history of the Internet. The design of a high-performance crawler poses many challenges, both technical and social, primarily due to the large scale of the web. The web crawler must be able to download pages at a very high rate, yet it must not overwhelm any particular web server. Moreover, it must maintain data structures far too large to fit in main memory, yet it must be able to access and update them efficiently. This chapter describes our experience building and operating such a high-performance crawler.
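The abstract's first tension is fetching pages at a high aggregate rate without overwhelming any single server. A common way to handle it is to keep a separate queue of URLs per host and to schedule hosts by the earliest time each may be contacted again. The sketch below illustrates that idea under stated assumptions. The class name `PoliteFrontier`, the fixed per-host `delay`, and the purely in-memory structures are hypothetical and are not taken from the chapter. A production crawler would keep most of this state on disk, as the abstract's second point notes.

```python
from __future__ import annotations

import heapq
from collections import deque
from urllib.parse import urlsplit


class PoliteFrontier:
    """Hypothetical frontier: one FIFO queue per host plus a min-heap of
    (earliest allowed fetch time, host) pairs. Illustrative sketch only."""

    def __init__(self, delay: float) -> None:
        self.delay = delay  # assumed fixed gap between requests to one host (seconds)
        self.queues: dict[str, deque[str]] = {}  # host -> pending URLs
        self.next_allowed: dict[str, float] = {}  # host -> earliest next fetch time
        self.ready: list[tuple[float, str]] = []  # heap: one entry per host with pending URLs

    def add(self, url: str, now: float = 0.0) -> None:
        host = urlsplit(url).hostname or ""
        if host not in self.queues:
            # First pending URL for this host: schedule the host, but never
            # earlier than the politeness delay from its previous fetch allows.
            self.queues[host] = deque()
            start = max(now, self.next_allowed.get(host, now))
            heapq.heappush(self.ready, (start, host))
        self.queues[host].append(url)

    def next_url(self, now: float) -> str | None:
        """Return a URL whose host may be contacted at `now`, else None."""
        if not self.ready or self.ready[0][0] > now:
            return None  # no host is eligible yet
        _, host = heapq.heappop(self.ready)
        url = self.queues[host].popleft()
        self.next_allowed[host] = now + self.delay
        if self.queues[host]:
            # Host still has work: reschedule it after the delay.
            heapq.heappush(self.ready, (self.next_allowed[host], host))
        else:
            # Queue drained; a later add() consults next_allowed.
            del self.queues[host]
        return url
```

Because each host has at most one entry in the heap, many different hosts can be served back to back, while successive requests to any one host stay at least `delay` seconds apart, provided the caller passes a non-decreasing `now`.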

Citation (APA)

Najork, M., & Heydon, A. (2002). High-Performance Web Crawling (pp. 25–45). https://doi.org/10.1007/978-1-4615-0005-6_2
