Apophanies or Epiphanies? How Crawlers Impact Our Understanding of the Web

Abstract

Data generated by web crawlers has formed the basis for much of our current understanding of the Internet. However, not all crawlers are created equal and crawlers generally find themselves trading off between computational overhead, developer effort, data accuracy, and completeness. Therefore, the choice of crawler has a critical impact on the data generated and knowledge inferred from it. In this paper, we conduct a systematic study of the trade-offs presented by different crawlers and the impact that these can have on various types of measurement studies. We make the following contributions: First, we conduct a survey of all research published since 2015 in the premier security and Internet measurement venues to identify and verify the repeatability of crawling methodologies deployed for different problem domains and publication venues. Next, we conduct a qualitative evaluation of a subset of all crawling tools identified in our survey. This evaluation allows us to draw conclusions about the suitability of each tool for specific types of data gathering. Finally, we present a methodology and a measurement framework to empirically highlight the differences between crawlers and how the choice of crawler can impact our understanding of the web.

Cite

Ahmad, S. S., Dar, M. D., Zaffar, M. F., Vallina-Rodriguez, N., & Nithyanand, R. (2020). Apophanies or Epiphanies? How Crawlers Impact Our Understanding of the Web. In The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020 (pp. 271–280). Association for Computing Machinery, Inc. https://doi.org/10.1145/3366423.3380113
