Apophanies or Epiphanies? How Crawlers Impact Our Understanding of the Web

Syed Suleman Ahmad; Muhammad Daniyal Dar; Muhammad Fareed Zaffar; Narseo Vallina-Rodriguez; Rishab Nithyanand

Conference Proceedings (Open Access)

The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020 (2020) 271-280

DOI: 10.1145/3366423.3380113

26 citations, 31 readers

Abstract

Data generated by web crawlers has formed the basis for much of our current understanding of the Internet. However, not all crawlers are created equal and crawlers generally find themselves trading off between computational overhead, developer effort, data accuracy, and completeness. Therefore, the choice of crawler has a critical impact on the data generated and knowledge inferred from it. In this paper, we conduct a systematic study of the trade-offs presented by different crawlers and the impact that these can have on various types of measurement studies. We make the following contributions: First, we conduct a survey of all research published since 2015 in the premier security and Internet measurement venues to identify and verify the repeatability of crawling methodologies deployed for different problem domains and publication venues. Next, we conduct a qualitative evaluation of a subset of all crawling tools identified in our survey. This evaluation allows us to draw conclusions about the suitability of each tool for specific types of data gathering. Finally, we present a methodology and a measurement framework to empirically highlight the differences between crawlers and how the choice of crawler can impact our understanding of the web.
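The abstract does not specify the metrics used by the paper's measurement framework. Purely as a hypothetical illustration of what "empirically highlighting the differences between crawlers" can look like, the sketch below compares the third-party hostnames that two crawlers record when loading the same page, using Jaccard similarity. The function names, the example URLs, and the choice of metric are assumptions made for this example, not the authors' method. As a further simplification, it compares full hostnames rather than registrable domains (eTLD+1).

```python
from urllib.parse import urlsplit


def third_party_hosts(page_url, request_urls):
    """Hypothetical helper: hostnames of requests that differ from the page's own host.

    Simplification: compares full hostnames, not registrable domains (eTLD+1).
    """
    first_party = urlsplit(page_url).hostname
    hosts = (urlsplit(u).hostname for u in request_urls)
    return {h for h in hosts if h and h != first_party}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical request logs from two crawlers visiting the same page.
page = "https://example.com/"
crawler_a = [
    "https://example.com/app.js",
    "https://cdn.tracker-a.test/t.js",
    "https://ads.example.net/x",
]
crawler_b = [
    "https://example.com/app.js",
    "https://cdn.tracker-a.test/t.js",
]

hosts_a = third_party_hosts(page, crawler_a)
hosts_b = third_party_hosts(page, crawler_b)
print(jaccard(hosts_a, hosts_b))  # 0.5
print(sorted(hosts_a - hosts_b))  # ['ads.example.net']
```

A similarity well below 1.0 across many pages would suggest that conclusions drawn about, say, third-party tracking could depend on which crawler collected the data.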

Citation (APA)

Ahmad, S. S., Dar, M. D., Zaffar, M. F., Vallina-Rodriguez, N., & Nithyanand, R. (2020). Apophanies or Epiphanies? How Crawlers Impact Our Understanding of the Web. In The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020 (pp. 271–280). Association for Computing Machinery, Inc. https://doi.org/10.1145/3366423.3380113