Toppling Top Lists: Evaluating the Accuracy of Popular Website Lists

Kimberly Ruth; Deepak Kumar; Brandon Wang; Luke Valenta; Zakir Durumeric

Conference Proceedings (Open Access)

Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC (2022) 374-387

DOI: 10.1145/3517745.3561444

Abstract

Researchers rely on lists of popular websites like the Alexa Top Million both to measure the web and to evaluate proposed protocols and systems. Prior work has questioned the correctness and consistency of these lists, but without ground truth data to compare against, there has been no direct evaluation of list accuracy. In this paper, we evaluate the relative accuracy of the most popular top lists of websites. We derive a set of popularity metrics from server-side requests seen at Cloudflare, which authoritatively serves a significant portion of the most popular websites. We evaluate top lists against these metrics and show that most lists capture web popularity poorly, with the exception of the Chrome User Experience Report (CrUX) dataset, which is the most accurate top list compared to Cloudflare across all metrics. We explore the biases that lower the accuracy of other lists, and we conclude with recommendations for researchers studying the web in the future.

Citation (APA)

Ruth, K., Kumar, D., Wang, B., Valenta, L., & Durumeric, Z. (2022). Toppling Top Lists: Evaluating the Accuracy of Popular Website Lists. In Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC (pp. 374–387). Association for Computing Machinery. https://doi.org/10.1145/3517745.3561444
