Analyzing the Web: Are Top Websites Lists a Good Choice for Research?

Abstract

The web has been a subject of research since its beginning, but analyzing the whole web is difficult if not impossible, even if a database of all URLs were freely accessible. Hundreds of studies have used commercial top websites lists as a shortcut, in particular the Alexa One Million Top Sites list. However, apart from the fact that Amazon decided to terminate Alexa, we question the usefulness of such lists for research, as they have several shortcomings. Our analysis shows that top sites lists miss frequently visited websites and offer little value for language-specific research. We present a heuristic-driven alternative based on the Common Crawl host-level web graph that also takes language-specific requirements into account.

Cite

Alby, T., & Jäschke, R. (2022). Analyzing the Web: Are Top Websites Lists a Good Choice for Research? In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13541 LNCS, pp. 11–25). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16802-4_2
