LiveRank: How to refresh old crawls

Abstract

This paper considers the problem of refreshing a crawl. More precisely, given a collection of Web pages (with hyperlinks) gathered at some past time, we want to identify a significant fraction of the pages that still exist at the present time. The liveness of an old page can be tested through an online query at the present time. We call LiveRank a ranking of the old pages that tries to place the pages that are still alive near the top. The quality of a LiveRank is measured by the number of queries needed to identify a given fraction of the alive pages when pages are queried in LiveRank order. We study several scenarios, ranging from a static setting, where the LiveRank is computed before any query is made, to dynamic settings, where the LiveRank can be updated as queries are processed. Our results show that building on PageRank can lead to efficient LiveRanks for Web graphs.
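As an illustration of the quality measure described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the old crawl is a small dictionary mapping each page to its outgoing links, uses a plain power-iteration PageRank as one possible static ordering, and replaces the online liveness query with a hypothetical `is_alive` callable backed by known ground truth. The names `pagerank`, `queries_needed`, `old_crawl`, and `alive_now` are all illustrative.

```python
import math


def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over {page: [outgoing links]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            # Ignore links that point outside the crawled collection.
            targets = [q for q in outs if q in links]
            if targets:
                share = damping * rank[page] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page: spread its weight uniformly.
                for q in pages:
                    new_rank[q] += damping * rank[page] / n
        rank = new_rank
    return rank


def queries_needed(ranking, is_alive, fraction):
    """Count the queries, issued in `ranking` order, needed to find
    `fraction` of all alive pages. `is_alive` stands in for the online
    liveness test, so this is an offline evaluation with known truth."""
    total_alive = sum(1 for page in ranking if is_alive(page))
    target = math.ceil(fraction * total_alive)
    if target == 0:
        return 0
    found = 0
    for queries, page in enumerate(ranking, start=1):
        if is_alive(page):
            found += 1
            if found >= target:
                return queries
    return len(ranking)


# Hypothetical old crawl (assumption: four pages, made-up links).
old_crawl = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
# Hypothetical ground truth for which pages still exist today.
alive_now = {"a", "c"}

scores = pagerank(old_crawl)
# Static ordering: computed once, before any liveness query is made.
static_order = sorted(old_crawl, key=lambda p: scores[p], reverse=True)
cost = queries_needed(static_order, lambda p: p in alive_now, 1.0)
```

In this toy graph the two alive pages also have the highest PageRank scores, so `cost` is 2, whereas an ordering that queries `b` and `d` first would need 4 queries to find the same pages. A lower count for the same target fraction means a better ordering under the measure above.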

Huynh, T. D., Mathieu, F., & Viennot, L. (2014). LiveRank: How to refresh old crawls. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8882, 148–160. https://doi.org/10.1007/978-3-319-13123-8_12
