A scalable randomized method to compute link-based similarity rank on the web graph

Dániel Fogaras; Balázs Rácz

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3268 557-567

DOI: 10.1007/978-3-540-30192-9_55

Abstract

Several iterative hyperlink-based similarity measures were published to express the similarity of web pages. However, it usually seems hopeless to evaluate complex similarity functions over large repositories containing hundreds of millions of pages. We introduce scalable algorithms computing SimRank scores, which express the contextual similarities of pages based on the hyperlink structure. The proposed methods scale well to large repositories, fulfilling strict requirements about computational complexity. The algorithms were tested on a set of ten million pages, but parallelization techniques make it possible to compute the SimRank scores even for the entire web with over 4 billion pages. The key idea is that randomized Monte Carlo methods combined with indexing techniques yield a scalable approximation of SimRank. © Springer-Verlag 2004.
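The Monte Carlo idea mentioned in the abstract can be illustrated with a minimal sketch. It relies on the standard random-walk reading of SimRank: the score of a pair of pages equals the expected value of c^τ, where τ is the first step at which two walks, started from the two pages and following links backwards, land on the same page. The function name `simrank_mc`, the `in_links` dictionary layout, and the parameters `c`, `num_walks`, and `max_steps` are assumptions made for this illustration. This is not the authors' implementation: it simulates independent walk pairs for a single query and omits the indexing and parallelization techniques the paper describes.

```python
import random


def simrank_mc(in_links, u, v, c=0.6, num_walks=1000, max_steps=10, seed=0):
    """Estimate SimRank(u, v) by averaging c**tau over pairs of reverse walks.

    in_links maps each node to the list of nodes that link to it
    (hypothetical input format chosen for this sketch).
    """
    if u == v:
        return 1.0  # a page is maximally similar to itself by definition

    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_walks):
        a, b = u, v
        for step in range(1, max_steps + 1):
            preds_a = in_links.get(a, [])
            preds_b = in_links.get(b, [])
            if not preds_a or not preds_b:
                break  # one walk cannot continue, so this pair never meets
            # Each walk independently steps to a uniformly random in-neighbor.
            a = rng.choice(preds_a)
            b = rng.choice(preds_b)
            if a == b:
                total += c ** step  # first meeting after `step` steps
                break
    return total / num_walks


# Tiny example: pages "u" and "v" are both linked only from page "w",
# so every pair of walks meets after exactly one step.
graph = {"u": ["w"], "v": ["w"]}
estimate = simrank_mc(graph, "u", "v", c=0.6)
```

Cutting the walks off at `max_steps` slightly underestimates the score, since any meeting after the cutoff is ignored; each ignored meeting would have contributed at most `c ** (max_steps + 1)`.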

Cite (APA)

Fogaras, D., & Rácz, B. (2004). A scalable randomized method to compute link-based similarity rank on the web graph. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3268, 557–567. https://doi.org/10.1007/978-3-540-30192-9_55