Parallel crawling for online social networks

Abstract

Given a huge online social network, how do we retrieve information from it through crawling? Better still, how do we improve crawling performance by using parallel crawlers that work independently? In this paper, we present a framework of parallel crawlers for online social networks that is built around a centralized queue. To show how this works in practice, we describe our implementation of the crawlers for an online auction website. Because the crawlers work independently, the failure of one crawler does not affect the others. The framework ensures that no redundant crawling occurs. Using the crawlers that we built, we visited approximately 11 million auction users in total, about 66,000 of whom were completely crawled.
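The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: independent crawlers pulling work from one centralized queue that hands out each user at most once. Every name here (`CentralQueue`, `fetch_neighbors`, `crawl`) is hypothetical, the in-memory `graph` dictionary stands in for real page requests, and threads stand in for separate crawler processes; none of this is taken from the authors' system.

```python
import queue
import threading


class CentralQueue:
    """Hypothetical centralized queue: each user ID is handed out at most once."""

    def __init__(self, seeds):
        self._lock = threading.Lock()
        self._seen = set()             # users already discovered ("visited")
        self._pending = queue.Queue()  # users waiting to be crawled
        for user in seeds:
            self.add(user)

    def add(self, user):
        """Enqueue a user unless it was seen before; return True if enqueued."""
        with self._lock:
            if user in self._seen:
                return False
            self._seen.add(user)
        self._pending.put(user)
        return True

    def next_user(self, timeout=0.2):
        """Return the next user to crawl, or None if the queue stays empty."""
        try:
            return self._pending.get(timeout=timeout)
        except queue.Empty:
            return None


def fetch_neighbors(graph, user):
    # Assumption: stands in for downloading and parsing a user's page.
    return graph.get(user, [])


def crawler(central, graph, crawled, crawled_lock):
    """One independent crawler: it talks only to the central queue."""
    while True:
        user = central.next_user()
        if user is None:
            return  # nothing left for this crawler to do
        neighbors = fetch_neighbors(graph, user)
        with crawled_lock:
            crawled.append(user)  # this user is now completely crawled
        for neighbor in neighbors:
            central.add(neighbor)  # duplicates are rejected centrally


def crawl(graph, seeds, num_crawlers=4):
    """Run several crawlers in parallel and return the users they crawled."""
    central = CentralQueue(seeds)
    crawled = []
    crawled_lock = threading.Lock()
    threads = [
        threading.Thread(target=crawler, args=(central, graph, crawled, crawled_lock))
        for _ in range(num_crawlers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return crawled
```

Calling `crawl({"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": ["a"]}, ["a"])` returns each of the four users exactly once, in an order that varies between runs. Because the duplicate check lives in the queue rather than in the crawlers, the crawlers share no state with one another, which is what lets one stop without disturbing the rest. In this simplified sketch a user held by a crawler that stops mid-fetch is not re-queued; a real system would need some way to reclaim such work.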

Cite

Chau, D. H., Pandit, S., Wang, S., & Faloutsos, C. (2007). Parallel crawling for online social networks. In 16th International World Wide Web Conference, WWW2007 (pp. 1283–1284). https://doi.org/10.1145/1242572.1242809
