Social Streams Blog Crawler

by Matthew Hurst, Alexey Maykov
2009 IEEE 25th International Conference on Data Engineering (2009)

Abstract

Weblogs and other forms of social media differ from traditional Web content in many ways. One of the most important differences is the highly temporal nature of the content. To be effective, applications that leverage social media content must have access to this data with minimal latency between publication and acquisition. An effective Weblog crawler should satisfy the following requirements: low latency, high scalability, high data quality, and appropriate network politeness. In this paper, we outline the Weblog crawler implemented in the Social Streams project and summarize the challenges faced during development.
