Two 1%s Don't make a whole: Comparing simultaneous samples from Twitter's Streaming API

Kenneth Joseph; Peter M. Landwehr; Kathleen M. Carley

Conference Proceedings

Two 1%s Don't make a whole: Comparing simultaneous samples from Twitter's Streaming API

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8393 LNCS 75-83

DOI: 10.1007/978-3-319-05579-4_10

42Citations

92Readers

Get full text

Abstract

We compare samples of tweets from the Twitter Streaming API constructed from different connections that tracked the same popular keywords at the same time. We find that on average, over 96% of the tweets seen in one sample are seen in all others. Those tweets found only in a subset of samples do not significantly differ from tweets found in all samples in terms of user popularity or tweet structure. We conclude they are likely the result of a technical artifact rather than any systematic bias. Practically, our results show that an infinite number of Streaming API samples are necessary to collect "most" of the tweets containing a popular keyword, and that findings from one sample from the Streaming API are likely to hold for all samples that could have been taken. Methodologically, our approach is extendible to other types of social media data beyond Twitter. © 2014 Springer International Publishing Switzerland.

Cite

CITATION STYLE

APA

Joseph, K., Landwehr, P. M., & Carley, K. M. (2014). Two 1%s Don’t make a whole: Comparing simultaneous samples from Twitter’s Streaming API. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8393 LNCS, pp. 75–83). Springer Verlag. https://doi.org/10.1007/978-3-319-05579-4_10

Two 1%s Don't make a whole: Comparing simultaneous samples from Twitter's Streaming API

Abstract

Cite

Register to see more suggestions