Geotagging one hundred million Twitter accounts with total variation minimization

  • Mislove A
  • Lehmann S
  • Ahn Y
 et al. 

Abstract

Geographically annotated social media is extremely valuable for modern information retrieval. However, researchers who can access only publicly visible data quickly find that social media users rarely publish location information. In this work, we provide a method that can geolocate the overwhelming majority of active Twitter users, independent of their location-sharing preferences, using only publicly visible Twitter data. Our method infers an unknown user's location by examining their friends' locations. We frame the geotagging problem as an optimization over a social network with a total variation-based objective and provide a scalable, distributed algorithm for its solution. Furthermore, we show how a robust estimate of the geographic dispersion of each user's ego network can be used as a per-user accuracy measure that is effective at removing outlying errors. Leave-many-out evaluation shows that our method is able to infer locations for 101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag over 80% of public tweets.
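
As a rough illustration of the idea in the abstract, the sketch below estimates one user's location from the known locations of their friends and then uses a robust dispersion measure to decide whether to trust the estimate. It is a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not give the exact objective, so the code assumes a simple local step that picks the friend location with the smallest total great-circle distance to all other friends, and it assumes the median friend distance as the dispersion measure. The function names and the 100 km threshold are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def estimate_location(friend_locations):
    """Assumed local step: choose the friend location that minimizes the
    summed distance to all friends (a medoid, used here as a stand-in for
    a multivariate median)."""
    return min(
        friend_locations,
        key=lambda c: sum(haversine_km(c, f) for f in friend_locations),
    )


def dispersion_km(estimate, friend_locations):
    """Assumed robust dispersion: median distance from the estimate to the friends."""
    return median(haversine_km(estimate, f) for f in friend_locations)


def geotag(friend_locations, max_dispersion_km=100.0):
    """Return an estimated (lat, lon), or None when there are no located
    friends or the ego network is too geographically spread out.
    The 100 km default is an illustrative value, not taken from the paper."""
    if not friend_locations:
        return None
    estimate = estimate_location(friend_locations)
    if dispersion_km(estimate, friend_locations) > max_dispersion_km:
        return None
    return estimate


if __name__ == "__main__":
    # Three friends near New York and one outlier in Los Angeles.
    friends = [(40.71, -74.00), (40.73, -73.99), (40.75, -73.98), (34.05, -118.24)]
    print(geotag(friends))
```

Because both the medoid and the median are insensitive to a minority of distant friends, a single outlier does not pull the estimate away from the main cluster. A user whose friends are scattered across continents has a large dispersion and is left untagged, which mirrors the role of the per-user accuracy measure described in the abstract.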

Author-supplied keywords

  • Age
  • Alcohol consumption
  • And government regulation
  • BRUMMITT
  • Back-link function
  • Benchmark
  • Big data
  • Bobby Duffy and Kate Smith (MORI Online)
  • Broken windows
  • CHARTS
  • CURRICULA (Courses of study)
  • Cannabis
  • Chicago
  • Coastal and marine
  • Collective mood
  • Collective sensing
  • Communication Privacy Management (CPM) Theory
  • Community detection
  • Comparing data from online and face-to-face survey
  • Computational criminology
  • Consistency
  • Context-specific concerns for information privacy
  • Control agency
  • Coverage error
  • Coverage model
  • Covert personalization
  • Cultural ecosystem services
  • Data mining
  • Digital divide
  • Drunk-driving crashes
  • ENGLISH language -- Study & teaching
  • Ecosystem services
  • Ecosystems
  • Enumeration error
  • Environmental impacts
  • Estimated-control poststratification
  • GIS
  • Gasoline prices
  • Gender
  • Geo-located Twitter
  • Geodemographics
  • Geodemography
  • George Terhanian and John Bremer (Harris Interacti
  • Global mobility patterns
  • Harris Interactive
  • Human mobility
  • INTERNATIONAL schools
  • Immigration
  • Imputation
  • Individual self-protection
  • Industry self-regulation
  • Information Privacy Concerns
  • Information privacy
  • Information systems
  • Institutional Privacy Assurance
  • Internet access
  • Land developability
  • Land usage
  • Legal status
  • Location-aware marketing (LAM)
  • Marijuana concentrates
  • Marijuana legalization
  • Mexican foreign-born
  • Mississippi
  • Mobility pattern
  • Name analysis
  • Natural resources
  • Neighborhood Characteristics R230
  • Nick
  • Non-material values
  • Open-source communications
  • Optimization
  • Overt personalization
  • Personalization privacy paradox
  • Phone survey
  • Population
  • Population growth
  • Population stress
  • Population-land conflicts
  • Poster Papers
  • Poststratification
  • Predictive policing
  • Privacy
  • Privacy decision making
  • Privacy-Enhancing Tools (PETs)
  • Privacy-by-Design (PbD)
  • Propensity weights
  • Psychological control
  • Quasi-randomization
  • Questionnaire Surveys
  • RESEARCH
  • RTI International
  • Race
  • Recreational Aspects of Natural Resources Q260
  • Regional Labor Markets
  • Regional development
  • Response model
  • Richardson plot
  • Rural
  • Sampling frame coverage bias
  • Self-selection
  • Sentiment tracking
  • Simulation
  • Social and Information Networks
  • Social media
  • Social networks
  • Stock market
  • Survey research
  • Survey-estimated control totals
  • Sustainable development
  • Switzerland
  • Traffic crashes
  • Traffic safety
  • Transportation management
  • Twitter
  • UNITED States
  • Unauthorized
  • Urban
  • Value Sensitive Design (VSD)
  • Vulnerability
  • Web survey
  • Weighting
  • and Regional Economics: Regional Migration
  • big data
  • big data workforce
  • city-size distributions
  • community structure
  • conference presentation
  • data science
  • dependence
  • diagrams
  • disclosure
  • etc
  • fractal dimension
  • geo-temporal demographics
  • grey divide
  • growth
  • hashtag
  • head/tail breaks
  • head/tail division rule
  • journal paper
  • lifestyle
  • media use
  • migration
  • mismatch
  • mobility
  • mobility demographic research
  • neighborhood built environment
  • nested rank-size plots
  • nighttime imagery
  • pedestrian behavior
  • place preferences
  • propensity scoring
  • public infrastructure
  • refugee
  • regression-models
  • residential preferences
  • rural/small-town
  • s paradox
  • scaling of geographic space
  • selection bias
  • seniors
  • simpson
  • size
  • social capital
  • social context
  • social network
  • social networking
  • spatial interaction
  • spatial network
  • trajectory
  • tweets
  • twitter
  • united-states
  • urban boundary
  • urban geography
  • user behavior
  • v-analytics
  • variables
  • web survey
  • webographic variables

Authors

  • Alan Mislove

  • Sune Lehmann

  • Yong-Yeol Ahn

  • Jukka-Pekka Onnela

  • J Niels Rosenquist

  • David Jurgens
