Sunlight: Fine-grained Targeting Detection at Scale with Statistical Confidence

  • Lecuyer M
  • Spahn R
  • Spiliopoulos Y
  • et al.


We present Sunlight, a system that detects the causes of targeting phenomena on the web – such as personalized advertisements, recommendations, or content – at large scale and with solid statistical confidence. Today's web is growing increasingly complex and impenetrable as myriad services collect, analyze, use, and exchange users' personal information. No one can tell who has what data, for what purposes they are using it, and how those uses affect the users. The few studies that exist reveal problematic effects – such as discriminatory pricing and advertising – but they are either too small-scale to generalize or lack formal assessments of confidence in the results, making them difficult to trust or interpret. Sunlight brings a principled and scalable methodology to personal data measurements by adapting well-established methods from statistics for the specific problem of targeting detection. Our methodology formally separates different operations into four key phases: scalable hypothesis generation, interpretable hypothesis formation, statistical significance testing, and multiple testing correction. Each phase can be instantiated with multiple mechanisms from statistics, each making different assumptions and tradeoffs. Sunlight offers a modular design that allows exploration of this vast design space. We explore a portion of this space, thoroughly evaluating the tradeoffs both analytically and experimentally. Our exploration reveals subtle tensions between scalability and confidence. Sunlight's default functioning strikes a balance to provide the first system that can diagnose targeting at fine granularity, at scale, and with solid statistical justification of its results. We showcase our system by running two measurement studies of targeting on the web, both the largest of their kind. Our studies – about ad targeting in Gmail and on the web – reveal statistically justifiable evidence that contradicts two Google statements regarding the lack of targeting on sensitive and prohibited topics.
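The last two phases named in the abstract, statistical significance testing and multiple testing correction, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the exact one-sided binomial test, the Holm-Bonferroni correction, the 0.5 null probability, the 0.05 threshold, and the ad/input names and counts are all hypothetical, and Sunlight's actual mechanisms may differ.

```python
from math import comb


def binomial_p_value(successes, trials, p_null):
    """Exact one-sided p-value: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical targeting hypotheses: (ad, input) -> (accounts that saw the ad AND
# had the input, accounts that saw the ad). Assumed setup: each input is assigned
# to an account independently with probability 0.5, so an untargeted ad should
# land on accounts with the input about half the time.
observations = {
    ("ad_A", "email_1"): (18, 20),
    ("ad_B", "email_2"): (11, 20),
    ("ad_C", "email_3"): (15, 20),
}

hypotheses = list(observations)
raw = [binomial_p_value(hits, total, 0.5) for hits, total in observations.values()]
corrected = holm_bonferroni(raw)

ALPHA = 0.05  # assumed significance threshold
significant = [h for h, p in zip(hypotheses, corrected) if p < ALPHA]

for h, p_raw, p_adj in zip(hypotheses, raw, corrected):
    print(h, f"raw={p_raw:.6f}", f"corrected={p_adj:.6f}")
```

The correction step matters because testing many hypotheses at once inflates the chance of a spurious "targeted" finding; Holm-Bonferroni is one conservative choice that bounds the probability of any false positive across the whole set.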

Author-supplied keywords

  • measurement
  • privacy
  • web transparency


Authors

  • Mathias Lecuyer
  • Riley Spahn
  • Yannis Spiliopoulos
  • Augustin Chaintreau
  • Roxana Geambasu
  • Daniel Hsu
