Controlled experiments on the web: survey and practical guide

by Ron Kohavi, Roger Longbotham, Dan Sommerfield, Randal M Henne
Data Mining and Knowledge Discovery (2008)

Abstract

The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments, A/B tests (and their generalizations), split tests, Control/Treatment tests, MultiVariable Tests (MVT) and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end-users can help guide the development of features. Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person’s Opinion (HiPPO). We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses. Based on our extensive practical experience with multiple systems and organizations, we share key lessons that will help practitioners in running trustworthy controlled experiments.


Available from www.springerlink.com
Data Min Knowl Disc (2009) 18:140–181
DOI 10.1007/s10618-008-0114-1
Received: 14 February 2008 / Accepted: 30 June 2008 / Published online: 30 July 2008
© Springer Science+Business Media, LLC 2008
Responsible editor: R. Bayardo.
R. Kohavi (✉) · R. Longbotham · D. Sommerfield · R. M. Henne
Microsoft, One Microsoft Way, Redmond, WA 98052, USA
e-mail: ronnyk@microsoft.com
R. Longbotham
e-mail: rogerlon@microsoft.com
D. Sommerfield
e-mail: dans@microsoft.com
R. M. Henne
e-mail: rhenne@microsoft.com
Keywords Controlled experiments · A/B testing · e-commerce · Website optimization · MultiVariable Testing · MVT
1 Introduction
One accurate measurement is worth more than a thousand expert opinions
– Admiral Grace Hopper
In the 1700s, a British ship’s captain observed the lack of scurvy among sailors serving on the naval ships of Mediterranean countries, where citrus fruit was part of their rations. He then gave half his crew limes (the Treatment group) while the other half (the Control group) continued with their regular diet. Despite much grumbling among the crew in the Treatment group, the experiment was a success, showing that consuming limes prevented scurvy. While the captain did not realize that scurvy is a consequence of vitamin C deficiency, and that limes are rich in vitamin C, the intervention worked. British sailors eventually were compelled to consume citrus fruit regularly, a practice that gave rise to the still-popular label limeys (Rossi et al. 2003; Marks 2000).
Some 300 years later, Greg Linden at Amazon created a prototype to show personalized recommendations based on items in the shopping cart (Linden 2006a, b). You add an item, recommendations show up; add another item, different recommendations show up. Linden notes that while the prototype looked promising, “a marketing senior vice-president was dead set against it,” claiming it will distract people from checking out. Greg was “forbidden to work on this any further.” Nonetheless, Greg ran a controlled experiment, and the “feature won by such a wide margin that not having it live was costing Amazon a noticeable chunk of change. With new urgency, shopping cart recommendations launched.” Since then, multiple sites have copied cart recommendations.
The authors of this paper were involved in many experiments at Amazon, Microsoft, Dupont, and NASA. The culture of experimentation at Amazon, where data trumps intuition (Kohavi et al. 2004), and a system that made running experiments easy, allowed Amazon to innovate quickly and effectively. At Microsoft, there are multiple systems for running controlled experiments. We describe several architectures in this paper with their advantages and disadvantages. A unifying theme is that controlled experiments have great return-on-investment (ROI) and that building the appropriate infrastructure can accelerate innovation. Stefan Thomke’s book title is well suited here: Experimentation Matters (Thomke 2003).