@gregdetre, gregdetre.co.uk
1-line AB tests in Django
23rd Feb, 2014
PyData, London
Greg Detre
@gregdetre
Sunday, 23 February 2014
I will show you how to write a 1-line AB test in Django. But it's only 1 line if you start sufficiently far to the left.
INTRO
Greg Detre
I'm Greg Detre
my PhD was on human memory & forgetting
i spent my days scanning people’s brains
including my own
it turned out to be smaller than I’d hoped
founded with Ed Cooke, grandmaster of memory, can remember a deck of cards in a minute flat
set out to combine the art and the science of memory, to help people learn 10 times faster
venture capital dance, millions of users
did a lot of AB testing, built our own internal framework
helped build up their data science team
distil AB testing best practices for them
YOU
Hands up if...
you’ve run an AB test
Hands up if...
you’ve used Django
WHAT IS AN
AB TEST?
When you release a change, you need to know whether you’ve
made a big step forward...
Or taken two steps back.
The idea behind AB testing is very simple:
- when you change something
- show some people the old version
- show some people the new version
- look at which group are happiest
i.e. it’s a scientific experiment on your product
WHY RUN
AB TESTS?
AB testing for making decisions
this has nothing to do with the talk
control for external factors
If I’m a designer at The Guardian, and I change the font today.
Tomorrow, traffic increases by 50%.
Should I get a pay-rise?
Not if the paper just published the NSA leaks this afternoon.
By running old vs new simultaneously, you control for that surge
in traffic. Both groups will show the boost, but you’re just
looking at the difference between them.
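To make the arithmetic concrete, here is a minimal sketch with made-up numbers. The visitor counts, the conversion rates and the size of the surge are all hypothetical, and it measures conversions rather than raw traffic. It compares a naive before/after reading with a simultaneous old-vs-new comparison.

```python
# Hypothetical numbers: an external news surge lifts traffic by 50% for everyone.
baseline_visitors = 1000           # visitors per bucket on a normal day
surge = 1.5                        # external factor, hits both buckets equally
old_rate, new_rate = 0.10, 0.11    # assumed conversion rates (old vs new font)


def conversions(visitors, rate):
    """Expected number of conversions for a given traffic level and rate."""
    return visitors * rate


# Before/after comparison: yesterday's old design vs today's new design.
before = conversions(baseline_visitors, old_rate)
after = conversions(baseline_visitors * surge, new_rate)
naive_lift = after / before - 1        # mixes the surge with the design change

# Simultaneous comparison: both buckets see the same surge today.
old_today = conversions(baseline_visitors * surge, old_rate)
new_today = conversions(baseline_visitors * surge, new_rate)
ab_lift = new_today / old_today - 1    # the surge cancels out

print(f"naive lift: {naive_lift:.0%}, AB lift: {ab_lift:.0%}")
```

With these assumed numbers the before/after reading credits the new font with the whole surge, while the side-by-side comparison isolates the part that is due to the change.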
improve your intuitions
feedback loops, error-driven learning
PREFACE
yes, there are gotchas to AB testing
but the main problem in AB testing is that people don’t AB test often enough
CODE
I want to be able to do this
bucket = ab(user,
            'Expt 37 - red vs green buy button',
            ['red', 'green'])
if bucket == 'red':
    pass  # show a red button
elif bucket == 'green':
    pass  # show a green button
else:
    raise Exception(...)
Experiment model
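from django.db.models import (
    CharField, DateTimeField, ManyToManyField, Model)
from django.utils import timezone

class Experiment(Model):
    name = CharField(max_length=100,
                     unique=True,
                     db_index=True)
    # cre = creation time
    cre = DateTimeField(default=timezone.now,
                        db_index=True)
    users = ManyToManyField('auth.User',
                            through='ExperimentUser',
                            related_name='experiments')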
ExperimentUser model
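from django.db.models import CASCADE, ForeignKey

class ExperimentUser(Model):
    # on_delete=CASCADE was the implicit default before Django 2.0;
    # the argument has been required since then
    user = ForeignKey('auth.User',
                      on_delete=CASCADE,
                      related_name='exptusers')
    experiment = ForeignKey(Experiment,
                            on_delete=CASCADE,
                            related_name='exptusers')
    bucket = CharField(max_length=100)
    cre = DateTimeField(default=timezone.now,
                        editable=False)

    class Meta:
        unique_together = ('experiment', 'user',)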
minimize FKs and indexes on ExperimentUser
Putting a user in a bucket
import random

def ab(user, name, buckets):
    expt = Experiment.objects.get_or_create(name=name)[0]
    exptuser, created = ExperimentUser.objects.get_or_create(
        experiment=expt, user=user)
    if created:
        # first time we see this user in this experiment:
        # pick a bucket at random and persist it
        exptuser.bucket = random.choice(buckets)
        exptuser.save()
    return exptuser.bucket
probably should be using defaults= in the ExperimentUser get_or_create
actually, why not ExperimentUser.objects.get_or_create(experiment__name=name)???
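The property that matters in ab() is that a user is assigned a bucket once and then keeps it. Here is a minimal sketch of that behaviour without Django, assuming a plain dict as a stand-in for the ExperimentUser table. The `_assignments` dict, the `user_id` argument and the `rng` parameter are hypothetical names for illustration, not part of the code above.

```python
import random

# In-memory stand-in for the ExperimentUser table (hypothetical, no Django):
# maps (experiment name, user id) -> bucket.
_assignments = {}


def ab(user_id, name, buckets, rng=random):
    """Return the user's bucket for this experiment, assigning one on first call."""
    key = (name, user_id)
    if key not in _assignments:
        # Mirrors get_or_create(..., defaults={'bucket': ...}):
        # the bucket is chosen only when the row is first created.
        _assignments[key] = rng.choice(buckets)
    return _assignments[key]


first = ab(42, 'Expt 37 - red vs green buy button', ['red', 'green'])
again = ab(42, 'Expt 37 - red vs green buy button', ['red', 'green'])
print(first, again)  # the same bucket both times
```

Choosing the bucket at creation time is what the defaults= suggestion amounts to: one write instead of a create followed by a save.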
SQL for calculating retention
select
    d0.user,
    d0.dt as activity_date,
    'd01'::text as retention_type,
    case when dXX.dt is not NULL then true else false end
        as user_returned
from
    user_activity_per_day as d0
left join
    user_activity_per_day as dXX
on
    d0.user = dXX.user
    and
    d0.dt + 1 = dXX.dt
username  visited
greg      20 Feb 2014
ed        20 Feb 2014
greg      21 Feb 2014
greg      22 Feb 2014
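The same day-1 retention logic can be sketched in plain Python, assuming the example table above is a sample of user_activity_per_day with one row per user per active day. The function name `day1_retention` is made up for this illustration.

```python
from datetime import date, timedelta

# The example activity table: one (username, visit date) row per active day.
activity = [
    ("greg", date(2014, 2, 20)),
    ("ed", date(2014, 2, 20)),
    ("greg", date(2014, 2, 21)),
    ("greg", date(2014, 2, 22)),
]


def day1_retention(rows):
    """For each (user, day) row, did the same user come back the next day?"""
    seen = set(rows)
    # Equivalent of the left self-join on user and dt + 1.
    return [
        (user, dt, (user, dt + timedelta(days=1)) in seen)
        for user, dt in rows
    ]


for user, dt, returned in day1_retention(activity):
    print(user, dt.isoformat(), returned)
```

Here greg counts as returned for 20 and 21 Feb, while ed on 20 Feb and greg on 22 Feb do not, because there is no matching row for the following day.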
github.com/gregdetre/abracadjabra
PRO TIPS
do’s
measure the right, high-level things ($, retention,
activation, sharing)
run on a subset
focus the analysis on relevant users
make your prediction first
url for each expt (method, results)
measure the right/high-level thing, so you can see if you're
making things worse elsewhere/down the line
e.g. eBay hurt their sale of books, but increased sale of cars
don’ts
don’t get lost in the weeds
don’t expect your AB tests to succeed very often
don’t keep checking the results
sanity checks
AA test - should make no difference
does making things worse make things worse?
e.g. if you make the site slower, how much does that hurt you?
prioritise dev efforts. or what if you get rid of components? or
get rid of ads?
software is the easy bit
culture
human intuition to generate hypotheses vs being receptive to the results
most AB tests are null results
storing & sharing conclusions
the big changes are the most important to test, but the hardest
WORKING
TOGETHER
software
science
startups
gregdetre.co.uk
@gregdetre
i’m moving back to London
happy to help if you drop me a line. or you can hire me
THE END
link to this
presentation
resources
Eric Ries, The one line split-test, or how to A/B all the time
http://www.startuplessonslearned.com/2008/09/one-line-split-test-or-how-to-ab-all.html
Kohavi et al (2007), Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO
http://exp-platform.com/Documents/GuideControlledExperiments.pdf
Kohavi et al (2013), Online Controlled Experiments at Large Scale, KDD.
http://www.exp-platform.com/Documents/2013%20controlledExperimentsAtScale.pdf
Miller (2010), How not to run an AB test
http://www.evanmiller.org/how-not-to-run-an-ab-test.html
APPENDIX
no peeking
DO NOT: peek at your results daily, and stop
when you see an improvement
see Miller (2010)
- say you start with a 50% conversion rate
- 2 buckets
- and you decide to stop as soon as you hit 5% significance, or after 150 observations
- 26% chance of a false positive!
this is the worst-case scenario (running a significance test after every observation)
but peeking to see if there's a difference, and stopping when there is, inflates the chances of you seeing a spurious difference
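The effect of peeking can be checked with a small simulation. This is a rough sketch under stated assumptions, not Miller's exact procedure: it runs AA tests (both buckets convert at 50%), uses a pooled two-proportion z-test as the significance test, and only starts peeking after a hypothetical minimum of 10 observations per bucket, so its numbers need not match the 26% figure.

```python
import math
import random


def z_significant(conv_a, conv_b, n, z_crit=1.96):
    """Two-proportion z-test with n observations per bucket (two-sided, ~5%)."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False
    return abs(conv_a - conv_b) / n / se > z_crit


def run_experiment(rng, n_max=150, rate=0.5, peek=True, min_n=10):
    """Simulate one AA test; return True if it (wrongly) declares a winner."""
    conv_a = conv_b = 0
    for n in range(1, n_max + 1):
        conv_a += rng.random() < rate
        conv_b += rng.random() < rate
        if peek and n >= min_n and z_significant(conv_a, conv_b, n):
            return True  # stopped early on a spurious difference
    # No early stop: judge on the full sample only.
    return z_significant(conv_a, conv_b, n_max)


rng = random.Random(0)
trials = 2000
peeking = sum(run_experiment(rng, peek=True) for _ in range(trials)) / trials
fixed = sum(run_experiment(rng, peek=False) for _ in range(trials)) / trials
print(f"false positives: peeking {peeking:.1%}, fixed sample {fixed:.1%}")
```

Since there is no real difference between the buckets, every "winner" is a false positive. Testing once at the end should stay near the nominal 5%, while testing after every observation comes out well above it.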

Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

1-Line AB Tests in Django by Greg Detre

  • 1. @gregdetre, gregdetre.co.uk. 1-line AB tests in Django. 23rd Feb, 2014, PyData, London. Greg Detre, @gregdetre. Sunday, 23 February 2014. I will show you how to write a 1-line AB test in Django. But it's only 1 line if you start sufficiently far to the left.
  • 3. Greg Detre: I'm Greg Detre. My PhD was on human memory & forgetting.
  • 4. I spent my days scanning people's brains, including my own. It turned out to be smaller than I'd hoped.
  • 5. Founded with Ed Cooke, grandmaster of memory, who can remember a deck of cards in a minute flat. We set out to combine the art and the science of memory, to help people learn 10 times faster. Venture capital dance, millions of users. Did a lot of AB testing, built our own internal framework.
  • 6. Helped build up their data science team, and distilled AB testing best practices for them.
  • 8. Hands up if... you've run an AB test
  • 9. Hands up if... you've used Django
  • 10. WHAT IS AN AB TEST?
  • 11. When you release a change, you need to know whether you've made a big step forward... or taken two steps back. The idea behind AB testing is very simple:
      - when you change something
      - show some people the old version
      - show some people the new version
      - look at which group is happiest
    i.e. it's a scientific experiment on your product
  • 13. WHY RUN AB TESTS?
  • 14. AB testing for making decisions
  • 15. this has nothing to do with the talk
  • 16. control for external factors: Say I'm a designer at The Guardian, and I change the font today. Tomorrow, traffic increases by 50%. Should I get a pay rise? Not if the paper just published the NSA leaks this afternoon. By running old vs new simultaneously, you control for that surge in traffic. Both groups will show the boost, but you're only looking at the difference between them.
  • 17. improve your intuitions: feedback loops, error-driven learning
  • 19. Yes, there are gotchas to AB testing, but the main problem in AB testing is that people don't AB test often enough.
  • 21. I want to be able to do this:

        bucket = ab(user, 'Expt 37 - red vs green buy button', ['red', 'green'])
        if bucket == 'red':
            pass  # show a red button
        elif bucket == 'green':
            pass  # show a green button
        else:
            raise Exception(...)
  • 22. Experiment model:

        from django.db.models import CharField, DateTimeField, ManyToManyField, Model
        from django.utils import timezone

        class Experiment(Model):
            # the experiment name is the key passed to ab()
            name = CharField(max_length=100, unique=True, db_index=True)
            cre = DateTimeField(default=timezone.now, db_index=True)
            users = ManyToManyField('auth.User', through='ExperimentUser',
                                    related_name='experiments')
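  • 23. ExperimentUser model:

        from django.db.models import CASCADE, CharField, DateTimeField, ForeignKey, Model
        from django.utils import timezone

        class ExperimentUser(Model):
            # on_delete is mandatory from Django 2.0 onwards; CASCADE was the old default
            user = ForeignKey('auth.User', on_delete=CASCADE, related_name='exptusers')
            experiment = ForeignKey(Experiment, on_delete=CASCADE, related_name='exptusers')
            bucket = CharField(max_length=100)
            cre = DateTimeField(default=timezone.now, editable=False)

            class Meta:
                # one bucket per user per experiment
                unique_together = ('experiment', 'user',)

    Notes: minimize FKs and indexes on ExperimentUser.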
  • 24. Putting a user in a bucket:

        import random

        def ab(user, name, buckets):
            expt = Experiment.objects.get_or_create(name=name)[0]
            exptuser, created = ExperimentUser.objects.get_or_create(
                experiment=expt, user=user)
            if created:
                # first time this user hits this experiment: pick a bucket and persist it
                exptuser.bucket = random.choice(buckets)
                exptuser.save()
            return exptuser.bucket

    Notes: probably should be using defaults= in the ExperimentUser get_or_create. Actually, why not ExperimentUser.objects.get_or_create(experiment__name=name)???
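The contract that `ab()` relies on is that a user's first exposure picks a bucket at random, and every later call returns that same bucket. The following is a minimal sketch of that contract only, using a plain dict in place of the `ExperimentUser` table so it runs without Django. The names `ab_in_memory` and `_assignments` are hypothetical and not part of the talk.

```python
import random

# Hypothetical in-memory stand-in for the ExperimentUser table:
# maps (experiment name, user id) -> bucket.
_assignments = {}


def ab_in_memory(user_id, name, buckets, rng=random):
    """Return the user's bucket for this experiment, assigning one on first sight."""
    key = (name, user_id)
    if key not in _assignments:
        # First exposure: choose at random and remember the choice,
        # so the user keeps seeing the same variant on later calls.
        _assignments[key] = rng.choice(buckets)
    return _assignments[key]


first = ab_in_memory(42, 'Expt 37 - red vs green buy button', ['red', 'green'])
again = ab_in_memory(42, 'Expt 37 - red vs green buy button', ['red', 'green'])
print(first, again)  # the two values are always equal
```

In the Django version the database plays the role of the dict, and the `unique_together` constraint on `ExperimentUser` is what guarantees one bucket per user per experiment.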
  • 25. SQL for calculating retention:

        select
            d0.user,
            d0.dt as activity_date,
            'd01'::text as retention_type,
            case when dXX.dt is not NULL then true else false end as user_returned
        from
            user_activity_per_day as d0
        left join
            user_activity_per_day as dXX
        on
            d0.user = dXX.user
            and
            d0.dt + 1 = dXX.dt
  • 27.
        username  visited
        greg      20 Feb 2014
        ed        20 Feb 2014
        greg      21 Feb 2014
        greg      22 Feb 2014
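To make the self-join concrete, here is a rough Python equivalent of the retention query applied to the four sample rows above. It is a sketch, not the talk's code: `day1_retention` is a hypothetical helper, and it assumes one row per user per day, as the table name `user_activity_per_day` suggests.

```python
from datetime import date, timedelta

# The sample visits from the slide: (username, day visited).
visits = [
    ("greg", date(2014, 2, 20)),
    ("ed", date(2014, 2, 20)),
    ("greg", date(2014, 2, 21)),
    ("greg", date(2014, 2, 22)),
]


def day1_retention(rows):
    """For each (user, day) row, report whether the same user was active the next day."""
    seen = set(rows)  # plays the role of the joined copy of the table (dXX)
    return [
        (user, day, (user, day + timedelta(days=1)) in seen)
        for user, day in sorted(rows, key=lambda r: (r[1], r[0]))
    ]


results = day1_retention(visits)
for user, day, returned in results:
    print(user, day.isoformat(), returned)
# ed 2014-02-20 False
# greg 2014-02-20 True
# greg 2014-02-21 True
# greg 2014-02-22 False
```

As in the SQL, a row with no match on the following day still appears in the output (the left join), with `user_returned` false. The last day in the data always looks unretained, because the following day has not happened yet.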
  • 30-35. do's
      - measure the right, high-level things ($, retention, activation, sharing)
      - run on a subset
      - focus the analysis on relevant users
      - make your prediction first
      - url for each expt (method, results)
    Notes: measure the right, high-level thing, so you can see if you're making things worse elsewhere or down the line, e.g. eBay hurt their sale of books, but increased sale of cars.
  • 37-40. don'ts
      - don't get lost in the weeds
      - don't expect your AB tests to succeed very often
      - don't keep checking the results
  • 41-44. sanity checks
      - AA test: should make no difference
      - does making things worse make things worse?
    Notes: e.g. if you make the site slower, how much does that hurt you? That helps prioritise dev efforts. Or what if you get rid of components? Or get rid of ads?
  • 45. software is the easy bit
      - culture
      - human intuition to generate hypotheses vs being receptive to the results
      - most AB tests are null results
      - storing & sharing conclusions
      - the big changes are the most important to test, but the hardest
  • 47. software, science, startups. gregdetre.co.uk, @gregdetre. I'm moving back to London. Happy to help if you drop me a line, or you can hire me.
  • 50. resources
      - Eric Ries, The one line split-test, or how to A/B all the time. http://www.startuplessonslearned.com/2008/09/one-line-split-test-or-how-to-ab-all.html
      - Kohavi et al (2007), Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO. http://exp-platform.com/Documents/GuideControlledExperiments.pdf
      - Kohavi et al (2013), Online Controlled Experiments at Large Scale, KDD. http://www.exp-platform.com/Documents/2013%20controlledExperimentsAtScale.pdf
      - Miller (2010), How not to run an AB test. http://www.evanmiller.org/how-not-to-run-an-ab-test.html
  • 52. no peeking. DO NOT peek at your results daily and stop when you see an improvement; see Miller (2010).
      - say you start with a 50% conversion rate
      - 2 buckets
      - and you decide to stop at 5% significance or after 150 observations
      - 26% chance of a false positive!
    This is the worst-case scenario (running a significance test after every observation), but peeking to see if there's a difference, and stopping when there is, inflates the chances of seeing a spurious difference.
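The effect of peeking can be checked with a small simulation. The sketch below runs many A/A tests (both buckets share the same true 50% conversion rate, so every "significant" result is a false positive) and compares testing once at the end against testing after every observation. It rests on several assumptions of mine rather than the talk's or Miller's exact setup: 150 observations per bucket, a two-proportion z-test at the 5% level, and no peeking before 10 observations per bucket, since the normal approximation is unreliable on tiny samples. All function names are hypothetical.

```python
import math
import random


def z_significant(conv_a, conv_b, n, z_crit=1.96):
    """Two-sided two-proportion z-test with n observations per bucket."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return False  # all conversions or none: no evidence of a difference
    return abs(conv_a - conv_b) / n / se > z_crit


def run_experiment(rng, max_n=150, rate=0.5, peek=False, min_n=10):
    """Simulate one A/A test; return True if it (wrongly) declares a winner."""
    conv_a = conv_b = 0
    for n in range(1, max_n + 1):
        conv_a += rng.random() < rate
        conv_b += rng.random() < rate
        if peek and n >= min_n and z_significant(conv_a, conv_b, n):
            return True  # stopped early on a spurious difference
    return z_significant(conv_a, conv_b, max_n)


def false_positive_rate(peek, trials=2000, seed=0):
    rng = random.Random(seed)
    return sum(run_experiment(rng, peek=peek) for _ in range(trials)) / trials


fixed = false_positive_rate(peek=False)
peeking = false_positive_rate(peek=True)
print(f"test once at the end:         {fixed:.1%}")
print(f"test after every observation: {peeking:.1%}")
```

Testing once at the end should land near the nominal 5%, while stopping at the first significant peek should come out several times higher. The exact figures depend on the assumptions above, so they will not necessarily match the 26% quoted on the slide. The fixed-horizon half of the simulation is also what an AA sanity check looks like: with no real difference, only about 1 run in 20 should come out significant.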