PyData London 2014 Martin Goodson - Most A/B Testing Results are Illusory


Most A/B testing results
are Illusory
Martin Goodson, Skimlinks

These are my opinions not those of my
employer!

What’s an A/B test?
Example: Free delivery
A: Control
B: Variant

‘How can you talk for 40 minutes
about A/B testing?’

A/B tests are very easy to get wrong

What my experience is based on

What this talk is about
3 Statistical concepts
Errors and consequences
These errors are exactly how A/B testing
software works

What this talk is about
Statistical Power
Multiple Testing
Regression to the Mean

What is Statistical Power?
The probability that you will detect a true
difference between two samples

What is Statistical Power?
Example: are men taller than women, on
average?

What is Statistical Power?
Example: free delivery on a website

Why is Statistical Power important?
1. False negatives
2. False positives

Precision
Proportion of true positives in the positive
results
It's a function of power, significance level and
prevalence.

If you have good power?
Out of 100 tests
10 really drive uplift
You detect 8
5 false positives
8/13 of positive tests are real

If you have bad power?
Out of 100 tests
10 really drive uplift
You detect 3
5 false positives
3/8 of winning tests are real!
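
The two slides above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not code from the talk: the function name and its arguments are illustrative, and it assumes false positives arise only among the tests with no real effect (90 of the 100). That gives 4.5 false positives where the slides round to 5, so the results (0.64 and 0.40) differ slightly from 8/13 and 3/8.

```python
def precision(power, alpha, prevalence):
    """Fraction of positive test results that reflect a real effect.

    power:      probability of detecting a real effect
    alpha:      significance level (false positive rate under the null)
    prevalence: fraction of tests that really drive uplift
    """
    true_positives = prevalence * power
    false_positives = (1 - prevalence) * alpha
    return true_positives / (true_positives + false_positives)


# Numbers from the slides: 10 of 100 tests really drive uplift, alpha = 5%.
good = precision(power=0.8, alpha=0.05, prevalence=0.10)
bad = precision(power=0.3, alpha=0.05, prevalence=0.10)
print(f"good power: {good:.2f}, bad power: {bad:.2f}")
```

Lowering power leaves the number of false positives unchanged but shrinks the number of true positives, which is why precision falls.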

Marketer: ‘We need results in 2 weeks’ time’
Me: ‘We can’t run this test for only two weeks; we won’t get robust results’
Marketer: ‘Why are you being so negative?’

Calculating Power
Alpha: probability of a positive result when
the null hypothesis is true (5%)
Beta: probability of not seeing a positive
result when the null hypothesis is false
Power = 1 - Beta (80-90%)

Calculating Power
Use a power calculator:
Online
R (power.prop.test)
python (statsmodels.stats.power)

Approximate sample sizes
Using a power calculator and asking for 80%
power and significance level of 5%:
6000 conversions to detect 5% uplift
1600 conversions to detect 10% uplift
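
A rough check of these figures, using the standard normal-approximation formula for comparing two proportions. This is a sketch that uses only the Python standard library rather than the calculators named on the previous slide. The 1% baseline conversion rate is a hypothetical value chosen for illustration, the uplift is taken as relative, and the counts are per arm. None of these assumptions is stated on the slide.

```python
from statistics import NormalDist


def visitors_per_arm(base_rate, uplift, alpha=0.05, power=0.80):
    """Approximate visitors needed in each arm of a two-sided test.

    base_rate: conversion rate of the control (assumed, not from the talk)
    uplift:    relative uplift to detect, e.g. 0.05 for 5%
    """
    p1 = base_rate
    p2 = base_rate * (1 + uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2


BASE_RATE = 0.01  # hypothetical baseline conversion rate

# Convert visitors to control-arm conversions so the numbers are
# comparable with the slide, which is expressed in conversions.
conversions_5 = visitors_per_arm(BASE_RATE, 0.05) * BASE_RATE
conversions_10 = visitors_per_arm(BASE_RATE, 0.10) * BASE_RATE
print(f"5% uplift:  about {conversions_5:.0f} conversions per arm")
print(f"10% uplift: about {conversions_10:.0f} conversions per arm")
```

Under these assumptions the results land close to the slide's 6000 and 1600. Halving the uplift to detect roughly quadruples the sample size.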

Multiple testing

Effect of multiple testing
If you run 20 tests at a significance level of 5%,
you can expect about 1 win just by chance,
even if none of the changes has a real effect.
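
The arithmetic behind this slide, as a small sketch. It assumes the 20 tests are independent and that none of them has a real effect. The function name is illustrative.

```python
def multiple_testing(n_tests, alpha):
    """Return (expected number of false wins, probability of at least one)."""
    expected_false_wins = n_tests * alpha
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    return expected_false_wins, p_at_least_one


expected, p_any = multiple_testing(n_tests=20, alpha=0.05)
print(f"expected false wins: {expected:.1f}")
print(f"chance of at least one false win: {p_any:.0%}")
```

The expected count is 1, and the chance of seeing at least one spurious win is about 64%.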

Giving targets for successful tests.

Stopping tests early

Stopping tests early
Simulations show that stopping an A/A test
as soon as you see a positive result will produce
a ‘successful’ test 41% of the time.
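
A simulation along these lines can be sketched as follows. This is not the simulation from the talk: the conversion rate, the number of looks, the batch size and the two-sided z-test are all hypothetical choices, so the false positive rate it prints will not match 41% exactly. It only illustrates that checking repeatedly and stopping at the first significant result inflates the error rate well beyond the nominal 5%.

```python
import math
import random

Z_CRIT = 1.96  # two-sided 5% significance threshold


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def simulate_aa(rng, rate=0.10, looks=20, batch=100):
    """One A/A test: both arms share the same true conversion rate.

    Returns (significant at any look, significant at the final look).
    The first value is what a stop-at-first-win policy would report.
    """
    conv_a = conv_b = 0
    any_look = False
    z = 0.0
    for look in range(1, looks + 1):
        conv_a += sum(rng.random() < rate for _ in range(batch))
        conv_b += sum(rng.random() < rate for _ in range(batch))
        z = z_stat(conv_a, conv_b, look * batch)
        if abs(z) > Z_CRIT:
            any_look = True
    return any_look, abs(z) > Z_CRIT


def false_positive_rates(n_sims=1000, seed=0):
    """Estimate false positive rates with and without early stopping."""
    rng = random.Random(seed)
    results = [simulate_aa(rng) for _ in range(n_sims)]
    peeking = sum(r[0] for r in results) / n_sims
    fixed = sum(r[1] for r in results) / n_sims
    return peeking, fixed


peeking_rate, fixed_rate = false_positive_rates()
print(f"stop at first significant look: {peeking_rate:.0%}")
print(f"single look at the planned end: {fixed_rate:.0%}")
```

More looks push the first figure higher, which is why the result depends on how often the test is checked.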

Stopping tests early
That works out to a precision of 20%
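
One way to arrive at a figure of about 20%, as a sketch. The slide does not state its inputs, so this assumes the 10% prevalence from the earlier ‘Out of 100 tests’ slides, the 41% false positive rate from the previous slide, and a hypothetical power of 90%.

```python
def precision_with_early_stopping(prevalence, power, false_positive_rate):
    """Fraction of wins that are real when early stopping inflates errors."""
    true_positives = prevalence * power
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# prevalence and false_positive_rate come from the slides;
# power=0.90 is an assumption made for this illustration.
early_stop_precision = precision_with_early_stopping(
    prevalence=0.10, power=0.90, false_positive_rate=0.41
)
print(f"precision: {early_stop_precision:.0%}")
```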

Negative uplift.
Stopping an A/B test early when the true effect is negative
still results in a win 9% of the time!

A True Story

Regression to the mean
Give 100 students a true/false test
They all answer randomly
Take only the top scoring 10% of the class
Test them again
What will the results be?
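
The thought experiment can be run directly. This is a sketch: the slide does not say how long the test is, so the 20 questions here are a hypothetical choice, as are the function name and the seed.

```python
import random


def retest_top_scorers(n_students=100, n_questions=20, top_fraction=0.10, seed=0):
    """Students guess at random; the top scorers are then tested again.

    Returns (mean first score of the top group, mean retest score of that group).
    """
    rng = random.Random(seed)

    def random_score():
        # Each true/false answer is a coin flip.
        return sum(rng.random() < 0.5 for _ in range(n_questions))

    first = [random_score() for _ in range(n_students)]
    n_top = round(n_students * top_fraction)
    top = sorted(range(n_students), key=lambda i: first[i], reverse=True)[:n_top]

    top_first_mean = sum(first[i] for i in top) / n_top
    retest_mean = sum(random_score() for _ in top) / n_top
    return top_first_mean, retest_mean


first_mean, second_mean = retest_top_scorers()
print(f"top 10% on the first test: {first_mean:.1f} / 20")
print(f"same students on the retest: {second_mean:.1f} / 20")
```

The selected students scored well only by luck, so their retest average falls back towards the chance level of 10 out of 20. A winning A/B test picked out of many behaves the same way when it is measured again.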

Estimates of uplift are generally
wrong.

What you need to do to get it right
● Do a power calculation first to estimate
sample size
● Use a valid hypothesis - don’t use a
scattergun approach
● Do not stop the test early
● Perform a second ‘validation’ test

My details
martingoodson@gmail.com
@martingoodson
http://goo.gl/jvhwmB
Download my whitepaper on A/B testing here

Skimlinks After Party!
Levante Bar
5 minutes away
Come hungry!
Invites + Map at the booth
http://skimlinks.com/jobs
