If you’re a marketer, it’s very likely that you’ve run an A/B test. It’s also likely that you’ve never calculated the sample size for your tests, and instead, you run tests until they reach statistical significance. If this is the case, your strategy is statistically flawed. Honoring sample size requirements means marketers wait longer for test results, but choosing to ignore them produces false positives and leads to bad decisions.
This deck was created for an email audience, but there are valuable lessons here for anyone who runs A/B tests.
15. Assuming you check results after every impression and
stop once you reach significance…
your false positive rate climbs to 26.1%.
So you just went from 95% confidence to 74%.
This is a worst-case scenario. BUT, some test platforms do this automatically!
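The claim above is easy to check with a simulation: run many A/A tests (both arms identical, so any "winner" is a false positive), peek at a two-proportion z-test at regular intervals, and stop the first time it looks significant. The parameters below (500 trials, 10% conversion rate, peeking every 50 impressions) are illustrative choices, not figures from the deck.

```python
import random

def peeking_false_positive_rate(n_trials=500, n_per_arm=2000,
                                p=0.10, check_every=50,
                                z_crit=1.96, seed=7):
    """Fraction of A/A tests declared significant when we peek repeatedly.
    Since both arms convert at the same rate p, every 'win' is noise."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_trials):
        a = b = 0
        for n in range(1, n_per_arm + 1):
            a += rng.random() < p   # conversions in arm A
            b += rng.random() < p   # conversions in arm B
            if n % check_every == 0:
                pooled = (a + b) / (2 * n)
                se = (2 * pooled * (1 - pooled) / n) ** 0.5
                if se > 0 and abs(a - b) / n / se > z_crit:
                    false_positives += 1   # stopped early on noise
                    break
    return false_positives / n_trials

rate = peeking_false_positive_rate()
print(f"false positive rate with peeking: {rate:.1%}")  # far above the nominal 5%
```

Setting `check_every=n_per_arm` (one look at the end, i.e. the sample-size discipline this deck advocates) brings the rate back down to roughly the nominal 5%.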
20. Agenda
1. How we put this into practice on a website test
2. How we applied these learnings to email testing:
• Open rates
• Click to Open Rates
• Conversion Rates
21. A/B Testing on your website
Here’s your new test process:
1. Determine your baseline conversion rate (or click rate, or
download rate, etc.)
2. Decide how long you are willing to wait for a result. Convert your
unique traffic metric to a sample size.
3. Adjust the MDE (Minimum Detectable Effect) until your sample size is
just under the target you determined in #2 above.
4. Re-adjust the MDE until you’re comfortable with the trade-off between
effect size and wait time.
5. Start the test, and don’t stop until you hit the sample size.
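Steps 3–4 lean on a sample size calculator. A standard two-proportion formula can stand in for one; this sketch assumes a two-sided test at 95% significance and 80% power, so its numbers will differ from any calculator that uses other settings (which is likely why it doesn't reproduce the deck's figures exactly).

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variation(baseline, mde_relative, alpha=0.05, power=0.8):
    """Approximate subjects needed per variation to detect a relative
    lift of `mde_relative` over `baseline` (two-sided z-test)."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # significance threshold
    z_beta = NormalDist().inv_cdf(power)            # power threshold
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
                 z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Lower MDE -> larger required sample; this is the trade-off steps 3-4 tune.
print(sample_size_per_variation(0.03, 0.20))  # ~14,000 per variation
print(sample_size_per_variation(0.03, 0.30))  # noticeably smaller
```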
23. Case Study: Item Urgency
TEST (VERSION A):
INVENTORY NOTIFICATION
CONTROL (VERSION B):
NO INVENTORY NOTIFICATION
24. STEP 1 – We determined our baseline conversion rate
25. STEP 2 – Calculate Target Sample Size
We initially decided we wanted a result in 2 weeks.
So we took the last 2 weeks of unique product page views:
26. STEP 2 – Calculate Target Sample Size
We then divided that number by two (since we’ll have two test
segments)
Divided by two again to account for desktop traffic only
Then multiplied by 5% (since the message only displays on 5% of
product pages)
Sample Size -> 12,351
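The funnel arithmetic above fits in a small helper. The two-week pageview total is not stated in the deck; the 988,080 used below is simply what the stated 12,351 implies when you work the divisions backwards, so treat it as a reconstruction.

```python
def target_sample_size(two_week_unique_views,
                       variants=2,          # split between test and control
                       desktop_share=0.5,   # keep desktop traffic only
                       display_rate=0.05):  # message shows on 5% of pages
    """Walk the deck's funnel: split traffic across variants, keep
    desktop only, then keep the share of pages showing the message."""
    per_variant = two_week_unique_views / variants
    desktop_only = per_variant * desktop_share
    return int(desktop_only * display_rate)

# 988,080 is inferred from the deck's 12,351 target, not stated in it.
print(target_sample_size(988_080))  # -> 12351
```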
27. This gave us a 30% MDE (conversion lift), which is unrealistic.
42. After learning about Sample Size,
we reconsidered our email testing strategy
• Open Rate (Subject line testing)
• Click-to-Open (CTO) Rate
• Conversion Rate
43. OPEN RATE
We used sample size to gut check the
size of our subject line test segments
44. OPEN RATE
Remember: the sample size calculator takes your
baseline conversion rate and your sample size,
and returns the MDE.
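That inversion can be sketched by binary-searching the lift until the required sample size matches what you have. This assumes the standard two-sided two-proportion formula at 80% power; a calculator with different settings (one-sided tests, different power) will give different MDEs, which may explain gaps versus the deck's figures.

```python
from math import sqrt
from statistics import NormalDist

def required_n(p1, rel_lift, alpha=0.05, power=0.8):
    """Per-variation sample size to detect a relative lift (two-sided)."""
    p2 = p1 * (1 + rel_lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar)) +
           z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return num / (p2 - p1) ** 2

def mde_for(p1, n, alpha=0.05, power=0.8):
    """Smallest relative lift detectable with n subjects per variation."""
    lo, hi = 1e-4, min(1.0, (1 - p1) / p1)  # keep the lifted rate <= 100%
    for _ in range(80):
        mid = (lo + hi) / 2
        if required_n(p1, mid, alpha, power) > n:
            lo = mid   # lift too small to detect with only n subjects
        else:
            hi = mid
    return hi

# 17% baseline open rate, 10,000 recipients per subject line:
print(f"MDE ~ {mde_for(0.17, 10_000):.1%}")
```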
48. OPEN RATE
We always test 4 different subject lines.
We had been sending each subject line to 10,000
customers.
So, sample size ~ 10,000
49. OPEN RATE
Plugging these numbers in, we could only detect an open-rate lift of 13% or higher
50. OPEN RATE
A 13% lift on a 17% open rate means a winner would need to hit 19.2%.
We rarely see subject lines perform that well.
We needed a lower MDE to make sure we could detect
more winners…
51. OPEN RATE
We ended up doubling each subject line segment to
20,000 recipients (80,000 total), giving us an MDE ~ 9.2%
66. Conversion Rate
To get meaningful results for conversion rate,
consider repeating the same email test across multiple
sends, pooling the results until you reach the
necessary sample size.
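A sketch of that "run it many times" plan, using hypothetical figures (the 40,000 requirement and 10,000 recipients per send below are made up for illustration): plan the number of identical sends, and pool conversions across them before testing.

```python
from math import ceil

def sends_needed(required_per_variant, recipients_per_variant_per_send):
    """Identical sends needed before the pooled per-variant sample
    reaches the required size."""
    return ceil(required_per_variant / recipients_per_variant_per_send)

def pooled_rate(send_results):
    """Pool (conversions, recipients) pairs from repeated sends into one
    cumulative conversion rate; test only once the pooled n is enough."""
    conversions = sum(c for c, _ in send_results)
    recipients = sum(r for _, r in send_results)
    return conversions / recipients

print(sends_needed(40_000, 10_000))                # -> 4 sends
print(pooled_rate([(120, 10_000), (95, 10_000)]))  # cumulative rate so far
```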
67. Takeaways
This is the MDE curve again. Remember what it looks like.
The longer you run a test, the lower the MDE will be.
The more traffic volume you have, the faster the MDE drops
68. Takeaways
For Web Testing
• If you stop your A/B tests once you reach statistical significance, you are increasing your chances of finding
false positives
• Calculating sample size will give you a clear stop date and an MDE
• MDE and sample size are inversely related – The lower the MDE, the larger the sample size
• Most likely, your A/B tests need to run much longer than you realize
For Email Testing
• Use sample size to determine the size of your subject line test segments
• Your CTO tests are probably reaching the necessary sample size
• Your Conversion tests are probably not hitting sample size
69. Sources
Kyle Rush – Mozcon 2014 Presentation
https://seomoz.box.com/shared/static/2fw6yevkkmmdumz431j4.pdf
Evan Miller – How Not to Run an A/B Test
http://www.evanmiller.org/how-not-to-run-an-ab-test.html
70. Zack Notes
Digital Marketing
Manager
zack@uncommongoods.com
@zacknotes
slideshare.net/zacknotes1/presentations
74. What do you do if a test reaches sample size
and your lift < MDE?
75. You can either extend the test and accept a
lower MDE, or move on.
Editor's Notes
I have a thought experiment for you…
All of the scenarios have the same end result, except for scenario #3
Note the baseline, minimum detectable effect and sample size
Baseline = preexisting conversion rate, click rate, open rate, etc.
Minimum Detectable Effect or MDE = the minimum lift you will be able to detect once you’ve reached the sample size.
So here’s the gist of this presentation: suppose you run a test on a page with a baseline conversion rate of 3%, and you run it until your test segment reaches 10,316 impressions. If your observed conversion lift is below 20%, you can’t declare a winner, even if you’ve reached statistical significance. You either need to keep your test running or move on.
This is your step by step guide to sample size testing. I’m going to go over it very briefly. This slide is more of a resource for you to come back to when you set up your test.
We’re currently running this test on our product pages
This is a message that fires when fewer than 5 items remain in inventory. You can see the test on top and the control below.
We enter 1.16% as the baseline conversion rate, then turn the MDE up until our sample size is just under 12,351
Nobody wants to wait 17 weeks for a test result, but if you make the call too early, you could be shooting yourself in the foot: acting on a false positive and deploying a new design that is actually making your site worse
BACK TO ITEM URGENCY
This is a chart of conversion rate.
Test in Blue. Control in Red
Note how the MDE contains the lift with an upper bound
Also note how lift is approaching MDE towards the end
Red line is significance
Green line is the 95% mark
Note that the significance crossed 95% in the beginning and then came back down, and it’s now rising above 95% again.
17% after a week but 7% after 2 hours when we make the call
Here are a few examples of CTO tests we ran; at the end, in the appendix, I’ve included all the data for you to look at afterwards
I went back through all of our tests. In the 3 years since we’ve started A/B testing in emails, the only time we’ve hit sample size for conversion rate is when we’ve tested putting prices in an email (vs. leaving them out).
Unfortunately, this lift in conversion was countered by an equal drop in CTO
The lift from the vast majority of your tests will never reach MDE. Be more comfortable reporting “no statistical difference”