This document discusses the role of a data analyst within an A/B-testing culture. It outlines that the analyst's tasks involve: 1) ensuring high-quality data is collected, 2) prioritizing the tests with the highest potential for effectiveness, and 3) performing business-case calculations to assess financial impact. The analyst is responsible for determining which experiments to run based on statistical power, prioritizing experiments that have a realistic chance of detecting an effect, and calculating the return on investment of successful experiments to optimize company growth within budget constraints.
Slide 11 · TON@ONLINEDIALOGUE.COM

"Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day…"
Jeff Bezos, CEO, Amazon
Slide 14
What should be done with the A/B-test program?
A. Increase budgets
   • More A/B-tests (quantity)
B. Increase knowledge
   • Better A/B-tests (quality)
C. Decrease budgets
   • Fewer A/B-tests (quantity)
Slide 15
This should always be the answer:
A. Increase budgets
   • More A/B-tests (quantity)
But in reality it's different...
✓ You can calculate the answer
✓ You have a big influence on the outcome
Slide 20
Make sure your testing solution has all users:
   Users on template:        42,186  (100%)
   Users in the tool:        37,652   (89%)
   Users with code executed: 34,312   (81%)
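The drop-off above can be monitored with a quick coverage check; the counts are the ones from the slide, the dictionary layout is just illustrative.

```python
# Coverage check for experiment tracking (counts from the slide).
counts = {
    "Users on template": 42_186,
    "Users in the tool": 37_652,
    "Users with code executed": 34_312,
}

baseline = counts["Users on template"]
for stage, n in counts.items():
    # Each stage as a share of all users who saw the template.
    print(f"{stage}: {n} ({n / baseline:.0%})")
```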
Slide 26
Be able to create behavioral segments.
Typical e-commerce flow example:
✓ All users on your website with enough time to take action
✓ All users on your website with at least some interaction
✓ All users on your website with heavy interaction
✓ All users on your website with clear intent to buy
✓ All users on your website that are willing to buy
✓ All users on your website that succeed in buying
✓ All users on your website that return with intent to buy more
(Funnel + average time)
Slide 37
Power

                        Measured: do not reject H0     Measured: reject H0
Reality: H0 is true     Correct decision ✓             Type I: False Positive (α)
Reality: H0 is false    Type II: False Negative (β)    Correct decision ✓
Slide 38
Power

                                      Measured: NOT better            Measured: better
Reality: new version is NOT better    Correct decision ✓              Type I: False Positive (α)
Reality: new version is better        Type II: False Negative (β)     Correct decision ✓
Slide 39
Power & significance rule of thumb

Power
When you start, try to test on pages with high power (>80%) → otherwise you won't detect effects when there is an effect to be detected (false negatives).

Significance
When you start, try to test against a high enough significance level (90%) → otherwise you'll declare winners when in reality there is no effect (false positives).
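A rough sketch of how such a power number can be computed, using a normal approximation for a two-proportion test; the baseline rate, lift and traffic figures below are illustrative assumptions, not numbers from the slides.

```python
from math import sqrt
from statistics import NormalDist

def ab_test_power(p_base, lift_rel, n_per_arm, alpha=0.10):
    """Approximate power of a two-sided two-proportion z-test.

    p_base: baseline conversion rate; lift_rel: relative lift you want
    to detect; n_per_arm: visitors per variant; alpha = 0.10 matches
    the 90% significance rule of thumb on the slide.
    """
    p_new = p_base * (1 + lift_rel)
    # Standard error of the difference in conversion rates.
    se = sqrt(p_base * (1 - p_base) / n_per_arm
              + p_new * (1 - p_new) / n_per_arm)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability the z-statistic clears the critical value under H1.
    return 1 - NormalDist().cdf(z_crit - abs(p_new - p_base) / se)

# 3% baseline, 10% relative lift, 20,000 visitors per arm: power is
# only ~53%, well under the >80% rule of thumb.
print(round(ab_test_power(0.03, 0.10, 20_000), 2))
```

More traffic per arm pushes the power up, which is why the next slides walk through estimating weekly visitors per page type.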
Slide 49
Test Power Determination
DETERMINE UNIQUE WEEKLY VISITORS PER PAGE TYPE
→ Look up the number of weekly visitors with this behavior (select multiple weeks and divide by the number of weeks to account for fluctuation)
Slide 52
Test Power Determination
DETERMINE UNIQUE VISITORS WITH A CONVERSION PER PAGE TYPE
→ Look up the number of weekly visitors with a conversion (select multiple weeks and divide by the number of weeks to account for fluctuation)
→ Make sure you don't have sampled data; otherwise select a shorter period
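The multi-week averaging step could look like this; the weekly figures are made-up placeholders, not analytics data from the talk.

```python
# Hypothetical weekly analytics pulls (assumed figures).
weekly_visitors = [41_800, 42_900, 43_100, 40_950]
weekly_converters = [1_250, 1_310, 1_290, 1_220]

# Divide by the number of weeks to smooth out weekly fluctuation.
avg_visitors = sum(weekly_visitors) / len(weekly_visitors)
avg_converters = sum(weekly_converters) / len(weekly_converters)
conversion_rate = avg_converters / avg_visitors

print(avg_visitors, avg_converters, round(conversion_rate, 4))
```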
Slide 67
What does your calculation look like?
If significant result:

Extra new customers per week
x
52 weeks effective
x
Average lifetime value
Slide 68
What does your calculation look like?
If significant result:

Extra transactions per week
x
26 weeks effective
x
Average order value
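Both slides describe the same multiplication; a minimal sketch with assumed inputs (the customer counts and values below are not from the slides):

```python
def projected_value(extra_per_week, weeks_effective, value_per_unit):
    """Projected revenue of a significant winner over its effective period."""
    return extra_per_week * weeks_effective * value_per_unit

# New-customer variant: extra customers/week x 52 weeks x lifetime value.
print(projected_value(20, 52, 350))
# Transaction variant: extra transactions/week x 26 weeks x order value.
print(projected_value(120, 26, 85))
```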
Slide 69
So this experiment will bring us:
€232,840
(revenue in 6 months after implementation)
➢ And then just add up all the winners from the past year?
➢ Which makes €5,273,132 for the whole program?
➢ And divide that by the yearly costs of €623,400?
➢ So your ROI is €8.46 revenue per €1 invested?
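The naive ROI arithmetic above, using the slide's own numbers:

```python
# Naive program ROI from the slide's figures.
total_winner_revenue = 5_273_132   # sum of all measured winners (EUR)
yearly_costs = 623_400             # yearly testing-program costs (EUR)

roi = total_winner_revenue / yearly_costs  # revenue per euro invested
print(round(roi, 2))
```

The following slides explain why this overstates the true return: measured wins are exaggerated (Type-M error) and some are false positives.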
Slide 72
So that one experiment will bring us:
€232,840 x (100% - Type-M error %)?

(Yes, if it indeed is a true positive)

€232,840 x (100% - 12%) = €204,899
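The Type-M discount as code, with the 12% exaggeration figure from the slide:

```python
# Discount a measured winner for effect exaggeration (Type-M error).
measured_revenue = 232_840   # projected revenue of the winner (EUR)
type_m_error = 0.12          # expected exaggeration of the measured effect

adjusted = measured_revenue * (1 - type_m_error)
print(round(adjusted))
```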
Slide 74
How NOT to shorten the length of your A/B-test
https://www.einarsen.no/is-your-ab-testing-effort-just-chasing-statistical-ghosts/
Slide 78
How to shorten the length of your A/B-test
https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d
"CUPED tries to remove variance in a metric that can be accounted for by pre-experiment information"
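A minimal CUPED sketch under the definition quoted above: estimate how much of the in-experiment metric is explained by a pre-experiment covariate and subtract that part. The toy data and the helper name are illustrative, not Booking.com's implementation.

```python
from statistics import mean, variance

def cuped_adjust(y, x):
    """y: in-experiment metric per user; x: pre-experiment covariate."""
    x_bar, y_bar = mean(x), mean(y)
    # theta = cov(x, y) / var(x): the OLS slope of y on x.
    cov_xy = sum((xi - x_bar) * (yi - y_bar)
                 for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    # Subtracting theta * (x - x_bar) keeps the mean, shrinks the variance.
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]

# Toy data: pre-period spend is strongly predictive of in-experiment spend.
x = [10, 20, 30, 40, 50]
y = [12, 24, 29, 41, 52]
y_adj = cuped_adjust(y, x)
print(mean(y_adj), variance(y_adj) < variance(y))
```

Lower variance means smaller standard errors, so the same effect reaches significance with fewer visitors, which is how CUPED shortens tests.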
Slide 79
You could even find more wins
https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d
Slide 83
Should I stop the experiment?
✓ Is something broken? → YES
✓ Is there an SRM (Sample Ratio Mismatch) error? → YES
✓ Are we losing too much money? → YES
(and perhaps when there is a low chance of becoming significant, if you can start the next experiment now)
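One common way to detect an SRM is a z-test on the split between the two arms; this sketch is a generic implementation, not a method prescribed by the slides, and the visitor counts are made up.

```python
from math import sqrt
from statistics import NormalDist

def srm_check(n_a, n_b, expected_ratio=0.5, alpha=0.001):
    """Flag a Sample Ratio Mismatch between two arms of an A/B test.

    Two-sided z-test on the binomial split; alpha is kept very strict
    (0.001) because an SRM means the data cannot be trusted at all.
    """
    n = n_a + n_b
    se = sqrt(n * expected_ratio * (1 - expected_ratio))
    z = (n_a - n * expected_ratio) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha  # True -> stop the experiment

print(srm_check(50_300, 49_700))  # small imbalance, plausible by chance
print(srm_check(52_000, 48_000))  # 52/48 split on 100k users: suspicious
```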
Slide 84
Back to the calculation:
€232,840 x (100% - Type-M error %)?

(Yes, if it indeed is a true positive)

€232,840 x (100% - 12%) = €204,899
Slide 86
What is your False Discovery Rate?
Significance border: 90%
100 experiments
20 significant outcomes

50%* (at a 90% border, 10% of 100 experiments = 10 expected false positives out of 20 significant outcomes; it's actually a little lower, this is the poor man's calculation)
(With every real win, the number of experiments without wins becomes lower, which leads to fewer false positives.)
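The poor man's calculation as code, using the slide's numbers:

```python
# "Poor man's" FDR: pretend every experiment could be a null, so at a
# 90% significance border 10% of all experiments come out as false wins.
experiments = 100
significant = 20
alpha = 0.10  # 100% - 90% significance border

expected_false_positives = experiments * alpha
fdr = expected_false_positives / significant
print(round(fdr, 2))
```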
Slide 87
So not really 50%

FDR* = (Measured Wins - ((Measured Wins - ((100% - Confidence Level) x Experiments)) / Confidence Level)) / Measured Wins

= (20 - ((20 - ((100% - 90%) x 100)) / 90%)) / 20

= 44%* (only if your power on all experiments was 100%)

(Your power will be lower, which means you had more real wins that were not measured (false negatives). This leads to fewer experiments without an effect, so the number of false positives will be even lower.)
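The refined formula as code, with the same numbers as the slide:

```python
# Refined FDR: correct the naive count of no-effect experiments for the
# wins that are real before applying the significance border.
experiments = 100
measured_wins = 20
confidence = 0.90

false_positives = (measured_wins
                   - ((measured_wins - (1 - confidence) * experiments)
                      / confidence))
fdr = false_positives / measured_wins
print(round(fdr, 2))
```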
Slide 88
Rule of thumb: once you have 10 winners or more, you can calculate your True Discovery Rate:

TDR = (Power x (Win% + Significance - 1)) / (Win% x (Power + Significance - 1))

= (80% x (20% + 90% - 1)) / (20% x (80% + 90% - 1)) = 0.08 / 0.14

= 57.14%
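The rule-of-thumb True Discovery Rate as code, with the slide's example values:

```python
# True Discovery Rate rule of thumb from the slide.
power = 0.80         # average power across experiments
win_rate = 0.20      # share of experiments declared winners
significance = 0.90  # significance border used

tdr = (power * (win_rate + significance - 1)
       / (win_rate * (power + significance - 1)))
print(round(tdr, 4))
```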
Slide 91
So all your experiments will bring you:

Sum of (every winner x (100% - Type-M error % per winner))
x
True Discovery Rate
x
Implementation % (within x months…)

(assuming every new win is tested on the new default, where all earlier wins are implemented)
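Putting the pieces together; the first winner and its 12% Type-M error and the 57.14% True Discovery Rate come from the slides, while the other winners and the implementation rate are illustrative assumptions:

```python
# Program-level value estimate combining the corrections above.
winners = [232_840, 118_500, 64_200]   # projected revenue per winner (EUR)
type_m = [0.12, 0.20, 0.15]            # Type-M exaggeration per winner
true_discovery_rate = 0.5714           # from the rule-of-thumb calculation
implementation_rate = 0.80             # share of winners actually shipped

adjusted_sum = sum(w * (1 - m) for w, m in zip(winners, type_m))
program_value = adjusted_sum * true_discovery_rate * implementation_rate
print(round(program_value))
```

This corrected figure, rather than the naive sum of winners, is what should be compared against the program's yearly costs.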
Slide 94
Are you above or below your ROI limit?
1. Above: increase budgets
2. Below: increase knowledge
3. Still below: decrease budgets
Slide 95
Are you above or below your ROI limit?
① Above: increase budgets
   • More A/B-tests (quantity)
   • Lower win %, more winners
② Below: increase knowledge
   • Better A/B-tests (quality)
   • Higher win %, more winners
③ Still below: decrease budgets
   • Fewer A/B-tests (quantity)
   • Higher win %, fewer winners
Slide 96
You can help get to this answer:
A. Increase budgets
   • More A/B-tests (quantity)
✓ You can calculate the answer
✓ You have a big influence on the outcome