This document discusses the role of a data analyst within an A/B-testing culture. It outlines that the analyst's tasks involve: 1) ensuring high-quality data is collected, 2) prioritizing the tests with the highest potential for effectiveness, and 3) performing business-case calculations to assess financial impact. The analyst is responsible for determining which experiments to run based on statistical power, prioritizing experiments that have a realistic chance of detecting an effect, and calculating the return on investment of successful experiments to optimize company growth within budget constraints.
Slide 11 · TON@ONLINEDIALOGUE.COM

"Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day…"
Jeff Bezos, CEO, Amazon
Slide 14
What should be done with the A/B-test program?
A. Increase budgets
   • More A/B-tests (quantity)
B. Increase knowledge
   • Better A/B-tests (quality)
C. Decrease budgets
   • Fewer A/B-tests (quantity)
Slide 15
This should always be the answer:
A. Increase budgets
   • More A/B-tests (quantity)
But in reality it's different...
✓ You can calculate the answer
✓ You have a big influence on the outcome
Slide 20
Make sure your testing solution has all users:
   Users on template:        42,186  (100%)
   Users in the tool:        37,652   (89%)
   Users with code executed: 34,312   (81%)
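The drop-off above can be monitored with a quick coverage check; the counts are the ones from the slide, the dictionary layout is just illustrative.

```python
# Coverage check for experiment tracking (counts from the slide).
counts = {
    "Users on template": 42_186,
    "Users in the tool": 37_652,
    "Users with code executed": 34_312,
}

baseline = counts["Users on template"]
for stage, n in counts.items():
    # Each stage as a share of all users who saw the template.
    print(f"{stage}: {n} ({n / baseline:.0%})")
```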
Slide 26
Be able to create behavioral segments.
Typical e-commerce flow example:
✓ All users on your website with enough time to take action
✓ All users on your website with at least some interaction
✓ All users on your website with heavy interaction
✓ All users on your website with clear intent to buy
✓ All users on your website that are willing to buy
✓ All users on your website that succeed in buying
✓ All users on your website that return with intent to buy more
(Funnel + average time)
Slide 37
Power

                        Measured: do not reject H0     Measured: reject H0
Reality: H0 is true     Correct decision ✓             Type I: False Positive (α)
Reality: H0 is false    Type II: False Negative (β)    Correct decision ✓
Slide 38
Power

                                      Measured: NOT better            Measured: better
Reality: new version is NOT better    Correct decision ✓              Type I: False Positive (α)
Reality: new version is better        Type II: False Negative (β)     Correct decision ✓
Slide 39
Power & significance rule of thumb

Power
When you start, try to test on pages with high power (>80%) → otherwise you won't detect effects when there is an effect to be detected (false negatives).

Significance
When you start, try to test against a high enough significance level (90%) → otherwise you'll declare winners when in reality there is no effect (false positives).
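A rough sketch of how such a power number can be computed, using a normal approximation for a two-proportion test; the baseline rate, lift and traffic figures below are illustrative assumptions, not numbers from the slides.

```python
from math import sqrt
from statistics import NormalDist

def ab_test_power(p_base, lift_rel, n_per_arm, alpha=0.10):
    """Approximate power of a two-sided two-proportion z-test.

    p_base: baseline conversion rate; lift_rel: relative lift you want
    to detect; n_per_arm: visitors per variant; alpha = 0.10 matches
    the 90% significance rule of thumb on the slide.
    """
    p_new = p_base * (1 + lift_rel)
    # Standard error of the difference in conversion rates.
    se = sqrt(p_base * (1 - p_base) / n_per_arm
              + p_new * (1 - p_new) / n_per_arm)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability the z-statistic clears the critical value under H1.
    return 1 - NormalDist().cdf(z_crit - abs(p_new - p_base) / se)

# 3% baseline, 10% relative lift, 20,000 visitors per arm: power is
# only ~53%, well under the >80% rule of thumb.
print(round(ab_test_power(0.03, 0.10, 20_000), 2))
```

More traffic per arm pushes the power up, which is why the next slides walk through estimating weekly visitors per page type.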
Slide 49
Test Power Determination
DETERMINE UNIQUE WEEKLY VISITORS PER PAGE TYPE
→ Look up the number of weekly visitors with this behavior (select multiple weeks and divide by the number of weeks to account for fluctuation)
Slide 52
Test Power Determination
DETERMINE UNIQUE VISITORS WITH A CONVERSION PER PAGE TYPE
→ Look up the number of weekly visitors with a conversion (select multiple weeks and divide by the number of weeks to account for fluctuation)
→ Make sure you don't have sampled data; otherwise select a shorter period
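The multi-week averaging step could look like this; the weekly figures are made-up placeholders, not analytics data from the talk.

```python
# Hypothetical weekly analytics pulls (assumed figures).
weekly_visitors = [41_800, 42_900, 43_100, 40_950]
weekly_converters = [1_250, 1_310, 1_290, 1_220]

# Divide by the number of weeks to smooth out weekly fluctuation.
avg_visitors = sum(weekly_visitors) / len(weekly_visitors)
avg_converters = sum(weekly_converters) / len(weekly_converters)
conversion_rate = avg_converters / avg_visitors

print(avg_visitors, avg_converters, round(conversion_rate, 4))
```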
Slide 67
What does your calculation look like?
If significant result:

Extra new customers per week
x
52 weeks effective
x
Average lifetime value
Slide 68
What does your calculation look like?
If significant result:

Extra transactions per week
x
26 weeks effective
x
Average order value
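Both slides describe the same multiplication; a minimal sketch with assumed inputs (the customer counts and values below are not from the slides):

```python
def projected_value(extra_per_week, weeks_effective, value_per_unit):
    """Projected revenue of a significant winner over its effective period."""
    return extra_per_week * weeks_effective * value_per_unit

# New-customer variant: extra customers/week x 52 weeks x lifetime value.
print(projected_value(20, 52, 350))
# Transaction variant: extra transactions/week x 26 weeks x order value.
print(projected_value(120, 26, 85))
```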
Slide 69
So this experiment will bring us:
€232,840
(revenue in 6 months after implementation)
➢ And then just add up all the winners from the past year?
➢ Which makes €5,273,132 for the whole program?
➢ And divide that by the yearly costs of €623,400?
➢ So your ROI is €8.46 revenue per €1 invested?
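The naive ROI arithmetic above, using the slide's own numbers:

```python
# Naive program ROI from the slide's figures.
total_winner_revenue = 5_273_132   # sum of all measured winners (EUR)
yearly_costs = 623_400             # yearly testing-program costs (EUR)

roi = total_winner_revenue / yearly_costs  # revenue per euro invested
print(round(roi, 2))
```

The following slides explain why this overstates the true return: measured wins are exaggerated (Type-M error) and some are false positives.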
Slide 72
So that one experiment will bring us:
€232,840 x (100% - Type-M error %)?

(Yes, if it indeed is a true positive)

€232,840 x (100% - 12%) = €204,899
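The Type-M discount as code, with the 12% exaggeration figure from the slide:

```python
# Discount a measured winner for effect exaggeration (Type-M error).
measured_revenue = 232_840   # projected revenue of the winner (EUR)
type_m_error = 0.12          # expected exaggeration of the measured effect

adjusted = measured_revenue * (1 - type_m_error)
print(round(adjusted))
```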
Slide 74
How NOT to shorten the length of your A/B-test
https://www.einarsen.no/is-your-ab-testing-effort-just-chasing-statistical-ghosts/
Slide 78
How to shorten the length of your A/B-test
https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d
"CUPED tries to remove variance in a metric that can be accounted for by pre-experiment information"
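A minimal CUPED sketch under the definition quoted above: estimate how much of the in-experiment metric is explained by a pre-experiment covariate and subtract that part. The toy data and the helper name are illustrative, not Booking.com's implementation.

```python
from statistics import mean, variance

def cuped_adjust(y, x):
    """y: in-experiment metric per user; x: pre-experiment covariate."""
    x_bar, y_bar = mean(x), mean(y)
    # theta = cov(x, y) / var(x): the OLS slope of y on x.
    cov_xy = sum((xi - x_bar) * (yi - y_bar)
                 for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    # Subtracting theta * (x - x_bar) keeps the mean, shrinks the variance.
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]

# Toy data: pre-period spend is strongly predictive of in-experiment spend.
x = [10, 20, 30, 40, 50]
y = [12, 24, 29, 41, 52]
y_adj = cuped_adjust(y, x)
print(mean(y_adj), variance(y_adj) < variance(y))
```

Lower variance means smaller standard errors, so the same effect reaches significance with fewer visitors, which is how CUPED shortens tests.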
Slide 79
You could even find more wins
https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d
Slide 83
Should I stop the experiment?
✓ Is something broken? → YES
✓ Is there an SRM (Sample Ratio Mismatch) error? → YES
✓ Are we losing too much money? → YES
(and perhaps when there is a low chance of becoming significant, if you can start the next experiment now)
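One common way to detect an SRM is a z-test on the split between the two arms; this sketch is a generic implementation, not a method prescribed by the slides, and the visitor counts are made up.

```python
from math import sqrt
from statistics import NormalDist

def srm_check(n_a, n_b, expected_ratio=0.5, alpha=0.001):
    """Flag a Sample Ratio Mismatch between two arms of an A/B test.

    Two-sided z-test on the binomial split; alpha is kept very strict
    (0.001) because an SRM means the data cannot be trusted at all.
    """
    n = n_a + n_b
    se = sqrt(n * expected_ratio * (1 - expected_ratio))
    z = (n_a - n * expected_ratio) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha  # True -> stop the experiment

print(srm_check(50_300, 49_700))  # small imbalance, plausible by chance
print(srm_check(52_000, 48_000))  # 52/48 split on 100k users: suspicious
```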
Slide 84
Back to the calculation:
€232,840 x (100% - Type-M error %)?

(Yes, if it indeed is a true positive)

€232,840 x (100% - 12%) = €204,899
Slide 86
What is your False Discovery Rate?
Significance border: 90%
100 experiments
20 significant outcomes

50%* (at a 90% border, 10% of 100 experiments = 10 expected false positives out of 20 significant outcomes; it's actually a little lower, this is the poor man's calculation)
(With every real win, the number of experiments without wins becomes lower, which leads to fewer false positives.)
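The poor man's calculation as code, using the slide's numbers:

```python
# "Poor man's" FDR: pretend every experiment could be a null, so at a
# 90% significance border 10% of all experiments come out as false wins.
experiments = 100
significant = 20
alpha = 0.10  # 100% - 90% significance border

expected_false_positives = experiments * alpha
fdr = expected_false_positives / significant
print(round(fdr, 2))
```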
Slide 87
So not really 50%

FDR* = (Measured Wins - ((Measured Wins - ((100% - Confidence Level) x Experiments)) / Confidence Level)) / Measured Wins

= (20 - ((20 - ((100% - 90%) x 100)) / 90%)) / 20

= 44%* (only if your power on all experiments was 100%)

(Your power will be lower, which means you had more real wins that were not measured (false negatives). This leads to fewer experiments without an effect, so the number of false positives will be even lower.)
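The refined formula as code, with the same numbers as the slide:

```python
# Refined FDR: correct the naive count of no-effect experiments for the
# wins that are real before applying the significance border.
experiments = 100
measured_wins = 20
confidence = 0.90

false_positives = (measured_wins
                   - ((measured_wins - (1 - confidence) * experiments)
                      / confidence))
fdr = false_positives / measured_wins
print(round(fdr, 2))
```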
Slide 88
Rule of thumb: once you have 10 winners or more, you can calculate your True Discovery Rate:

TDR = (Power x (Win% + Significance - 1)) / (Win% x (Power + Significance - 1))

= (80% x (20% + 90% - 1)) / (20% x (80% + 90% - 1)) = 0.08 / 0.14

= 57.14%
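The rule-of-thumb True Discovery Rate as code, with the slide's example values:

```python
# True Discovery Rate rule of thumb from the slide.
power = 0.80         # average power across experiments
win_rate = 0.20      # share of experiments declared winners
significance = 0.90  # significance border used

tdr = (power * (win_rate + significance - 1)
       / (win_rate * (power + significance - 1)))
print(round(tdr, 4))
```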
Slide 91
So all your experiments will bring you:

Sum of (every winner x (100% - Type-M error % per winner))
x
True Discovery Rate
x
Implementation % (within x months…)

(assuming every new win is tested on the new default, where all earlier wins are implemented)
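Putting the pieces together; the first winner and its 12% Type-M error and the 57.14% True Discovery Rate come from the slides, while the other winners and the implementation rate are illustrative assumptions:

```python
# Program-level value estimate combining the corrections above.
winners = [232_840, 118_500, 64_200]   # projected revenue per winner (EUR)
type_m = [0.12, 0.20, 0.15]            # Type-M exaggeration per winner
true_discovery_rate = 0.5714           # from the rule-of-thumb calculation
implementation_rate = 0.80             # share of winners actually shipped

adjusted_sum = sum(w * (1 - m) for w, m in zip(winners, type_m))
program_value = adjusted_sum * true_discovery_rate * implementation_rate
print(round(program_value))
```

This corrected figure, rather than the naive sum of winners, is what should be compared against the program's yearly costs.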
Slide 94
Are you above or below your ROI limit?
1. Above: increase budgets
2. Below: increase knowledge
3. Still below: decrease budgets
Slide 95
Are you above or below your ROI limit?
① Above: increase budgets
   • More A/B-tests (quantity)
   • Lower win %, more winners
② Below: increase knowledge
   • Better A/B-tests (quality)
   • Higher win %, more winners
③ Still below: decrease budgets
   • Fewer A/B-tests (quantity)
   • Higher win %, fewer winners
Slide 96
You can help get to this answer:
A. Increase budgets
   • More A/B-tests (quantity)
✓ You can calculate the answer
✓ You have a big influence on the outcome