Equal or unequal cell sizes in A/B testing?
Tom Haxton
Senior Data Scientist, Chegg
August 30, 2016
At Chegg we often run A/B tests to measure differences in conversion rates when
we change a webpage design. Most often, we split traffic evenly into control and
experimental cells. However, for a variety of reasons we sometimes allocate only
a small fraction of our incoming traffic (e.g. 5%) to the experimental cell. In
these cases, we need to decide which control group to compare to the small
experimental group. In a truly randomized experiment, our results should not
depend on our choice of control group, because sample means are unbiased
estimators of population means. Thus, to reach a desired level of statistical
certainty fastest, we would want to use the entire remaining 95% of traffic as
the control group. However, at Chegg we have found that our test results (e.g.
differences in conversion rates) can vary if we use an imbalanced control group
(95%) vs an equal control group (5%, with a 90% “holdout cell” removed from
analysis).
I looked on the web for any discussion on why A/B test results could depend on
the size of the control cell. I found conflicting advice on whether to use equal
or unequal control cell size and no explanation of why, except for those pointing
out that confidence intervals calculated assuming normal distributions will be
less accurate when cell sizes are smaller. So I constructed a minimal theoretical
model for A/B tests measuring conversion and solved the model to find out if
measured conversion rates depend on cell sizes.
For those interested in the details, read on to the next section. If you just
want the punchline, it turns out that the dependence of test results on cell size
comes from a combination of two effects: (1) unconverted (anonymous) visitors
coming back to a website with an identity that cannot be linked to their first
visit (e.g. on a new device or with cookies turned off) and (2) return rates
and/or second-visit conversion rates varying between control and experiment
experiences. The first effect reminds us that A/B tests on anonymous web
traffic are not truly randomized experiments, because the anonymous visitors
we treat as independent may in fact be the same people.
So what do we do? In the model I found that test results will usually be most
accurate when we use equal-size experimental and control cells, so I recommend
using equal-size cells with a holdout cell whenever a 50/50 split is not appropriate.
However, I found that even in this case results will not in general agree
with what we would measure if we could track visitors perfectly. This is another
reminder that A/B test results on anonymous web traffic must be taken with a
grain of salt.
In the following sections, I will discuss (1) the model and math leading to the
results, (2) trends, and (3) the case of equal cell sizes.
1 Model and math
For simplicity, assume we have only one experimental cell. We want to know
whether the difference in conversion rates that we measure depends on the
size of the control cell and, if so, why. This approach should generalize to
multiple experimental cells and to metrics other than conversion that are led by
conversion.
We have an experimental cell of size $f$ and a control cell of size $1 - f$. Assume
that visitors convert on their first visit to the control (experimental) cell with
a probability $p^c_1$ ($p^e_1$). Assume that they do not convert but return with a
probability $r^c_1$ ($r^e_1$). Assume that some fraction $d$ of those return with an identity
that cannot be linked with their initial identity, and assume that there is no
interaction between the likelihood to come back with a new identity and the
other probabilities. A returning visitor whose identity is lost is treated as a
new visitor and randomized again, landing in the experimental cell with
probability $f$; a returning visitor whose identity is retained stays in the
original cell.
The probability to convert on a second visit can depend on the experience in
both the first and second visits, so there may be four distinct probabilities
to convert on the second visit, $p^{cc}_2$, $p^{ce}_2$, $p^{ec}_2$, and $p^{ee}_2$, where the first (second)
superscript index refers to the first (second) visit.
For simplicity, let’s assume that no one returns for a third visit, but these results
could be generalized to multiple return visits.
The number of conversions in the control cell (relative to the total number of
visitors) is
$$(1-f)\,p^c_1 + (1-f)\,r^c_1 (1-d)\,p^{cc}_2 + (1-f)\,r^c_1 d\,(1-f)\,p^{cc}_2 + f\,r^e_1 d\,(1-f)\,p^{ec}_2. \tag{1}$$
The first term in Eq. 1 represents visitors who arrive in the control cell and
convert on the first visit. The second term represents visitors who arrive in
the control cell, do not convert but return, return with the same identity, and
convert on the second visit. The third term represents visitors who arrive in the
control cell, do not convert but return, return with a different identity, arrive in
the control cell on their second visit, and convert. The fourth term represents
visitors who arrive in the experimental cell on their first visit, do not convert
but return, return with a different identity, arrive in the control cell on their
second visit, and convert.
Similarly, the number of conversions in the experimental cell (relative to the
total number of visitors) is
$$f\,p^e_1 + f\,r^e_1 (1-d)\,p^{ee}_2 + f\,r^e_1 d\,f\,p^{ee}_2 + (1-f)\,r^c_1 d\,f\,p^{ce}_2. \tag{2}$$
The number of unique identities counted in the control cell (relative to the total
number of visitors) is
$$(1-f) + (1-f)\,r^c_1 d\,(1-f) + f\,r^e_1 d\,(1-f). \tag{3}$$
The first term in Eq. 3 represents visitors who arrive first in the control cell.
The second term represents visitors who arrive in the control cell, do not convert
but return, return with a different identity, and arrive in the control cell the
second time. The third term represents visitors who arrive in the experimental
cell, do not convert but return, return with a different identity, and arrive in
the control cell the second time.
Similarly, the number of unique identities counted in the experimental cell (relative
to the total number of visitors) is
$$f + f\,r^e_1 d\,f + (1-f)\,r^c_1 d\,f. \tag{4}$$
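The four counts in Eqs. 1 to 4 can be cross-checked against a direct simulation of the visitor model. The following is a minimal sketch, not the author's code: the function names, the parameter values, and the choice to draw "convert", "return", and "leave" from a single uniform random number are all assumptions made for illustration. It assumes, as in the term-by-term description above, that a returning visitor with a lost identity is randomized again and counted as a new identity.

```python
import random


def analytic_counts(f, d, p1c, p1e, r1c, r1e, p2cc, p2ce, p2ec, p2ee):
    """Eqs. 1-4: conversions and unique identities per cell,
    each relative to the total number of first-time visitors."""
    conv_c = ((1 - f) * p1c
              + (1 - f) * r1c * (1 - d) * p2cc
              + (1 - f) * r1c * d * (1 - f) * p2cc
              + f * r1e * d * (1 - f) * p2ec)            # Eq. 1
    conv_e = (f * p1e
              + f * r1e * (1 - d) * p2ee
              + f * r1e * d * f * p2ee
              + (1 - f) * r1c * d * f * p2ce)            # Eq. 2
    ids_c = (1 - f) + (1 - f) * r1c * d * (1 - f) + f * r1e * d * (1 - f)  # Eq. 3
    ids_e = f + f * r1e * d * f + (1 - f) * r1c * d * f                    # Eq. 4
    return conv_c, conv_e, ids_c, ids_e


def simulate_counts(n, f, d, p1c, p1e, r1c, r1e, p2cc, p2ce, p2ec, p2ee, seed=0):
    """Monte Carlo version of the same model (requires p1 + r1 <= 1 per cell)."""
    rng = random.Random(seed)
    p1 = {"c": p1c, "e": p1e}
    r1 = {"c": r1c, "e": r1e}
    p2 = {("c", "c"): p2cc, ("c", "e"): p2ce, ("e", "c"): p2ec, ("e", "e"): p2ee}
    conv = {"c": 0, "e": 0}
    ids = {"c": 0, "e": 0}
    for _ in range(n):
        first = "e" if rng.random() < f else "c"
        ids[first] += 1
        u = rng.random()
        if u < p1[first]:                 # converts on the first visit
            conv[first] += 1
            continue
        if u >= p1[first] + r1[first]:    # neither converts nor returns
            continue
        if rng.random() < d:              # identity lost: randomized again
            second = "e" if rng.random() < f else "c"
            ids[second] += 1              # counted as a new identity
        else:                             # identity kept: same cell as before
            second = first
        if rng.random() < p2[(first, second)]:
            conv[second] += 1
    return conv["c"] / n, conv["e"] / n, ids["c"] / n, ids["e"] / n


# Illustrative parameter values (assumptions, not measured data).
PARAMS = dict(f=0.05, d=0.3, p1c=0.08, p1e=0.10, r1c=0.2, r1e=0.3,
              p2cc=0.02, p2ce=0.03, p2ec=0.02, p2ee=0.03)

if __name__ == "__main__":
    print("analytic :", analytic_counts(**PARAMS))
    print("simulated:", simulate_counts(200_000, **PARAMS))
```

With a few hundred thousand simulated visitors the two sets of numbers should agree to within sampling noise.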
The apparent conversion rates $p^c$ and $p^e$ are obtained by dividing Eq. 1 by Eq. 3
and Eq. 2 by Eq. 4. We get
$$p^c = \frac{p^c_1 + r^c_1 (1-d)\,p^{cc}_2 + r^c_1 d\,(1-f)\,p^{cc}_2 + f\,r^e_1 d\,p^{ec}_2}{1 + r^c_1 d\,(1-f) + f\,r^e_1 d} \tag{5}$$
and
$$p^e = \frac{p^e_1 + r^e_1 (1-d)\,p^{ee}_2 + r^e_1 d\,f\,p^{ee}_2 + (1-f)\,r^c_1 d\,p^{ce}_2}{1 + r^e_1 d\,f + (1-f)\,r^c_1 d}. \tag{6}$$
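Eqs. 5 and 6 are simple enough to evaluate directly. The sketch below is one way to do so; the function name `apparent_rates` and the parameter values are illustrative assumptions, not part of the original analysis. It prints the apparent rates for a small and an even experimental allocation, with and without lost identities.

```python
def apparent_rates(f, d, p1c, p1e, r1c, r1e, p2cc, p2ce, p2ec, p2ee):
    """Return the apparent conversion rates (p^c, p^e) of Eqs. 5 and 6."""
    pc = ((p1c + r1c * (1 - d) * p2cc + r1c * d * (1 - f) * p2cc + f * r1e * d * p2ec)
          / (1 + r1c * d * (1 - f) + f * r1e * d))
    pe = ((p1e + r1e * (1 - d) * p2ee + r1e * d * f * p2ee + (1 - f) * r1c * d * p2ce)
          / (1 + r1e * d * f + (1 - f) * r1c * d))
    return pc, pe


# Illustrative rates (assumptions, not measured data).
RATES = dict(p1c=0.08, p1e=0.10, r1c=0.2, r1e=0.3,
             p2cc=0.02, p2ce=0.03, p2ec=0.02, p2ee=0.03)

for d in (0.0, 0.3):
    for f in (0.05, 0.5):
        pc, pe = apparent_rates(f, d, **RATES)
        print(f"d={d:.1f} f={f:.2f}  p^c={pc:.5f}  p^e={pe:.5f}  p^e-p^c={pe - pc:.5f}")
```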
From Eqs. 5 and 6 we see that if we can always identify visitors perfectly (d = 0)
there should be no dependence of apparent conversion rates on allocation size.
In that case
$$p^c = p^c_1 + r^c_1 p^{cc}_2 \tag{7}$$
and
$$p^e = p^e_1 + r^e_1 p^{ee}_2. \tag{8}$$
However, if we lose some identities ($d > 0$), then the apparent conversion rates
will depend on allocation size unless both the return rates are the same, $r^e_1 = r^c_1$,
and the second-visit conversion rates do not depend on the experience in the first
visit, $p^{cc}_2 = p^{ec}_2$ and $p^{ce}_2 = p^{ee}_2$. If both of these types of rates differ between
cells, the dependence on allocation is complicated (Eqs. 5 and 6). Usually, we
would expect the return rate to differ between cells by more than second-visit
conversion depends on first-visit experience, so to get the
dominant behavior we assume that $p^{cc}_2 = p^{ec}_2 \equiv p^c_2$ and $p^{ce}_2 = p^{ee}_2 \equiv p^e_2$. Then
$$p^c = \frac{p^c_1 + r^c_1 p^c_2 + (r^e_1 - r^c_1)\,d f\,p^c_2}{1 + r^c_1 d + f\,(r^e_1 - r^c_1)\,d} \tag{9}$$
and
$$p^e = \frac{p^e_1 + r^e_1 p^e_2 + (r^c_1 - r^e_1)\,d\,(1-f)\,p^e_2}{1 + r^e_1 d + (1-f)\,(r^c_1 - r^e_1)\,d}. \tag{10}$$
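As an intermediate step that the text does not spell out, the numerator and denominator of Eq. 5 reduce to those of Eq. 9 by collecting terms once $p^{cc}_2 = p^{ec}_2 \equiv p^c_2$:

$$\begin{aligned}
r^c_1 (1-d)\,p^c_2 + r^c_1 d\,(1-f)\,p^c_2 + f\,r^e_1 d\,p^c_2 &= r^c_1 p^c_2 + (r^e_1 - r^c_1)\,d f\,p^c_2, \\
1 + r^c_1 d\,(1-f) + f\,r^e_1 d &= 1 + r^c_1 d + f\,(r^e_1 - r^c_1)\,d.
\end{aligned}$$

Eq. 10 follows from Eq. 6 in the same way, with the roles of $c$ and $e$ exchanged and $f$ replaced by $1 - f$.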
Expanding in d,
$$p^c = p^c_1 + r^c_1 p^c_2 + \left[(r^e_1 - r^c_1)\,f\,(p^c_2 - p^c_1 - r^c_1 p^c_2) - (p^c_1 + r^c_1 p^c_2)\,r^c_1\right] d + O(d^2) \tag{11}$$
and
$$p^e = p^e_1 + r^e_1 p^e_2 + \left[(r^c_1 - r^e_1)(1-f)(p^e_2 - p^e_1 - r^e_1 p^e_2) - (p^e_1 + r^e_1 p^e_2)\,r^e_1\right] d + O(d^2). \tag{12}$$
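The expansion only needs the first-order Taylor series of a ratio. Writing either of Eqs. 9 and 10 as $(A + Bd)/(1 + Cd)$, where $A$, $B$, and $C$ are shorthand introduced here and not symbols from the original,

$$\frac{A + B d}{1 + C d} = A + (B - A C)\,d + O(d^2).$$

For the control cell, $A = p^c_1 + r^c_1 p^c_2$, $B = (r^e_1 - r^c_1)\,f\,p^c_2$, and $C = r^c_1 + f\,(r^e_1 - r^c_1)$, so that

$$B - A C = (r^e_1 - r^c_1)\,f\,(p^c_2 - A) - A\,r^c_1,$$

which is the bracket in Eq. 11. The experimental cell works the same way with $c$ and $e$ exchanged and $f$ replaced by $1 - f$.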
The apparent conversion rates change with allocation size according to
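$$\frac{\mathrm{d}p^c}{\mathrm{d}f} = d\,(r^e_1 - r^c_1)\,(p^c_2 - p^c_1 - r^c_1 p^c_2) + O(d^2) \tag{13}$$
$$\frac{\mathrm{d}p^e}{\mathrm{d}f} = d\,(r^e_1 - r^c_1)\,(p^e_2 - p^e_1 - r^e_1 p^e_2) + O(d^2), \tag{14}$$
so that the change in relative conversion rates is
$$\frac{\mathrm{d}(p^e - p^c)}{\mathrm{d}f} = d\,(r^e_1 - r^c_1)\left((p^e_2 - p^c_2) - (p^e_1 - p^c_1) - (r^e_1 p^e_2 - r^c_1 p^c_2)\right) + O(d^2). \tag{15}$$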
Dropping higher order terms in the return rates (assuming these rates are
substantially less than 1), this simplifies to
$$\frac{\mathrm{d}(p^e - p^c)}{\mathrm{d}f} = d\,(r^e_1 - r^c_1)\left((p^e_2 - p^c_2) - (p^e_1 - p^c_1)\right) + O\!\left(d\,(r_1)^2 p_2\right) + O(d^2). \tag{16}$$
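One way to sanity-check the leading-order slopes is to differentiate Eqs. 9 and 10 numerically and compare with Eqs. 15 and 16. The sketch below does this with a central finite difference; the function names, the step size `h`, and the parameter values are assumptions chosen for illustration. For small $d$ the numerical slope should approach Eq. 15, while Eq. 16 additionally drops the $r_1 p_2$ terms and is therefore a coarser approximation.

```python
def apparent_difference(f, d, p1c, p1e, r1c, r1e, p2c, p2e):
    """p^e - p^c computed from Eqs. 9 and 10."""
    pc = ((p1c + r1c * p2c + (r1e - r1c) * d * f * p2c)
          / (1 + r1c * d + f * (r1e - r1c) * d))
    pe = ((p1e + r1e * p2e + (r1c - r1e) * d * (1 - f) * p2e)
          / (1 + r1e * d + (1 - f) * (r1c - r1e) * d))
    return pe - pc


def slope_eq15(d, p1c, p1e, r1c, r1e, p2c, p2e):
    """Leading-order slope of p^e - p^c with respect to f (Eq. 15)."""
    return d * (r1e - r1c) * ((p2e - p2c) - (p1e - p1c) - (r1e * p2e - r1c * p2c))


def slope_eq16(d, p1c, p1e, r1c, r1e, p2c, p2e):
    """Eq. 15 with the r1 * p2 terms dropped (Eq. 16)."""
    return d * (r1e - r1c) * ((p2e - p2c) - (p1e - p1c))


def numerical_slope(f, d, h=1e-4, **rates):
    """Central finite difference of apparent_difference with respect to f."""
    hi = apparent_difference(f + h, d, **rates)
    lo = apparent_difference(f - h, d, **rates)
    return (hi - lo) / (2 * h)


# Illustrative rates (assumptions, not measured data).
RATES = dict(p1c=0.08, p1e=0.10, r1c=0.2, r1e=0.3, p2c=0.02, p2e=0.03)

for d in (0.01, 0.1, 0.3):
    print(f"d={d:.2f}  numerical={numerical_slope(0.5, d, **RATES):.3e}  "
          f"Eq.15={slope_eq15(d, **RATES):.3e}  Eq.16={slope_eq16(d, **RATES):.3e}")
```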
2 Trends
Depending on the values on the right side of Eq. 16, this effect could go either
way. In general, we expect that second-visit conversion rates are lower than
first-visit conversion rates, so differences between second-visit conversion rates
will also usually be smaller than differences between first-visit conversion rates,
$|p^e_2 - p^c_2| < |p^e_1 - p^c_1|$. This means that to estimate the direction of the effect we
can consider the simpler approximation
$$\frac{\mathrm{d}(p^e - p^c)}{\mathrm{d}f} \sim -d\,(r^e_1 - r^c_1)(p^e_1 - p^c_1). \tag{17}$$
Additionally, when return rates and second-visit conversion rates are small, we
expect the sign of $p^e - p^c$ to be the same as the sign of $p^e_1 - p^c_1$, so the direction
of the effect is given by
$$\operatorname{Sign}\!\left[\frac{\mathrm{d}(p^e - p^c)}{\mathrm{d}f}\right] = -\operatorname{Sign}(r^e_1 - r^c_1)\,\operatorname{Sign}(p^e - p^c). \tag{18}$$
This means that when the return rate is larger for unconverted visitors from
the experimental cell, the difference in conversion rates (whichever way it goes)
is increasingly overestimated as the control cell gets bigger ($f$ decreases). Conversely,
when the return rate is smaller for unconverted visitors from the experimental
cell, the difference in conversion rates is increasingly underestimated as
the control cell gets bigger. The effect is not likely to switch the sign of the
difference in conversion rates (which would lead to qualitatively wrong results)
because $f$, $d$, and $|r^e_1 - r^c_1|$ in Eq. 17 all must be less than 1.
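The direction rule of Eq. 18 can be illustrated by evaluating Eqs. 9 and 10 at a small and an even experimental allocation. This is a minimal sketch under assumed parameter values; the function name and the numbers are illustrative only. In the first scenario the experimental cell has the higher return rate, in the second the lower one, and in both the experimental cell converts better.

```python
def apparent_difference(f, d, p1c, p1e, r1c, r1e, p2c, p2e):
    """p^e - p^c computed from Eqs. 9 and 10."""
    pc = ((p1c + r1c * p2c + (r1e - r1c) * d * f * p2c)
          / (1 + r1c * d + f * (r1e - r1c) * d))
    pe = ((p1e + r1e * p2e + (r1c - r1e) * d * (1 - f) * p2e)
          / (1 + r1e * d + (1 - f) * (r1c - r1e) * d))
    return pe - pc


# Illustrative conversion rates shared by both scenarios (assumptions).
BASE = dict(d=0.3, p1c=0.08, p1e=0.10, p2c=0.02, p2e=0.03)

SCENARIOS = {
    "experimental cell returns more (r1e > r1c)": dict(r1c=0.2, r1e=0.3),
    "experimental cell returns less (r1e < r1c)": dict(r1c=0.3, r1e=0.2),
}

for label, returns in SCENARIOS.items():
    small = apparent_difference(0.05, **BASE, **returns)   # 5% experimental cell
    even = apparent_difference(0.50, **BASE, **returns)    # 50/50 split
    trend = "larger" if small > even else "smaller"
    print(f"{label}: f=0.05 -> {small:.5f}, f=0.50 -> {even:.5f} "
          f"(difference is {trend} with the big control cell)")
```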
3 Should we trust same-size cells?
Given that our results depend on cell size, our intuition has been to trust the
results of A/B tests with equal-size cells, since this seems to compare the variations
on more equal footing. But should we fully trust these results? That is,
are the results with equal-size cells the same as what we would find in the ideal
experimental design where we perfectly track all visitors’ identities?
If the cells are the same size (f = 1 − f = 1/2) the difference in apparent
conversion rates turns out to be
$$p^e - p^c = \frac{p^e_1 - p^c_1 + r^e_1 (1 - d/2)\,p^e_2 - r^c_1 (1 - d/2)\,p^c_2 + r^c_1 d\,p^e_2/2 - r^e_1 d\,p^c_2/2}{1 + (r^e_1 + r^c_1)\,d/2}. \tag{19}$$
Comparing this to the difference in conversion rates we would measure if we lost no identities
($d = 0$),
$$(p^e - p^c)_0 = p^e_1 - p^c_1 + r^e_1 p^e_2 - r^c_1 p^c_2, \tag{20}$$
we find that the lost identities change the apparent difference in conversion rates
by
$$p^e - p^c - (p^e - p^c)_0 = \frac{r^c_1 p^e_2 - r^e_1 p^c_2 + (r^c_1 + r^e_1)(r^c_1 p^c_2 - r^e_1 p^e_2)}{2/d + (r^e_1 + r^c_1)}. \tag{21}$$
The right side of Eq. 21 does not equal 0 in general, so even if we use equal-size
cells, the difference in conversion rates that we measure is not the same as what
we would measure if we could perfectly keep track of everyone’s identity. To
lowest order in return rates Eq. 21 can be written
$$p^e - p^c - (p^e - p^c)_0 = \frac{d}{2}\,(r^c_1 p^e_2 - r^e_1 p^c_2) + O(r_1^2\,p_2). \tag{22}$$
This effect is small whenever second-visit conversion rates are small (or whenever
their particular combination with return rates in Eq. 22 is small). In general we
expect that first-visit conversion rates are larger than second-visit conversion
rates, so the discrepancy from having unequal cells will usually be larger than
the discrepancy purely from losing visitors’ identities. As with the discrepancy
from unequal cells, the discrepancy from lost identities can go in either direction.
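The comparison in this section can be evaluated directly from Eqs. 19 and 20 for any set of inputs. The sketch below does that; the function names and parameter values are assumptions for illustration and are not measured data. It reports the equal-cell difference, the ideal ($d = 0$) difference, and the gap between them for several values of $d$.

```python
def equal_cell_difference(d, p1c, p1e, r1c, r1e, p2c, p2e):
    """Apparent difference p^e - p^c for equal cells, f = 1/2 (Eq. 19)."""
    numerator = (p1e - p1c
                 + r1e * (1 - d / 2) * p2e - r1c * (1 - d / 2) * p2c
                 + r1c * d * p2e / 2 - r1e * d * p2c / 2)
    return numerator / (1 + (r1e + r1c) * d / 2)


def ideal_difference(p1c, p1e, r1c, r1e, p2c, p2e):
    """Difference measured with perfect identity tracking, d = 0 (Eq. 20)."""
    return p1e - p1c + r1e * p2e - r1c * p2c


# Illustrative rates (assumptions, not measured data).
RATES = dict(p1c=0.08, p1e=0.10, r1c=0.2, r1e=0.3, p2c=0.02, p2e=0.03)

ideal = ideal_difference(**RATES)
for d in (0.0, 0.1, 0.3):
    measured = equal_cell_difference(d, **RATES)
    print(f"d={d:.1f}  equal-cell={measured:.5f}  ideal={ideal:.5f}  "
          f"gap={measured - ideal:+.5f}")
```

Evaluating the exact expressions this way avoids relying on any truncated expansion when judging how large the gap is for a particular site.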
4 Bottom line
Whenever we cannot perfectly track visitors’ identities, we must take A/B tests
with a grain of salt: measured conversion rates will be different from what we
would measure if we could perfectly track identities. Although part of this
effect—and usually the larger part—can be avoided by using same-size cells,
even A/B tests with same-size cells will not in general give accurate results
unless we can perfectly track visitors’ identities.
6

Contenu connexe

Tendances

Ppt L. Sacrifices
Ppt L. SacrificesPpt L. Sacrifices
Ppt L. Sacrifices
Joy Joseph
 

Tendances (20)

A Born Again Christian!
A Born Again Christian!A Born Again Christian!
A Born Again Christian!
 
Hearing the Voice of God (Revised)
Hearing the Voice of God (Revised)Hearing the Voice of God (Revised)
Hearing the Voice of God (Revised)
 
New Testament Survey - no.22: Paul - Letter to Philemon
New Testament Survey - no.22: Paul - Letter to PhilemonNew Testament Survey - no.22: Paul - Letter to Philemon
New Testament Survey - no.22: Paul - Letter to Philemon
 
Message 8.9.13 on giving
Message 8.9.13 on givingMessage 8.9.13 on giving
Message 8.9.13 on giving
 
Sermon 2016 # 1 church slides the gospel and salvation the story of zacch...
Sermon 2016 # 1 church slides   the gospel and salvation   the story of zacch...Sermon 2016 # 1 church slides   the gospel and salvation   the story of zacch...
Sermon 2016 # 1 church slides the gospel and salvation the story of zacch...
 
Galatians 5
Galatians 5Galatians 5
Galatians 5
 
Fasting - The focus is intimacy with God
Fasting - The focus is intimacy with GodFasting - The focus is intimacy with God
Fasting - The focus is intimacy with God
 
The Elijah Challenge End-Time Model of Evangelism
The Elijah Challenge End-Time Model of EvangelismThe Elijah Challenge End-Time Model of Evangelism
The Elijah Challenge End-Time Model of Evangelism
 
Old Testament Sacrifices and Their Significance to Christianity
Old Testament Sacrifices and Their Significance to ChristianityOld Testament Sacrifices and Their Significance to Christianity
Old Testament Sacrifices and Their Significance to Christianity
 
1 samuel 4a You can't put God in a box.
1 samuel 4a You can't put God in a box.1 samuel 4a You can't put God in a box.
1 samuel 4a You can't put God in a box.
 
Session 2 What Is My Identity In Christ
Session 2   What Is My Identity In ChristSession 2   What Is My Identity In Christ
Session 2 What Is My Identity In Christ
 
Ppt L. Sacrifices
Ppt L. SacrificesPpt L. Sacrifices
Ppt L. Sacrifices
 
Galatians 5:16-26: The Spirit and The Flesh
Galatians 5:16-26: The Spirit and The FleshGalatians 5:16-26: The Spirit and The Flesh
Galatians 5:16-26: The Spirit and The Flesh
 
The Fruitful Christian
The Fruitful ChristianThe Fruitful Christian
The Fruitful Christian
 
HOW GOD TESTS YOUR FAITH
HOW GOD TESTS YOUR FAITHHOW GOD TESTS YOUR FAITH
HOW GOD TESTS YOUR FAITH
 
Temas importantes de 1 Juan
Temas importantes de 1 JuanTemas importantes de 1 Juan
Temas importantes de 1 Juan
 
Ezekiel
EzekielEzekiel
Ezekiel
 
Presentation: James 1:12-18
Presentation: James 1:12-18Presentation: James 1:12-18
Presentation: James 1:12-18
 
The Biblical Principles Of Giving
The Biblical Principles Of GivingThe Biblical Principles Of Giving
The Biblical Principles Of Giving
 
Jesus Light of the world
Jesus Light of the worldJesus Light of the world
Jesus Light of the world
 

Similaire à Equal or unequal cell sizes in A/B testing?

HOSVD-visualization
HOSVD-visualizationHOSVD-visualization
HOSVD-visualization
Keyvan Sadri
 
Recombination and LinkageA Three point test cross in Drosophil.docx
Recombination and LinkageA Three point test cross in Drosophil.docxRecombination and LinkageA Three point test cross in Drosophil.docx
Recombination and LinkageA Three point test cross in Drosophil.docx
sodhi3
 
tw1979 Exercise 2 Report
tw1979 Exercise 2 Reporttw1979 Exercise 2 Report
tw1979 Exercise 2 Report
Thomas Wigg
 
Experimental design
Experimental designExperimental design
Experimental design
Sandip Patel
 
Analysis of single server queueing system with batch service
Analysis of single server queueing system with batch serviceAnalysis of single server queueing system with batch service
Analysis of single server queueing system with batch service
Alexander Decker
 
Analysis of single server queueing system with batch service
Analysis of single server queueing system with batch serviceAnalysis of single server queueing system with batch service
Analysis of single server queueing system with batch service
Alexander Decker
 

Similaire à Equal or unequal cell sizes in A/B testing? (20)

Ijetr021233
Ijetr021233Ijetr021233
Ijetr021233
 
HOSVD-visualization
HOSVD-visualizationHOSVD-visualization
HOSVD-visualization
 
Validaternai
ValidaternaiValidaternai
Validaternai
 
Where and why are the lucky primes positioned in the spectrum of the Polignac...
Where and why are the lucky primes positioned in the spectrum of the Polignac...Where and why are the lucky primes positioned in the spectrum of the Polignac...
Where and why are the lucky primes positioned in the spectrum of the Polignac...
 
Econometric Investigation into Cryptocurrency Price Bubbles in Bitcoin and Et...
Econometric Investigation into Cryptocurrency Price Bubbles in Bitcoin and Et...Econometric Investigation into Cryptocurrency Price Bubbles in Bitcoin and Et...
Econometric Investigation into Cryptocurrency Price Bubbles in Bitcoin and Et...
 
An Econometric Investigation into Cryptocurrency Price Bubbles in Bitcoin and...
An Econometric Investigation into Cryptocurrency Price Bubbles in Bitcoin and...An Econometric Investigation into Cryptocurrency Price Bubbles in Bitcoin and...
An Econometric Investigation into Cryptocurrency Price Bubbles in Bitcoin and...
 
Recombination and LinkageA Three point test cross in Drosophil.docx
Recombination and LinkageA Three point test cross in Drosophil.docxRecombination and LinkageA Three point test cross in Drosophil.docx
Recombination and LinkageA Three point test cross in Drosophil.docx
 
Sheet#2
Sheet#2Sheet#2
Sheet#2
 
tw1979 Exercise 2 Report
tw1979 Exercise 2 Reporttw1979 Exercise 2 Report
tw1979 Exercise 2 Report
 
Local Model Checking Algorithm Based on Mu-calculus with Partial Orders
Local Model Checking Algorithm Based on Mu-calculus with Partial OrdersLocal Model Checking Algorithm Based on Mu-calculus with Partial Orders
Local Model Checking Algorithm Based on Mu-calculus with Partial Orders
 
Common evaluation measures in NLP and IR
Common evaluation measures in NLP and IRCommon evaluation measures in NLP and IR
Common evaluation measures in NLP and IR
 
Multinomial Model Simulations
Multinomial Model SimulationsMultinomial Model Simulations
Multinomial Model Simulations
 
Experimental design
Experimental designExperimental design
Experimental design
 
Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...
Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...
Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...
 
Sh rn awhitepaper
Sh rn awhitepaperSh rn awhitepaper
Sh rn awhitepaper
 
Direct Design of Reversible Combinational and Sequential Circuits Using PSDRM...
Direct Design of Reversible Combinational and Sequential Circuits Using PSDRM...Direct Design of Reversible Combinational and Sequential Circuits Using PSDRM...
Direct Design of Reversible Combinational and Sequential Circuits Using PSDRM...
 
Analysis of single server queueing system with batch service
Analysis of single server queueing system with batch serviceAnalysis of single server queueing system with batch service
Analysis of single server queueing system with batch service
 
Analysis of single server queueing system with batch service
Analysis of single server queueing system with batch serviceAnalysis of single server queueing system with batch service
Analysis of single server queueing system with batch service
 
Split-plot Designs
Split-plot DesignsSplit-plot Designs
Split-plot Designs
 
Sat
SatSat
Sat
 

Dernier

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
HyderabadDolls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Dernier (20)

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
 
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Equal or unequal cell sizes in A/B testing?

  • 1. Equal or unequal cell sizes in A/B testing? Tom Haxton Senior Data Scientist, Chegg August 30, 2016 At Chegg we often run A/B tests to measure differences in conversion rates when we change a webpage design. Most often, we split traffic evenly into control and experimental cells. However, for a variety of reasons we sometimes allocate only a small fraction of our incoming traffic (e.g. 5%) to the experimental cell. In these cases, we need to decide which control group to compare to the small experimental group. In a truly randomized experiment, our results should not depend on our choice of control group, because sample means are unbiased estimators of population means. Thus, to reach a desired level of statistical certainty fastest, we would want to use the entire remaining 95% of traffic as the control group. However, at Chegg we have found that our test results (e.g. differences in conversion rates) can vary if we use an imbalanced control group (95%) vs an equal control group (5%, with a 90% “holdout cell” removed from analysis). I looked on the web for any discussion on why A/B test results could depend on the size of the control cell. I found conflicting advice on whether to use equal or unequal control cell size and no explanation of why, except for those pointing out that confidence intervals calculated assuming normal distributions will be less accurate when cell sizes are smaller. So I constructed a minimal theoretical model for A/B tests measuring conversion and solved the model to find out if measured conversion rates depend on cell sizes. For those interested in the details, read on to the next section. If you just want the punchline, it turns out that the dependence of test results on cell size comes from a combination of two effects: (1) unconverted (anonymous) visitors coming back to a website with an identity that cannot be linked to their first visit (e.g. 
on a new device or with cookies turned off) and (2) return rates and/or second-visit conversion rates varying between control and experiment experiences. The first effect reminds us that A/B tests on anonymous web traffic are not truly randomized experiments, because the anonymous visitors we treat as independent may in fact be the same people. 1
  • 2. So what do we do? In the model I found that test results will usually be most accurate when we use equal-size experimental and control cells, so I recommend using equal-size cells with a holdout cell whenever a 50/50 split is not appro- priate. However, I found that even in this case results will not in general agree with what we would measure if we could track visitors perfectly. This is another reminder that A/B test results on anonymous web traffic must be taken with a grain of salt. In the following sections, I will discuss (1) the model and math leading to the results, (2) trends, and (3) the case of equal cell sizes. 1 Model and math For simplicity, assume we have only one experimental cell. We want to know whether the difference in conversion rates that we measure depends on the size of the control cell and, if so, why. This approach should generalize to multiple experimental cells and to metrics other than conversion that are led by conversion. We have an experimental cell of size f and a control cell of size 1 − f. Assume that visitors convert on their first visit to the control (experimental) cell with a probability pc 1 (pe 1). Assume that they do not convert but return with a probability rc 1 (rc 1). Assume that some fraction d of those return with an identity that cannot be linked with their initial identity, and assume that there is no interaction between the likelihood to come back with a new identity and the other probabilities. The probability to convert on a second visit can depend on the experience in both the first and second visits, so there may be four distinct probabilities to convert on the second visit, pcc 2 , pce 2 , pec 2 , and pee 2 , where the first (second) superscript index refers to the first (second) visit. For simplicity, let’s assume that no one returns for a third visit, but these results could be generalized to multiple return visits. 
The number of conversions in the control cell (relative to the total number of visitors) is (1 − f)pc 1 + (1 − f)rc 1(1 − d)pcc 2 + (1 − f)rc 1d(1 − f)pcc 2 + fre 1d(1 − f)pec 2 . (1) The first term in Eq. 1 represents visitors who arrive in the control cell and convert on the first visit. The second term represents visitors who arrive in the control cell, do not convert but return, return with a same identity, and convert on the second visit. The third term represents visitors who arrive in the control cell, do not convert but return, return with a different identity, arrive in 2
  • 3. the control cell on their second visit, and convert. The fourth term represents visitors who arrive in the experimental cell on their first visit, do not convert but return, return with a different identity, arrive in the control cell in their second visit, and convert. Similarly, the number of conversions in the experimental cell (relative to the total number of visitors) is fpe 1 + fre 1(1 − d)pee 2 + fre 1dfpee 2 + (1 − f)rc 1dfpce 2 . (2) The number of unique identities counted in the control cell (relative to the total number of visitors) is (1 − f) + (1 − f)rc 1d(1 − f) + fre 1d(1 − f). (3) The first term in Eq. 3 represents visitors who arrive first in the control cell. The second term represents visitors who arrive in the control cell, do not convert but return, return with a different identity, and arrive in the control cell the second time. The third term represents visitors who arrive in the experimental cell, do not convert but return, return with a different identity, and arrive in the control cell the second time. Similarly, the number of unique identities counted in the experimental cell (rel- ative to the total number of visitors) is f + fre 1df + (1 − f)rc 1df. (4) The apparent conversion rates pc and pe are obtained by dividing Eq. 1 by Eq. 3 and Eq. 2 by Eq. 4. We get pc = pc 1 + rc 1(1 − d)pcc 2 + rc 1d(1 − f)pcc 2 + fre 1dpec 2 1 + rc 1d(1 − f) + fre 1d (5) and pe = pe 1 + re 1(1 − d)pee 2 + re 1dfpee 2 + (1 − f)rc 1dpce 2 1 + re 1df + (1 − f)rc 1d (6) From Eqs. 5 and 6 we see that if we can always identify visitors perfectly (d = 0) there should be no dependence of apparent conversion rates on allocation size. In that case pc = pc 1 + rc 1pcc 2 (7) and pe = pe 1 + re 1pee 2 . (8) However, if we lose some identities (d > 0), then the apparent conversion rates will depend on allocation size unless both the return rates are the same, re 1 = rc 1, 3
and the second-visit conversion rates do not depend on the experience in the first visit, $p^{cc}_2 = p^{ec}_2$ and $p^{ce}_2 = p^{ee}_2$. If both of these types of rates differ between cells, the dependence on allocation is complicated (Eqs. 5 and 6). Usually, we would expect the return rate to differ between cells more than second-visit conversion depends on first-visit experience, so to get the dominant behavior we assume that $p^{cc}_2 = p^{ec}_2 \equiv p^c_2$ and $p^{ce}_2 = p^{ee}_2 \equiv p^e_2$. Then

$$p^c = \frac{p^c_1 + r^c_1 p^c_2 + (r^e_1 - r^c_1)\,d f\,p^c_2}{1 + r^c_1 d + f\,(r^e_1 - r^c_1)\,d} \tag{9}$$

and

$$p^e = \frac{p^e_1 + r^e_1 p^e_2 + (r^c_1 - r^e_1)\,d\,(1-f)\,p^e_2}{1 + r^e_1 d + (1-f)\,(r^c_1 - r^e_1)\,d}. \tag{10}$$

Expanding in $d$,

$$p^c = p^c_1 + r^c_1 p^c_2 + \left[(r^e_1 - r^c_1)\,f\,(p^c_2 - p^c_1 - r^c_1 p^c_2) - (p^c_1 + r^c_1 p^c_2)\,r^c_1\right] d + O(d^2) \tag{11}$$

and

$$p^e = p^e_1 + r^e_1 p^e_2 + \left[(r^c_1 - r^e_1)(1-f)(p^e_2 - p^e_1 - r^e_1 p^e_2) - (p^e_1 + r^e_1 p^e_2)\,r^e_1\right] d + O(d^2). \tag{12}$$

The apparent conversion rates change with allocation size according to

$$\frac{\mathrm{d}p^c}{\mathrm{d}f} = d\,(r^e_1 - r^c_1)(p^c_2 - p^c_1 - r^c_1 p^c_2) + O(d^2), \tag{13}$$

$$\frac{\mathrm{d}p^e}{\mathrm{d}f} = d\,(r^e_1 - r^c_1)(p^e_2 - p^e_1 - r^e_1 p^e_2) + O(d^2), \tag{14}$$

so that the difference in apparent conversion rates changes according to

$$\frac{\mathrm{d}(p^e - p^c)}{\mathrm{d}f} = d\,(r^e_1 - r^c_1)\left[(p^e_2 - p^c_2) - (p^e_1 - p^c_1) - (r^e_1 p^e_2 - r^c_1 p^c_2)\right] + O(d^2). \tag{15}$$

Dropping higher order terms in the return rates (assuming these rates are substantially less than 1), this simplifies to

$$\frac{\mathrm{d}(p^e - p^c)}{\mathrm{d}f} = d\,(r^e_1 - r^c_1)\left[(p^e_2 - p^c_2) - (p^e_1 - p^c_1)\right] + O\!\left(d\,(r_1)^2 p_2\right) + O(d^2). \tag{16}$$

2 Trends

Depending on the values on the right side of Eq. 16, this effect could go either way. In general, we expect that second-visit conversion rates are lower than first-visit conversion rates, so differences between second-visit conversion rates will also usually be smaller than differences between first-visit conversion rates,
$|p^e_2 - p^c_2| < |p^e_1 - p^c_1|$. This means that to estimate the direction of the effect we can consider the simpler approximation

$$\frac{\mathrm{d}(p^e - p^c)}{\mathrm{d}f} \sim -d\,(r^e_1 - r^c_1)(p^e_1 - p^c_1). \tag{17}$$

Additionally, when return rates and second-visit conversion rates are small, we expect the sign of $p^e - p^c$ to be the same as the sign of $p^e_1 - p^c_1$, so the direction of the effect is given by

$$\operatorname{Sign}\left[\frac{\mathrm{d}(p^e - p^c)}{\mathrm{d}f}\right] = -\operatorname{Sign}(r^e_1 - r^c_1)\,\operatorname{Sign}(p^e - p^c). \tag{18}$$

This means that when the return rate is larger for unconverted visitors from the experimental cell, the difference in conversion rates (whichever way it goes) is increasingly overestimated as the control cell gets bigger ($f$ decreases). Conversely, when the return rate is smaller for unconverted visitors from the experimental cell, the difference in conversion rates is increasingly underestimated as the control cell gets bigger. The effect is not likely to switch the sign of the difference in conversion rates (which would lead to qualitatively wrong results) because $f$, $d$, and $|r^e_1 - r^c_1|$ in Eq. 17 all must be less than 1.

3 Should we trust same-size cells?

Given that our results depend on cell size, our intuition has been to trust the results of A/B tests with equal-size cells, since this seems to compare the variations on more equal footing. But should we fully trust these results? That is, are the results with equal-size cells the same as what we would find in the ideal experimental design where we perfectly track all visitors’ identities?

If the cells are the same size ($f = 1 - f = 1/2$) the difference in apparent conversion rates turns out to be

$$p^e - p^c = \frac{p^e_1 - p^c_1 + r^e_1 (1 - d/2)\,p^e_2 - r^c_1 (1 - d/2)\,p^c_2 + r^c_1 d\,p^e_2/2 - r^e_1 d\,p^c_2/2}{1 + (r^e_1 + r^c_1)\,d/2}. \tag{19}$$

Comparing this to the difference in conversion rates we would measure if we lost no identities ($d = 0$),

$$(p^e - p^c)_0 = p^e_1 - p^c_1 + r^e_1 p^e_2 - r^c_1 p^c_2, \tag{20}$$

we find that the lost identities change the apparent difference in conversion rates by

$$p^e - p^c - (p^e - p^c)_0 = \frac{r^c_1 p^e_2 - r^e_1 p^c_2 + (r^c_1 + r^e_1)(r^c_1 p^c_2 - r^e_1 p^e_2)}{2/d + (r^e_1 + r^c_1)}. \tag{21}$$

The right side of Eq. 21 does not equal 0 in general, so even if we use equal-size cells, the difference in conversion rates that we measure is not the same as what
we would measure if we could perfectly keep track of everyone’s identity. To lowest order in return rates, Eq. 21 can be written

$$p^e - p^c - (p^e - p^c)_0 = \frac{d}{2}\,(r^c_1 p^e_2 - r^e_1 p^c_2) + O(r_1^2 p_2). \tag{22}$$

This effect is small whenever second-visit conversion rates are small (or whenever their particular combination with return rates in Eq. 22 is small). In general we expect that first-visit conversion rates are larger than second-visit conversion rates, so the discrepancy from having unequal cells will usually be larger than the discrepancy purely from losing visitors’ identities. As with the discrepancy from unequal cells, the direction of this effect can go either way.

4 Bottom line

Whenever we cannot perfectly track visitors’ identities, we must take A/B tests with a grain of salt: measured conversion rates will be different from what we would measure if we could perfectly track identities. Although part of this effect (usually the larger part) can be avoided by using same-size cells, even A/B tests with same-size cells will not in general give accurate results unless we can perfectly track visitors’ identities.
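
The dependence on allocation described above can also be explored numerically. The following is a minimal Python sketch, not part of the original analysis, that evaluates the apparent conversion rates directly as the ratios Eq. 1 / Eq. 3 and Eq. 2 / Eq. 4. The class name `Rates`, the function name `apparent_rates`, the field names, and all parameter values are illustrative assumptions; the numbers are made up and are not Chegg data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Rates:
    """Model parameters (hypothetical field names chosen for this sketch)."""
    p1_c: float   # first-visit conversion rate, control cell
    p1_e: float   # first-visit conversion rate, experimental cell
    r1_c: float   # return rate of unconverted control visitors
    r1_e: float   # return rate of unconverted experimental visitors
    p2_cc: float  # second-visit conversion: control first, control second
    p2_ec: float  # second-visit conversion: experimental first, control second
    p2_ee: float  # second-visit conversion: experimental first, experimental second
    p2_ce: float  # second-visit conversion: control first, experimental second


def apparent_rates(m: Rates, f: float, d: float) -> Tuple[float, float]:
    """Return (p_c, p_e), the apparent conversion rates of Eqs. 5 and 6.

    f is the fraction of traffic sent to the experimental cell (0 < f < 1);
    d is the probability that a returning visitor has a new identity.
    """
    # Eq. 1: conversions counted in the control cell.
    conv_c = ((1 - f) * m.p1_c
              + (1 - f) * m.r1_c * (1 - d) * m.p2_cc
              + (1 - f) * m.r1_c * d * (1 - f) * m.p2_cc
              + f * m.r1_e * d * (1 - f) * m.p2_ec)
    # Eq. 3: unique identities counted in the control cell.
    ids_c = (1 - f) + (1 - f) * m.r1_c * d * (1 - f) + f * m.r1_e * d * (1 - f)
    # Eq. 2: conversions counted in the experimental cell.
    conv_e = (f * m.p1_e
              + f * m.r1_e * (1 - d) * m.p2_ee
              + f * m.r1_e * d * f * m.p2_ee
              + (1 - f) * m.r1_c * d * f * m.p2_ce)
    # Eq. 4: unique identities counted in the experimental cell.
    ids_e = f + f * m.r1_e * d * f + (1 - f) * m.r1_c * d * f
    return conv_c / ids_c, conv_e / ids_e


# Illustrative (made-up) parameters: the experimental cell converts better
# and its unconverted visitors return more often.
m = Rates(p1_c=0.10, p1_e=0.12, r1_c=0.20, r1_e=0.30,
          p2_cc=0.05, p2_ec=0.05, p2_ee=0.06, p2_ce=0.06)

for f in (0.05, 0.5):
    pc, pe = apparent_rates(m, f=f, d=0.3)
    print(f"f={f:.2f}  p_c={pc:.4f}  p_e={pe:.4f}  diff={pe - pc:.4f}")
```

With these assumed values the measured difference is slightly larger for the 5% allocation than for the 50% allocation, which is the direction Eq. 18 predicts when $r^e_1 > r^c_1$ and $p^e > p^c$. Setting `d=0.0` removes the dependence on `f` and recovers Eqs. 7 and 8.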