This document discusses using matching as an alternative to A/B testing for analyzing the effects of changes in games. It notes that in games, players self-select into options that maximize their fun rather than being randomly assigned. The document then explains the assumptions and methods of matching, including different matching algorithms that can be used to match treated and untreated players based on observable characteristics. It presents results of using matching to analyze the effect of "zeropayments" in a game, which guide players through a fake payment process. Finally, it provides references for further reading on matching methods.
Ubisoft
1. Matching as an Alternative to A/B Testing
Christoph Safferling
Head of Game Analytics
Ubisoft Blue Byte
Games Industry Analytics Forum
May 9th, 2013
2. Self-selection in games
in games, we routinely change things, and want to test if the
change was successful
game changes: quest changes, introduce new items, etc
shop configurations: amount of items, allocation, prices, etc
...and many more examples!
players self-select into the group that maximises their utility
(fun)
most game variables are the result of a player’s decision:
exogeneity is (usually) not given: E[ε|X] ≠ 0
3. Treatment effects
test the outcome of a treatment effect
E[Y|X, D = 1] − E[Y|X, D = 0] = E[Y(1) − Y(0)|X]
with Y as the outcome, X as the observable data, and D as
the treatment dummy
we are interested in the average treatment effect on the treated:
ATT = E[Y(1) − Y(0)|D = 1]
= E[Y(1)|D = 1] − E[Y(0)|D = 1]
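To see why the second term of the ATT matters, the naive comparison of treated and untreated players can be decomposed as follows. This is a standard identity written in the slide's own notation; the labels "ATT" and "selection bias" on the braces are the only additions.

```latex
\begin{aligned}
E[Y \mid D=1] - E[Y \mid D=0]
  &= E[Y(1) \mid D=1] - E[Y(0) \mid D=0] \\
  &= \underbrace{E[Y(1) \mid D=1] - E[Y(0) \mid D=1]}_{\text{ATT}}
   + \underbrace{E[Y(0) \mid D=1] - E[Y(0) \mid D=0]}_{\text{selection bias}}
\end{aligned}
```

The naive difference equals the ATT only if the selection-bias term is zero, which self-selection into the treatment generally rules out.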
4. E[Y(0)|D = 1] is a counterfactual: unobservable
proper control groups (A/B testing!) provide a consistent
estimator
sometimes, A/B testing is not available/feasible
(one) different econometric modeling strategy: matching
estimator
reproduce the treatment group among the non-treated:
find individuals with the same observable characteristics who
differ only in their treatment status and their outcomes
(“statistical twins”)
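The "statistical twins" idea can be sketched on simulated data. This is a minimal illustration, not the analysis behind the slides: the covariate `engagement`, the opt-in probabilities, and the true effect of 2.0 are all made-up assumptions. Players with higher engagement are more likely to opt in and also have higher outcomes, so the naive comparison is biased, while exact matching on the covariate recovers the ATT.

```python
import random
from collections import defaultdict
from statistics import mean


def simulate_players(n=20000, true_effect=2.0, seed=42):
    """Simulate self-selecting players (all numbers are illustrative assumptions)."""
    rng = random.Random(seed)
    players = []
    for _ in range(n):
        engagement = rng.randint(0, 4)  # hypothetical observable covariate X
        # Self-selection: more engaged players opt in more often.
        treated = rng.random() < 0.1 + 0.15 * engagement
        outcome = 3.0 * engagement + rng.gauss(0, 1)
        if treated:
            outcome += true_effect
        players.append((engagement, treated, outcome))
    return players


def naive_difference(players):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    treated = [y for _, d, y in players if d]
    control = [y for _, d, y in players if not d]
    return mean(treated) - mean(control)


def exact_matching_att(players):
    """ATT estimate: compare each treated player with untreated twins sharing X."""
    control_by_x = defaultdict(list)
    for x, d, y in players:
        if not d:
            control_by_x[x].append(y)
    # The mean untreated outcome per covariate value stands in for E[Y(0)|D=1, X].
    counterfactual = {x: mean(ys) for x, ys in control_by_x.items()}
    # Treated players without any untreated twin are left out (no common support).
    diffs = [y - counterfactual[x] for x, d, y in players if d and x in counterfactual]
    return mean(diffs)


players = simulate_players()
naive = naive_difference(players)
att = exact_matching_att(players)
print(f"naive difference: {naive:.2f}")
print(f"matched ATT:      {att:.2f}  (simulated true effect: 2.0)")
```

In this setup the naive difference comes out well above the simulated effect, because it also picks up the engagement gap between the two groups.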
5. Assumptions and problems
Conditional Independence Assumption: given X, we assume
the potential outcomes Y(0), Y(1) to be independent of the
treatment D.
→ conditional on observed characteristics, selection bias is
removed
Common Support is given: 0 < P(D = 1|X) < 1
→ we exclude unmatched observations
Curse of Dimensionality: adding more variables to X improves
the matching quality, but makes finding matches more difficult!
→ e.g. for continuous variables: P(X1 = x) = 0
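The trade-off between matching quality and finding matches can be sketched with a continuous covariate. The example below is in the spirit of coarsened exact matching (binning a covariate, then matching exactly on the bins); it is not the `cem` Stata routine, and the covariate `hours`, the bin widths, and the simulated effect of 2.0 are illustrative assumptions. Exact matching on the raw values finds no twins, very wide bins leave the selection bias in place, and very narrow bins drop many treated players for lack of common support.

```python
import random
from collections import defaultdict
from statistics import mean


def simulate(n=20000, true_effect=2.0, seed=7):
    """Simulate players with a continuous covariate (illustrative numbers only)."""
    rng = random.Random(seed)
    players = []
    for _ in range(n):
        hours = rng.uniform(0, 100)  # hypothetical continuous covariate
        treated = rng.random() < 0.2 + 0.006 * hours  # self-selection on hours
        outcome = 0.5 * hours + rng.gauss(0, 1) + (true_effect if treated else 0.0)
        players.append((hours, treated, outcome))
    return players


def coarsened_matching_att(players, width):
    """Bin the covariate, match within bins, drop bins lacking either group.

    Returns (ATT estimate or None, matched treated count, dropped treated count).
    """
    strata = defaultdict(lambda: ([], []))  # bin -> (treated outcomes, control outcomes)
    for x, d, y in players:
        strata[int(x // width)][0 if d else 1].append(y)
    diffs, dropped = [], 0
    for treated_ys, control_ys in strata.values():
        if treated_ys and control_ys:
            counterfactual = mean(control_ys)
            diffs.extend(y - counterfactual for y in treated_ys)
        else:
            dropped += len(treated_ys)  # outside common support
    att = mean(diffs) if diffs else None
    return att, len(diffs), dropped


players = simulate()

# Exact matching on the raw continuous value: identical values essentially never occur.
control_hours = {x for x, d, _ in players if not d}
exact_matches = sum(1 for x, d, _ in players if d and x in control_hours)
print(f"treated players with an exact twin on raw hours: {exact_matches}")

results = {width: coarsened_matching_att(players, width) for width in (100.0, 5.0, 0.001)}
for width, (att, matched, dropped) in results.items():
    att_text = "n/a" if att is None else f"{att:.2f}"
    print(f"bin width {width:>7}: ATT={att_text}, matched={matched}, dropped={dropped}")
```

With several covariates the number of bins multiplies, so the share of dropped players grows quickly unless the bins are widened.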
8. Zeropayments in TSO Russia
payment conversion in TSO RU was low
one explanation: payment process “scary”
“zeropayments” guide the player through the payment
process, offering a small reward for completing a fake
payment
9. Results of the treatment
reference: lifetime pay-to-active TSO RU:                a
paid at least once in addition to the zeropayment:       5.9a
paid after their zeropayment:                            3.5a
paid after their zeropayment, not paid before:           1.6a
10. Matching results (tobit)
                     (1)          (2)          (5)          (6)
                     tobit full   tobit2 full  tobit cem    tobit2 cem
had zero payments    7.376        19.71        -356.3       -350.1
                     (0.974)      (0.931)      (0.270)      (0.276)
level                315.3∗∗      354.1∗∗      674.4        696.4
                     (0.007)      (0.000)      (0.177)      (0.179)
level squared        -0.796       -1.441       -9.274       -9.635
                     (0.709)      (0.416)      (0.291)      (0.289)
uniqueLogins         -26.27∗∗     -28.22∗∗     -33.35       -34.78
                     (0.018)      (0.007)      (0.199)      (0.204)
rating for week      -407.0†      -400.7†      39.74        42.50
                     (0.076)      (0.076)      (0.915)      (0.908)
guild                647.9∗∗      651.2∗∗      639.6        627.8
                     (0.012)      (0.011)      (0.388)      (0.400)
age                  53.18∗∗      52.37∗∗      185.4        171.8
                     (0.024)      (0.025)      (0.264)      (0.288)
(additional controls, including intercept)
N                    12376        19522        4114         6894
pseudo R2            0.162        0.189        0.139        0.158
p-values in parentheses
11. Matching results (zero-inflated negbin)
                     (1)          (2)          (5)          (6)
                     zinb full    zinb2 full   zinb cem     zinb2 cem
had zero payments    0.111        0.110        0.540∗∗      0.538∗∗
                     (0.463)      (0.466)      (0.005)      (0.006)
level                0.148∗∗      0.150∗∗      -0.153       -0.255†
                     (0.012)      (0.010)      (0.332)      (0.096)
level squared        -0.00211∗∗   -0.00213∗∗   0.00429      0.00617∗∗
                     (0.036)      (0.032)      (0.155)      (0.035)
uniqueLogins         -0.0180∗∗    -0.0180∗∗    -0.0308∗∗    -0.0310∗∗
                     (0.007)      (0.006)      (0.005)      (0.005)
rating for week      0.747∗∗      0.748∗∗      1.662∗∗      1.653∗∗
                     (0.000)      (0.000)      (0.000)      (0.000)
guild                -0.112       -0.112       0.280        0.297
                     (0.319)      (0.319)      (0.286)      (0.264)
age                  0.0383∗∗     0.0383∗∗     0.119        0.192†
                     (0.012)      (0.012)      (0.308)      (0.096)
(additional controls, including intercept and inflate regression)
N                    12376        19522        4114         6894
p-values in parentheses
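As a reading aid for the "had zero payments" row: assuming the reported coefficients belong to the count part of the zero-inflated model and that this part uses the usual log link (neither is stated on the slide), a coefficient β corresponds to a multiplicative factor exp(β) on the expected count, holding the other covariates fixed.

```latex
\begin{aligned}
\text{(1) zinb full:} \quad & \exp(0.111) \approx 1.12 \\
\text{(5) zinb cem:}  \quad & \exp(0.540) \approx 1.72
\end{aligned}
```

Under these assumptions, the matched-sample estimate in column (5) would correspond to an expected count roughly 72% higher for players who had a zeropayment, whereas the full-sample estimate in column (1) is not statistically distinguishable from no effect (p = 0.463).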
12. Further reading
Rosenbaum, P. R., Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal
effects. Biometrika 70 (1), pp. 41-55.
Heckman, J. J., H. Ichimura, and P. Todd (1997). Matching as an Econometric Evaluation Estimator: Evidence
From Evaluating a Job Training Programme. Review of Economic Studies 64, pp. 605-54.
Angrist, J. D. and A. B. Krueger (1999). Empirical Strategies in Labor Economics. pp. 1277-1366 in Handbook of
Labor Economics, vol. 3, edited by O. C. Ashenfelter and D. Card. Amsterdam: Elsevier.
Blackwell, M., Iacus, S., King, G., Porro, G. (2009). cem: Coarsened exact matching in Stata. Stata Journal 9 (4),
pp. 524-546.
Iacus, S., King, G., Porro, G. (June 2008). Matching for causal inference without balance checking. UNIMI –
Research Papers in Economics, Business, and Statistics 1073, Università degli Studi di Milano.
Lechner M. (2002). Some practical issues in the evaluation of heterogeneous labour market programmes by matching
methods. Journal of the Royal Statistical Society. Series A, 165, pp. 59-82.
Leuven, E., Sianesi, B. (April 2003). PSMATCH2: Stata module to perform full Mahalanobis and propensity score
matching, common support graphing, and covariate imbalance testing. S432001 Statistical Software Components,
Boston College Department of Economics.