Natural Experiments at Scale

CODE@MIT – OCTOBER 16 2015
Lessons learned in display advertising
Robert Moakler – (rmoakler@stern.nyu.edu)
Ekaterina Eliseeva – (keliseeva@integralads.com)
Kiril Tsemekhman – (kiril@integralads.com)

The $100+ billion question

Does online advertising really work?

Year    Digital ad spending (billions)    % change
2012    $104.57                           20.4%
2013    $120.05                           14.8%
2014    $140.15                           16.7%
2015    $160.18                           14.3%
2016    $178.45                           11.4%
2017    $196.05                            9.9%
2018    $213.89                            9.1%

Source: www.emarketer.com, “Global Ad Spending Growth to Double This Year”

Do online ads cause you to take some action?

The usual approach
Randomized experiments and A/B tests are great!
[Figure: two panels labeled “Campaign Ad” and “PSA”]
But sometimes …
[Figure labels: RIGHT, WRONG]

Natural experiments
•  Consider the typical setup for the ad serving process
[Diagram, labeled “Confounding”: nodes W (user features), A (served ads), and Y (convert)]

•  Introduce a mediating variable
[Diagram: nodes W (user features), A (served ads), M (mediator), Y (convert), and W’ (residual confounders); the mediator M lies between A and Y]
–  Viewability
[Diagram: the same graph with the mediator M replaced by V (viewable ad)]
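
To make the role of the mediator concrete, here is a toy simulation. It is a sketch, not the authors' pipeline: every probability and name in it (`simulate`, `view_effect`, the 0.8/0.2 targeting rates) is an assumption chosen for illustration. It further assumes that viewability is as good as random among served users, which ignores the residual confounders W’ shown in the diagram.

```python
import random

def simulate(n=200_000, view_effect=0.03, seed=0):
    """Compare a naive served/not-served contrast with a viewable/unviewable one."""
    rng = random.Random(seed)
    served = {0: [0, 0], 1: [0, 0]}   # A -> [users, conversions]
    viewed = {0: [0, 0], 1: [0, 0]}   # V -> [users, conversions], served users only

    for _ in range(n):
        w = rng.random() < 0.5                     # W: high-intent user (assumed)
        a = rng.random() < (0.8 if w else 0.2)     # A: targeting favors high intent
        v = a and rng.random() < 0.5               # V: viewable only if served
        base = 0.10 if w else 0.02                 # W also drives conversion
        y = rng.random() < base + (view_effect if v else 0.0)

        served[int(a)][0] += 1
        served[int(a)][1] += int(y)
        if a:
            viewed[int(v)][0] += 1
            viewed[int(v)][1] += int(y)

    def rate(cell):
        return cell[1] / cell[0]

    naive = rate(served[1]) - rate(served[0])      # confounded by W
    natural = rate(viewed[1]) - rate(viewed[0])    # contrast within served users
    return naive, natural

naive, natural = simulate()
print(f"served vs. not served:   {naive:.3f}")
print(f"viewable vs. unviewable: {natural:.3f}")
```

Under these assumptions the served/not-served contrast absorbs the effect of W, while the viewable/unviewable contrast among served users stays close to the simulated `view_effect`.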

Ad viewability
[Figure axes and labels: horizontal location (px), vertical location (px), proportion of ads, ad density]

Running in the wild
•  Natural experiments aren’t always clean or easy
•  We will discuss five problems that we have run into and some solutions for dealing with them

An online advertising campaign
•  Our data structure
[Figure: one event timeline per user (“Our users”) inside an analysis window. Legend: viewable ad, unviewable ad, conversion, web activity]
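
The event-level structure in the figure could be represented roughly as follows. This is a hedged sketch: the names `Event`, `Kind`, and `timelines`, the use of cookie IDs as keys, and timestamps in seconds are all assumptions, not the authors' schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    VIEWABLE_AD = "viewable_ad"
    UNVIEWABLE_AD = "unviewable_ad"
    CONVERSION = "conversion"
    WEB_ACTIVITY = "web_activity"

@dataclass(frozen=True)
class Event:
    cookie_id: str
    timestamp: float   # seconds since epoch (assumed unit)
    kind: Kind

def timelines(events, window_start, window_end):
    """Group events by cookie, keeping those inside [window_start, window_end)."""
    by_cookie = defaultdict(list)
    for event in events:
        if window_start <= event.timestamp < window_end:
            by_cookie[event.cookie_id].append(event)
    for cookie_events in by_cookie.values():
        cookie_events.sort(key=lambda e: e.timestamp)
    return dict(by_cookie)

events = [
    Event("c1", 30.0, Kind.CONVERSION),
    Event("c1", 10.0, Kind.VIEWABLE_AD),
    Event("c2", 20.0, Kind.UNVIEWABLE_AD),
    Event("c2", 99.0, Kind.WEB_ACTIVITY),   # falls outside the window below
]
result = timelines(events, window_start=0.0, window_end=50.0)
```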

Longitudinal data
•  Monitoring
–  Most online advertising campaigns run continually
–  We are constantly monitoring many campaigns at the event level
•  Running an intermediary analysis
–  Data is subject to left truncation and right censoring
–  We need to account for our residual confounders, W’
•  Use survival analysis
–  Cox Proportional Hazards (CPH) model
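
As a rough illustration of how a Cox model accommodates left truncation and right censoring, the sketch below evaluates the partial log-likelihood for a single binary covariate (viewable or not) and maximizes it by ternary search. It is a toy under stated assumptions, not the authors' implementation: the record layout `(entry, exit, event, x)`, the function names, and the data are hypothetical, ties are handled Breslow-style, and the residual confounders W’ are left out. A real analysis would more likely rely on an established survival library.

```python
import math

def cox_partial_loglik(beta, records):
    """Cox partial log-likelihood for one covariate.

    Each record is (entry, exit, event, x): the subject is at risk on
    (entry, exit], event is 1 for a conversion at `exit` and 0 for right
    censoring, and x is the covariate (1 = viewable ad, 0 = unviewable).
    """
    loglik = 0.0
    for _, exit_i, event_i, x_i in records:
        if not event_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set at this event time: already entered, not yet exited.
        # Delayed entry (entry > 0) is how left truncation is handled.
        denom = sum(
            math.exp(beta * x_j)
            for entry_j, exit_j, _, x_j in records
            if entry_j < exit_i <= exit_j
        )
        loglik += beta * x_i - math.log(denom)
    return loglik

def fit_beta(records, lo=-5.0, hi=5.0, tol=1e-6):
    """Maximize the concave partial log-likelihood by ternary search."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cox_partial_loglik(m1, records) < cox_partial_loglik(m2, records):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Hypothetical toy data: (entry day, exit day, converted?, viewable?)
records = [
    (0, 2, 1, 1), (0, 3, 1, 1), (1, 4, 0, 1), (0, 5, 1, 0),
    (2, 6, 1, 1), (0, 7, 0, 0), (1, 8, 1, 0), (3, 9, 0, 0),
]
beta = fit_beta(records)
hazard_ratio = math.exp(beta)
print(f"beta = {beta:.2f}, hazard ratio = {hazard_ratio:.2f}")
```

A hazard ratio above 1 here would mean that, in the toy data, users with a viewable ad convert at a higher rate at any given time than users with an unviewable one.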

User fragmentation and study period
•  In reality, our users are defined by cookies.
–  However, people do not just have one cookie!
[Figure: event timelines for two people, Sarah and Bob, each split across several cookies (Cookie 1, Cookie 2, Cookie 3), shown against the analysis window. Legend: viewable ad, unviewable ad, conversion, web activity]

•  Some methods we use to account for this
–  We define an effect period of 1 week
•  Seasonality has a major impact
•  Users are selected through iterative simulation and research
•  Incremental causal estimates level off after a single week
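
A one-week effect period can be applied as a simple window check on each impression. The sketch below is illustrative only: the function name, the inclusive boundaries, and the example timestamps are assumptions.

```python
from datetime import datetime, timedelta

EFFECT_PERIOD = timedelta(weeks=1)  # the 1-week effect period from the slide

def converted_in_effect_period(impression_time, conversion_times,
                               period=EFFECT_PERIOD):
    """True if any conversion falls within `period` after the impression."""
    return any(
        timedelta(0) <= t - impression_time <= period
        for t in conversion_times
    )

shown = datetime(2015, 10, 1, 12, 0)
within = converted_in_effect_period(shown, [datetime(2015, 10, 5, 9, 30)])
too_late = converted_in_effect_period(shown, [datetime(2015, 10, 20, 8, 0)])
before = converted_in_effect_period(shown, [datetime(2015, 9, 30, 8, 0)])
```

Conversions that happen before the impression or more than a week after it are not attributed to that impression.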

Validation
•  How do we know our causal models give reasonable estimates?
•  Use an array of negative control tests
–  Use the impressions of one campaign to predict an unrelated conversion
[Diagram: the viewability graph with nodes W (user features), A (served ads), V (viewable ad), W’ (residual confounders), and Y (convert), plus a second outcome node Y (unrelated event)]
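
A negative control test can be sketched as a check that viewable impressions show no detectable association with an unrelated conversion. The example below uses a plain two-proportion z-test as a stand-in. That choice, the function names, the 1.96 threshold, and the counts are all assumptions for illustration; the deck itself applies its causal models rather than an unadjusted comparison.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

def passes_negative_control(z, threshold=1.96):
    """The control passes when no effect is detected on the unrelated outcome."""
    return abs(z) < threshold

# Hypothetical counts of an unrelated conversion among viewable vs. unviewable users.
z_ok = two_proportion_z(conv_a=120, n_a=10_000, conv_b=115, n_b=10_000)
z_bad = two_proportion_z(conv_a=200, n_a=10_000, conv_b=115, n_b=10_000)
```

If a control like `z_bad` fails, the estimate for the real campaign is suspect, because the same pipeline is finding an "effect" where none should exist.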

Running at scale
•  Converting our data into something analyzable is a challenge
[Pipeline figure labels: …, raw daily logs, billions of events, HDFS scalable cluster storage, Hadoop]
–  People browse the web.
–  Advertising events turn into billions of daily events.
–  Raw data is moved to scalable storage optimized for our experimental setup.
–  Users are subsampled and negative controls are chosen in parallel.
–  Reports are run in parallel using stripped down R libraries.
–  Iterative process of simulation and research.
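
One way to subsample users consistently across parallel workers is to hash each cookie ID into a bucket, so that every worker makes the same keep-or-drop decision without coordination. This is an assumed technique shown for illustration, not a description of the authors' system; `in_sample`, the salt, and the sampling rate are hypothetical.

```python
import hashlib

def in_sample(cookie_id: str, rate: float, salt: str = "study-1") -> bool:
    """Deterministically keep roughly `rate` of all cookie IDs."""
    digest = hashlib.sha256(f"{salt}:{cookie_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate

cookie_ids = [f"cookie-{i}" for i in range(10_000)]
kept = [c for c in cookie_ids if in_sample(c, rate=0.10)]
fraction_kept = len(kept) / len(cookie_ids)
```

Changing the salt draws a different but equally reproducible sample, which may be useful when each study or negative control needs its own set of users.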

Summary
•  Mediators and natural experiments may already exist in your data
•  Running a natural experiment at scale is not straightforward, because of
1.  The longitudinal nature of the data
2.  Highly fragmented users
3.  The lack of predetermined start and end dates
4.  The need to validate causal models
5.  Billions of events and terabytes of raw data
•  Equal parts engineering and modeling
•  We explored online advertising, but this setup can apply to a wide variety of industries

Thanks

Grab this deck @ bit.ly/natural-experiments-at-scale

Acknowledgments
Integral Ad Science
Ekaterina Eliseeva
Kiril Tsemekhman
Ana Calabrese
Gijs Joost Brouwer
Sergei Izrailev

NYU Stern
Foster Provost

Amazon, Inc.
Daniel Hill
