Python and the Holy Grail of Causal Inference
Fig. Causal diagram: glass of wine a day, health, income
Fig. Generic causal diagram: treatment w, outcome Y, covariates X
Experiments you thought were good can still be invalid
Experiments you thought were bad can still be valid
Randomized testing: the set-up
• Random subsample of population is chosen
• Sample is randomly split into two groups: INTERVENTION and CONTROL
• Intervention is the same for all participants
• Outcome in both groups is measured (legend: no change / improved outcome)
• Result: AVERAGE TREATMENT EFFECT
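The set-up above can be sketched in a few lines. This is a minimal illustration on made-up data, not the speakers' code: the population, the sample size and the `true_effect` of 3.0 are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: baseline outcome for 100,000 units.
population = rng.normal(loc=20.0, scale=5.0, size=100_000)
true_effect = 3.0  # assumed uplift, identical for every treated unit

# 1. Choose a random subsample of the population.
sample = rng.choice(population, size=5_000, replace=False)

# 2. Randomly split the sample into intervention (True) and control (False).
in_treatment = rng.permutation(sample.size) < sample.size // 2

# 3. Apply the same intervention to every unit in the intervention group.
outcome = sample + true_effect * in_treatment

# 4. Measure the outcome in both groups and take the difference in means.
ate = outcome[in_treatment].mean() - outcome[~in_treatment].mean()
print(f"estimated ATE: {ate:.2f} (true effect: {true_effect})")
```

Because assignment is random, the two groups have the same baseline on average, so the plain difference in means recovers the effect up to sampling noise.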
USE CASE: heat pump savings @ Eneco
Fig. Measurement data: daily gas usage (m³) versus average outside temperature (°C)
The experiment in the randomized test framework
• Sample is based on “friendly users”: Eneco employees, early adopters and energy enthusiasts
• Rental homes are excluded from the study
• Participation is initiated by customer
• Outcome: average yearly gas savings
• Placements over many months
• Changes made to intervention halfway through study
Fixing group imbalance: match test and control
Available covariates:
• House size (m2)
• Building type (terraced, apartment, detached, semi-detached)
• Construction period (<1946, 1946-1965, …, > 2010)
• Number of inhabitants (1, 2, 3, 4, 5+)
Number of possibilities: 10 x 4 x 6 x 5 = 1200
Our sample population is only 2500, so exact matches are infeasible → partial matching
Propensity Score Matching
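The arithmetic behind "exact matches infeasible" can be made explicit. In this small sketch the assignment of the factors 10, 4, 6 and 5 to the four covariates follows the order of the list and is an assumption (the slide only gives the product); the dictionary keys are illustrative names.

```python
from math import prod

# Levels per covariate, read off the product 10 x 4 x 6 x 5 (mapping assumed).
levels = {
    "house_size_bins": 10,
    "building_type": 4,
    "construction_period": 6,
    "inhabitants": 5,
}
n_cells = prod(levels.values())
sample_size = 2500

# Average number of households per covariate combination.
units_per_cell = sample_size / n_cells
print(n_cells, round(units_per_cell, 2))
```

With roughly two households per combination on average, many cells will hold no test unit or no control unit, which is why matching on a single summary score is attractive.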
Propensity score matching – concept
• Calculate chance of receiving treatment given X (house type, etc)
• Match test subject to k control subjects on this probability
• Calculate effect for test and (matched) control
• Repeat for all participants → average effect over test group
Illustration: test subject A has a propensity of 38%; the control candidates have 83%, 39%, 41%, 12% and 22%, of which 39% and 41% are closest. Test: −500 m³; matched control average: −20 m³; effect: −480 m³
RUN
AWAY!
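The matching steps can be sketched with scikit-learn. Everything about the data here is hypothetical: the covariates, the self-selection rule and the assumed true effect of −480 m³ are invented for illustration, and logistic regression plus nearest-neighbour matching is one common way to do this, not necessarily what was used in the Eneco study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 4000

# Hypothetical covariates.
house_size = rng.uniform(50, 200, size=n)   # m2
inhabitants = rng.integers(1, 6, size=n)    # 1..5

# Self-selection: larger households opt in more often (assumed rule).
logit = (house_size - 125) / 30 + 0.3 * (inhabitants - 3)
treated = rng.random(n) < 1 / (1 + np.exp(-logit))

# Outcome: yearly change in gas usage (m3); the covariates also drive it.
true_effect = -480.0
gas_change = (2.0 * (house_size - 125) + 15.0 * (inhabitants - 3)
              + rng.normal(0, 30, size=n) + true_effect * treated)

# Unmatched comparison, biased by the imbalance between the groups.
naive = gas_change[treated].mean() - gas_change[~treated].mean()

# Step 1: propensity = estimated chance of treatment given X.
X = np.column_stack([house_size, inhabitants])
X = (X - X.mean(axis=0)) / X.std(axis=0)
propensity = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: match each test unit to its k nearest controls on the propensity.
k = 3
nn = NearestNeighbors(n_neighbors=k).fit(propensity[~treated].reshape(-1, 1))
_, idx = nn.kneighbors(propensity[treated].reshape(-1, 1))

# Steps 3 and 4: per-unit effect versus matched controls, averaged over test group.
matched_control = gas_change[~treated][idx].mean(axis=1)
att = (gas_change[treated] - matched_control).mean()
print(f"naive: {naive:.0f} m3, matched: {att:.0f} m3, true: {true_effect:.0f} m3")
```

Controls are reused across test units (matching with replacement), and the result is the average effect over the test group only.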
Recap heat pump use case
• Experiment fails (almost) all standard assumptions
• Each of the “faults” can be corrected
• Measure months, need year → extrapolate with model
• Bias in test group → match with equally biased control using propensity
• Outcome: average effect over test group, not whole population
• We cannot say anything about rental households without making additional assumptions
USE CASE: effect of cooler placement @ HEINEKEN
POPULATION
• 13K off-trade* outlets
• Selling HEINEKEN beer brands
• May receive cooler
* Small to medium shops, e.g. mom and pop shops, groceries and kiosks; not retail
• Pool for ’experiment’ is all outlets; the sample is the population
• Observational approach: coolers are already placed
• Gold outlets have a higher probability of getting a cooler than others
• Need effect on individual outlets, to prioritize future placements
• Outcome: yearly profit** uplift
• Placements over many years, movements not tracked → sales before/after unknown
** Profit is measured as FGP/hl, a company-wide calculation of profit per hl sales
Fig. Histograms showing the distribution of total profit per outlet, broken down by ranking and cooler setup
Problem 1: test and control group are statistically different
Distribution of relevant characteristics* is different between test and control
* A relevant characteristic is one that influences the probability of being selected for treatment
• Outlet ranking (gold, silver, bronze)
• Outlet sub-channel (kiosk, grocery, convenience, etc)
• Outlet area type (city, urban, village)
• Area (name of neighborhood)
• Seasonality (is outlet only open in summer)
• Sales rep visits per month
• Volume of competitor vs HEINEKEN sales
• Number of assortment deals with HEINEKEN
• Amount of investment by HEINEKEN
• Number of HEINEKEN branding materials
• Census demographics in km2 (population, age, gender)
• Google Maps metrics in 500m2 (average venue rating, # venues with photo, # of unique venue types, average venue opening times)
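import numpy as np
import pandas as pd

n = 10000  # outlets per ranking group; not given on the slide, assumed here

# Non-gold outlets: baseline profit 20, cooler placed with probability 1/3, true uplift +3
data_nongold = pd.DataFrame({
    'y_profit': 20 + 5*np.random.randn(n),
    'X_gold': 0,
    'w_cooler': np.random.choice([0, 1], size=(n,), p=[2./3, 1./3])
}).assign(y_profit=lambda df: np.where(df.w_cooler, df.y_profit + 3, df.y_profit))
# Gold outlets: baseline profit 25, cooler placed with probability 2/3, true uplift +5
data_gold = pd.DataFrame({
    'y_profit': 25 + 5*np.random.randn(n),
    'X_gold': 1,
    'w_cooler': np.random.choice([0, 1], size=(n,), p=[1./3, 2./3])
}).assign(y_profit=lambda df: np.where(df.w_cooler, df.y_profit + 5, df.y_profit))
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
data = pd.concat([data_nongold, data_gold], ignore_index=True)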
The need for effect correction – staging an experiment
Definition: conditional mean
Mean of Y for given values of X, i.e. the average of one variable as a function of some other variables
$E[Y \mid X] = X\beta$
Effect = mean treated – mean untreated
$E[Y \mid w = 1] - E[Y \mid w = 0] = 27.70 - 21.66 = 6.04$ ??
Only gold: effect = mean treated – mean untreated
$ATE_{gold} = E[Y \mid X = 1, w = 1] - E[Y \mid X = 1, w = 0] = 30.07 - 24.90 = 5.17$
Only non-gold: effect = mean treated – mean untreated
$ATE_{nongold} = E[Y \mid X = 0, w = 1] - E[Y \mid X = 0, w = 0] = 22.96 - 20.00 = 2.96$
What would be the effect if all the imbalance in treatment caused by gold ranking were removed?
50% of outlets are gold; if the probability of placement were equal for all of them, the effect would be the equally weighted average of the gold and non-gold effects:
$ATE = E_X\big[E[Y \mid X, w = 1] - E[Y \mid X, w = 0]\big] = 0.5 \times 5.17 + 0.5 \times 2.96 \approx 4.06$
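The three numbers in this comparison (naive difference, per-group effects, reweighted effect) can be reproduced on simulated data. The sketch below uses plain NumPy and the same data-generating assumptions as the staged experiment (baselines 20 and 25, uplifts +3 and +5, placement probabilities 1/3 and 2/3); the group size `n` and the helper `mean_diff` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000  # outlets per ranking group (assumed)

x_gold = np.repeat([0, 1], n)
# Gold outlets get a cooler with probability 2/3, non-gold with 1/3.
w = rng.binomial(1, np.where(x_gold == 1, 2 / 3, 1 / 3))
# Baseline profit 25 (gold) or 20 (non-gold), plus uplift 5 or 3 when treated.
y = np.where(x_gold == 1, 25.0, 20.0) + 5 * rng.normal(size=2 * n)
y = y + w * np.where(x_gold == 1, 5.0, 3.0)

def mean_diff(mask):
    """Mean treated minus mean untreated within the units selected by mask."""
    return y[mask & (w == 1)].mean() - y[mask & (w == 0)].mean()

naive = mean_diff(np.ones(2 * n, dtype=bool))   # ignores ranking, close to 6
ate_gold = mean_diff(x_gold == 1)               # close to 5
ate_nongold = mean_diff(x_gold == 0)            # close to 3

# Weight each group effect by its share of outlets (50% gold here).
share_gold = x_gold.mean()
ate = share_gold * ate_gold + (1 - share_gold) * ate_nongold
print(f"naive={naive:.2f} gold={ate_gold:.2f} non-gold={ate_nongold:.2f} ATE={ate:.2f}")
```

The naive estimate overshoots because treated outlets are disproportionately gold, and gold outlets have a higher baseline profit.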
Procedure
With $\bar{X}$ the sample mean of the covariates, fit the regression
$Y$ on $1, w, X, w(X - \bar{X})$
and the coefficient on $w$ will be the average treatment effect
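from sklearn.linear_model import LinearRegression

# Interaction of treatment with the demeaned covariate, w * (X - mean(X))
data_reg = data.assign(
    demeaned_interaction=lambda df:
        df.w_cooler * (df.X_gold - df.X_gold.mean())
)
lm_all = LinearRegression()  # fits the intercept (the "1" term) by default
lm_all.fit(
    data_reg[['X_gold', 'demeaned_interaction', 'w_cooler']],
    data_reg.y_profit
)
lm_all.coef_[2]  # coefficient on w_cooler, i.e. the ATE
# 4.0637 (value reported on the slide; it varies with the random draw)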
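A short derivation shows why the interaction has to be demeaned for the coefficient on w to equal the ATE. The coefficient names α, τ, β and δ are labels introduced here for illustration; they do not appear on the slides.

```latex
\begin{align*}
E[Y \mid X, w] &= \alpha + \tau\, w + \beta X + \delta\, w\,(X - \bar{X}) \\
E[Y \mid X, w = 1] - E[Y \mid X, w = 0] &= \tau + \delta\,(X - \bar{X}) \\
\frac{1}{n}\sum_{i=1}^{n}\bigl[\tau + \delta\,(X_i - \bar{X})\bigr] &= \tau
\qquad \text{since } \sum_{i=1}^{n} (X_i - \bar{X}) = 0
\end{align*}
```

In the staged experiment, with $\bar{X} = 0.5$, true uplifts of 5 (gold) and 3 (non-gold) give $\tau + 0.5\,\delta = 5$ and $\tau - 0.5\,\delta = 3$, so $\tau = 4$ and $\delta = 2$, in line with the estimate of about 4.06.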
Estimating the ATE with regression – assumptions
Conditional mean independence
Mean dependence between treatment assignment w and treatment-specific outcomes $Y_i$ can be removed by conditioning on some variables X, provided that they are observable (AKA weak ignorability)
$E[Y_i \mid X, w] = E[Y_i \mid X]$ for $i \in \{0, 1\}$
Individual treatment effect estimation – assumptions
Many approaches exist, but most of your bias will be due to not observing enough confounders X!
Conditional independence
Any dependence between treatment assignment w and treatment-specific outcomes $Y_i$ can be removed by conditioning on some variables X, provided that they are observable (AKA strong ignorability)
$(Y_0, Y_1) \perp\!\!\!\perp w \mid X$
Estimating ITE with Virtual Twins*
Fig. Example tree for sales: split on Rating (Bronze/Silver vs Gold), then on Cooler (0 vs 1); leaf values €2000 and €3000
Procedure
• Fit a tree ensemble with target Y and features X, w, and interactions** between X and w
• Predict every unit twice: once with w=1 and once with w=0
• Subtract to get $\tau_{ite,i} = m_1(X_i) - m_0(X_i)$
• Early stopping and OOB predictions reduce overfitting; a quantile objective can help to trim outliers
* Foster, J. C., Taylor, J. M., and Ruberg, S. J. (2011). Subgroup identification from randomized clinical trial data. Statistics in Medicine, 30(24):2867–2880.
** Scaling like we did with the linear ATE estimator is generally not needed with tree-based estimators
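The procedure can be sketched with a random forest from scikit-learn. This is a simplified illustration on simulated data with a single binary covariate (the same assumed uplifts of +5 for gold and +3 for non-gold as in the staged experiment); it predicts in-sample and leaves out the early stopping, OOB predictions and quantile objective mentioned above. The helper `features` is a name introduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 2000  # outlets per ranking group (assumed)

x_gold = np.repeat([0, 1], n)
w = rng.binomial(1, np.where(x_gold == 1, 2 / 3, 1 / 3))
y = np.where(x_gold == 1, 25.0, 20.0) + 5 * rng.normal(size=2 * n)
y = y + w * np.where(x_gold == 1, 5.0, 3.0)

def features(x, treatment):
    """Design matrix with X, w and the X-by-w interaction."""
    return np.column_stack([x, treatment, x * treatment])

# Fit one model m(X, w) for the outcome.
model = RandomForestRegressor(n_estimators=100, min_samples_leaf=50, random_state=0)
model.fit(features(x_gold, w), y)

# Predict each unit's "twin" outcomes: once as treated, once as untreated.
m1 = model.predict(features(x_gold, np.ones_like(w)))
m0 = model.predict(features(x_gold, np.zeros_like(w)))
ite = m1 - m0

print(f"mean ITE gold: {ite[x_gold == 1].mean():.2f}, "
      f"non-gold: {ite[x_gold == 0].mean():.2f}")
```

With only one binary covariate the individual effects collapse to two values, one per ranking group; with richer covariates each outlet gets its own estimate.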
USE CASE: effect of cooler placement @ HEINEKEN
Overview
Fig. Model predicted profit versus actual profit, by cooler type (all outlets)
Coolers to consider
Fig. Model predicted profit versus actual profit, by cooler type (outlets within 90% confidence interval)
Coolers to upgrade
Fig. Model predicted profit versus actual profit, by cooler type (outlets to upgrade / install)
• Your perfect experiment is likely ruined by harsh reality
• But you may be able to fix it:
  • Propensity score matching
  • Average and individual treatment effect estimation
• Make sure you collect enough data:
  • When is the treatment done?
  • Measure Y before and after experiment
  • What covariates X influence both treatment w and outcome Y?
Looking for:
• Senior Data Scientist
• Senior Data Engineer
Contact: ciaran.jetten@heineken.com
Estimating ITE with Honest RF*
Fig. Example tree: split on Rating (Bronze/Silver vs Gold); within each leaf, units with Cooler 1 and Cooler 0 are compared
$E[Y \mid w = 1] - E[Y \mid w = 0]$ = €3000 − €2000 = €1000
Procedure
• Fit a tree ensemble with target w and features X, with the constraint of a minimum of k units per class in each decision tree (DT) leaf
• Per leaf K in each DT, calculate the mean difference in Y between treatment and control units to get
$\tau_{ite,i} = N^{-1} \sum_{j=1}^{N} [Y_{j1} - Y_{j0}]$ for $i \in K$ and $j \in K$
* Athey, S., & Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), 7353-7360.
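The per-leaf procedure can be approximated with a scikit-learn forest. This sketch follows the steps as listed on the slide, not the full method of the cited paper, which (as far as the honest variant goes) estimates effects on a separate sample from the one used to grow the tree. The simulated data, the extra `noise` covariate and all parameter values are assumptions. Note that `min_samples_leaf` bounds the total number of units per leaf, not the number per class, so leaves missing either group are skipped.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000  # outlets per ranking group (assumed)

x_gold = np.repeat([0, 1], n)
noise = rng.normal(size=2 * n)  # hypothetical covariate unrelated to treatment
w = rng.binomial(1, np.where(x_gold == 1, 2 / 3, 1 / 3))
y = np.where(x_gold == 1, 25.0, 20.0) + 5 * rng.normal(size=2 * n)
y = y + w * np.where(x_gold == 1, 5.0, 3.0)

# Fit the ensemble with the treatment w as target and the covariates X as features.
X = np.column_stack([x_gold, noise])
forest = RandomForestClassifier(
    n_estimators=50, min_samples_leaf=300, max_features=None, random_state=0
).fit(X, w)

# Leaf index of every unit in every tree: shape (n_units, n_trees).
leaves = forest.apply(X)

# Per tree and per leaf: mean Y of treated minus mean Y of control units.
ite_per_tree = np.full(leaves.shape, np.nan)
for t in range(leaves.shape[1]):
    for leaf in np.unique(leaves[:, t]):
        in_leaf = leaves[:, t] == leaf
        treated = in_leaf & (w == 1)
        control = in_leaf & (w == 0)
        if treated.any() and control.any():
            ite_per_tree[in_leaf, t] = y[treated].mean() - y[control].mean()

# Average the leaf-level differences over trees to get one estimate per unit.
ite = np.nanmean(ite_per_tree, axis=1)
print(f"mean ITE gold: {ite[x_gold == 1].mean():.2f}, "
      f"non-gold: {ite[x_gold == 0].mean():.2f}")
```

Because the trees are grown to predict treatment, units in the same leaf have a similar chance of being treated, which is what makes the within-leaf comparison meaningful.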
Estimating ITE using Counterfactual Regression*
Procedure
• Learn a representation Φ of X → split samples according to w → regress Y0 and Y1 on the representation separately
• Regularize Φ using an integral probability metric (IPM), which is the distance between the distribution of X in w=1 and of X in w=0
• The joint objective is thus to minimize predictive error while guaranteeing a balanced representation of X
* Shalit, U., Johansson, F., & Sontag, D. (2016). Estimating individual treatment effect: generalization bounds and algorithms. arXiv preprint arXiv:1606.03976.
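To make the balance term concrete, the sketch below computes one very simple IPM-style quantity: the Euclidean distance between the treated and control group means, which is what the maximum mean discrepancy reduces to with a linear kernel. It is applied to the raw covariates (an identity representation) on simulated data; the choice of metric, the data and the function name `mean_distance` are assumptions for illustration, and no representation is learned here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Simulated covariates: gold ranking (drives treatment) and an unrelated one.
x_gold = rng.binomial(1, 0.5, size=n)
x_noise = rng.normal(size=n)
X = np.column_stack([x_gold, x_noise])

# Observational assignment: gold outlets are treated twice as often.
w = rng.binomial(1, np.where(x_gold == 1, 2 / 3, 1 / 3))

def mean_distance(features, treatment):
    """Distance between the covariate means of treated and control units."""
    diff = features[treatment == 1].mean(axis=0) - features[treatment == 0].mean(axis=0)
    return float(np.linalg.norm(diff))

imbalance_observed = mean_distance(X, w)

# For comparison: a coin-flip assignment that ignores the covariates.
w_random = rng.binomial(1, 0.5, size=n)
imbalance_randomized = mean_distance(X, w_random)

print(f"observed: {imbalance_observed:.3f}, randomized: {imbalance_randomized:.3f}")
```

Under the observational assignment the gold share differs by about one third between the groups, while under randomization the distance is close to zero; a balance penalty pushes the learned representation toward the second situation.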
Python and the Holy Grail of Causal Inference - Dennis Ramondt, Huib Keemink

  • 2.
  • 3. Glass of wine a day Health Income
  • 5. Experiments you thought were good can still be invalid Experiments you thought were bad can still be valid
  • 6. Randomized testing: the set-up Sample is randomly split into two groups Random subsample of population is chosen POPULATION INTERVENTION CONTROL = no change = improved outcome Outcome in both groups is measured The same for all participants AVERAGE TREATMENT EFFECT
  • 7. USE CASE: heat pump savings @ Eneco ?
  • 8. Measurement data: daily gas usage ~ outside temperature Average outside temperature (°C) Gasusage(m3)
  • 9. The experiment in the randomized test framework • Sample is based on “friendly users”: Eneco employees, early adopters and energy enthusiasts • Rental homes are excluded from the study • Participation is initiated by customer • Outcome: average yearly gas savings • Placements over many months • Changes made to intervention halfway through study AVERAGE TREATMENT EFFECT INTERVENTION CONTROL
  • 10.
  • 11. Fixing group imbalance: match test and control Available covariates: • House size (m2) • Building type (terraced, apartment, detached, semi-detached) • Construction period (<1946, 1946-1965, …, > 2010) • Number of inhabitants (1, 2, 3, 4, 5+) Number of possibilities: 10 x 4 x 6 x 5 = 1200 Our sample population is only 2500, exact matches infeasible  partial matching Propensity Score Matching
  • 12. Propensity score matching – concept 38% Calculate chance of receiving treatment given X (house type, etc) test A 83%39% 41% Match test subject to k control subjects on this probability 12% 22% Calculate effect for test and (matched) control - 500m3 -20m3average - 480m3 Repeat for all participants  average effect over test group RUN AWAY!
  • 13. Recap heat pump use case • Experiment fails (almost) all standard assumptions • Each of the “faults” can be corrected • Measure months, need year  extrapolate with model • Bias in test group  match with equally biased control using propensity • Outcome: average effect over test group, not whole population • We can not say anything about rental households without making additional assumptions
  • 14. USE CASE: effect of cooler placement @ HEINEKEN ? € €
  • 15. USE CASE: effect of cooler placement @ HEINEKEN POPULATION • 13K off-trade* outlets • Selling HEINEKEN beer brands • May receive cooler * Small to medium shops, e.g. mom and pop shops, groceries and kiosks; not retail • Pool for ’experiment’ is all outlets, sample is the population • Observational approach: coolers are already placed • Gold outlets higher probability of getting cooler than others • Need effect on individual outlets, to prioritize future placements AVERAGE TREATMENT EFFECT INTERVENTION CONTROL The same for all participants • Outcome: yearly profit** uplift • Placements over many years, movements not tracked  sales before/after unknown ** Profit is measured as FGP/hl, a company-wide calculation of profit per hl sales
  • 16.
  • 17. Fig. Histograms showing the distribution of total profit per outlet, when broken down by ranking and cooler setup Problem 1: test and control group are statistically different Distribution of relevant characteristics* is different between test and control profit * A relevant characteristic is one that influences the probability of being selected for treatment
  • 18. Problem 1: test and control group are statistically different Distribution of relevant characteristics* is different between test and control * A relevant characteristic is one that influences the probability of being selected for treatment • Outlet ranking (gold, silver, bronze) • Outlet sub-channel (kiosk, grocery, convenience, etc) • Outlet area type (city, urban, village) • Area (name of neighborhood) • Seasonality (is outlet only open in summer) • Sales rep visits per month • Volume of competitor vs HEINEKEN sales • Number of assortment deals with HEINEKEN • Amount of investment by HEINEKEN • Number of HEINEKEN branding materials • Census demographics in km2 (population, age, gender) • Google Maps metrics in 500m2 (average venue rating, # venues with photo, # of unique venue types, average venue opening times)
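  • 19.
        import numpy as np
        import pandas as pd

        # n (number of simulated outlets per group) is not shown on the slide; set it before running

        # Non-gold outlets: baseline profit 20, 1/3 get a cooler, a cooler adds +3
        data_nongold = pd.DataFrame({
            'y_profit': 20 + 5*np.random.randn(n),
            'X_gold': 0,
            'w_cooler': np.random.choice([0, 1], size=(n,), p=[2./3, 1./3])
        }).assign(y_profit=lambda df: np.where(df.w_cooler, df.y_profit + 3, df.y_profit))

        # Gold outlets: baseline profit 25, 2/3 get a cooler, a cooler adds +5
        data_gold = pd.DataFrame({
            'y_profit': 25 + 5*np.random.randn(n),
            'X_gold': 1,
            'w_cooler': np.random.choice([0, 1], size=(n,), p=[1./3, 2./3])
        }).assign(y_profit=lambda df: np.where(df.w_cooler, df.y_profit + 5, df.y_profit))

        # Stack both groups into one data set
        data = pd.concat([data_nongold, data_gold])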
  • 20. The need for effect correction – staging an experiment Definition: conditional mean. Mean of Y for given values of X, i.e. the average of one variable as a function of some other variables: $E[Y \mid X] = X\beta$ Effect = mean treated – mean untreated: $E[Y \mid w = 1] - E[Y \mid w = 0] = 27.70 - 21.66 = 6.04$ ??
  • 21. The need for effect correction – staging an experiment Only gold: effect = mean treated – mean untreated $ATE_{ins} = E[Y \mid X = 1, w = 1] - E[Y \mid X = 1, w = 0] = 30.07 - 24.90 = 5.17$ Only non-gold: effect = mean treated – mean untreated $ATE_{nonins} = E[Y \mid X = 0, w = 1] - E[Y \mid X = 0, w = 0] = 22.96 - 20.00 = 2.96$
  • 22. The need for effect correction – staging an experiment What would the effect be if all the imbalance in treatment caused by gold ranking were removed? 50% of outlets are gold; if the probability of placement were equal for all of them, the effect would be ... $ATE = E[Y \mid X, w = 1] - E[Y \mid X, w = 0] = 4.06$
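One way to see where a value near 4.06 comes from is to weight the two within-group effects from the previous slide (5.17 for gold, 2.96 for non-gold) by the population shares. This is a worked check under the stated 50/50 split, not a formula taken from the slides:

```latex
\begin{align*}
ATE &= P(X = 1)\, ATE_{ins} + P(X = 0)\, ATE_{nonins} \\
    &= 0.5 \cdot 5.17 + 0.5 \cdot 2.96 \\
    &= 4.065
\end{align*}
```

This agrees with the slide's 4.06 up to rounding of the inputs. The naive difference of 6.04 is larger because gold outlets, which have higher baseline profit, are over-represented among the treated.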
  • 23. The need for effect correction – staging an experiment Procedure: with $\bar{X}$ the sample mean of the covariates, fit the regression $Y$ on $1, w, X, w(X - \bar{X})$. The coefficient on $w$ will be the average treatment effect.
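  • 24.
        from sklearn.linear_model import LinearRegression

        # Interaction of the treatment with the demeaned covariate
        data_reg = data.assign(
            demeaned_interaction=lambda df: df.w_cooler * (df.X_gold - df.X_gold.mean())
        )
        lm_all = LinearRegression()
        lm_all.fit(
            data_reg[['X_gold', 'demeaned_interaction', 'w_cooler']],
            data.y_profit
        )
        lm_all.coef_[2]  # coefficient on w_cooler = ATE; the slide shows 4.0637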
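The three estimates discussed on these slides (naive difference, per-ranking difference, regression-adjusted ATE) can be compared side by side in one self-contained run. The sketch below re-simulates the staged experiment; the seed, the sample size `n` and the helper `simulate` are assumptions, so the numbers will be close to but not identical to those on the slides.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)  # seed chosen for the example
n = 50_000                       # outlets per group (assumption; not given in the talk)


def simulate(base, gold, p_cooler, uplift):
    """One group of outlets: baseline profit, cooler probability and cooler uplift."""
    w = rng.binomial(1, p_cooler, size=n)
    y = base + 5 * rng.standard_normal(n) + uplift * w
    return pd.DataFrame({"y_profit": y, "X_gold": gold, "w_cooler": w})


data = pd.concat(
    [simulate(20, 0, 1 / 3, 3), simulate(25, 1, 2 / 3, 5)], ignore_index=True
)

# Naive effect: mean treated minus mean untreated, ignoring the ranking
naive = (
    data.loc[data.w_cooler == 1, "y_profit"].mean()
    - data.loc[data.w_cooler == 0, "y_profit"].mean()
)

# Effect within each ranking
by_cell = data.groupby(["X_gold", "w_cooler"])["y_profit"].mean()
effect_nongold = by_cell.loc[(0, 1)] - by_cell.loc[(0, 0)]
effect_gold = by_cell.loc[(1, 1)] - by_cell.loc[(1, 0)]

# Regression of Y on 1, w, X and w * (X - mean(X)); the coefficient on w is the ATE
features = pd.DataFrame({
    "w_cooler": data.w_cooler,
    "X_gold": data.X_gold,
    "w_x_demeaned": data.w_cooler * (data.X_gold - data.X_gold.mean()),
})
ate = LinearRegression().fit(features, data.y_profit).coef_[0]

print(f"naive={naive:.2f} gold={effect_gold:.2f} non-gold={effect_nongold:.2f} ATE={ate:.2f}")
```

With a single binary covariate the regression is saturated, so its coefficient on `w_cooler` equals the share-weighted average of the two within-ranking effects.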
  • 25. Estimating the ATE with regression – assumptions Conditional mean independence (AKA weak ignorability): mean dependence between treatment assignment w and treatment-specific outcomes $Y_i$ can be removed by conditioning on some variables X, provided that they are observable. $E[Y_i \mid X, w] = E[Y_i \mid X]$ for $i \in \{0, 1\}$
  • 26. Individual treatment effect estimation – assumptions Many approaches exist, but most of your bias will be due to not observing enough confounders X! Conditional independence (AKA strong ignorability): any dependence between treatment assignment w and treatment-specific outcomes $Y_i$ can be removed by conditioning on some variables X, provided that they are observable. $(Y_0, Y_1) \perp\!\!\!\perp w \mid X$
  • 27. Estimating ITE with Virtual Twins* Procedure • Fit a tree ensemble with target Y and features X, w, and interactions** between X and w • Predict all units with w=1, predict all units with w=0 • Subtract to get $\tau_{ite,i} = m_1(X_i) - m_0(X_i)$ • Early stopping and OOB predictions reduce overfitting; a quantile objective can help to trim outliers (Figure: tree on Sales splitting on Rating = Bronze/Silver vs Rating = Gold, then on Cooler = 0 (€2000) vs Cooler = 1 (€3000)) * Foster, J. C., Taylor, J. M., and Ruberg, S. J. (2011). Subgroup identification from randomized clinical trial data. Statistics in Medicine, 30(24):2867–2880. ** Scaling like we did with the linear ATE estimator is generally not needed with tree-based estimators
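The Virtual Twins procedure can be sketched with a random forest as the tree ensemble. This is an illustrative toy, not the model from the talk: the data are simulated with one binary covariate (gold ranking), the helper `design` and the hyperparameters are assumptions, and early stopping, OOB predictions and the quantile objective mentioned on the slide are left out.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 4000
X = rng.binomial(1, 0.5, size=(n, 1))                    # gold ranking (0/1)
w = rng.binomial(1, np.where(X[:, 0] == 1, 2 / 3, 1 / 3))  # gold gets coolers more often
# True uplift: +3 for non-gold, +5 for gold
y = 20 + 5 * X[:, 0] + (3 + 2 * X[:, 0]) * w + rng.normal(size=n)


def design(X, w):
    """Feature matrix: covariates X, treatment w, and X * w interactions."""
    w = np.asarray(w, dtype=float).reshape(-1, 1)
    return np.hstack([X, w, X * w])


model = RandomForestRegressor(n_estimators=200, min_samples_leaf=20, random_state=0)
model.fit(design(X, w), y)

# Predict every unit twice: once as treated, once as untreated ("twins")
m1 = model.predict(design(X, np.ones(n)))
m0 = model.predict(design(X, np.zeros(n)))
ite = m1 - m0

print("mean ITE gold:    ", ite[X[:, 0] == 1].mean())
print("mean ITE non-gold:", ite[X[:, 0] == 0].mean())
```

Because the same fitted model produces both predictions, the estimated effect for a unit depends only on its covariates, which is what allows ranking outlets for future placements.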
  • 28. Fig. Model predicted profit versus actual profit, by cooler type (all outlets) USE CASE: effect of cooler placement @ HEINEKEN Overview
  • 29. USE CASE: effect of cooler placement @ HEINEKEN Coolers to consider Fig. Model predicted profit versus actual profit, by cooler type (outlets within 90% confidence interval)
  • 30. USE CASE: effect of cooler placement @ HEINEKEN Coolers to upgrade Fig. Model predicted profit versus actual profit, by cooler type (outlets to upgrade / install)
  • 31. USE CASE: effect of cooler placement @ HEINEKEN Coolers to upgrade Fig. Model predicted profit versus actual profit, by cooler type (outlets to upgrade / install)
  • 32. USE CASE: effect of cooler placement @ HEINEKEN Coolers to upgrade Fig. Model predicted profit versus actual profit, by cooler type (outlets to upgrade / install)
  • 33. • Your perfect experiment is likely ruined by harsh reality • But you may be able to fix it: • Propensity score matching • Average and individual treatment effect estimation • Make sure you collect enough data: • When is the treatment done? • Measure Y before and after experiment • What covariates X influence both treatment w and outcome Y?
  • 34. Looking for: • Senior Data Scientist • Senior Data Engineer Contact: ciaran.jetten@heineken.com
  • 35. Estimating ITE with Honest RF* Procedure • Fit a tree ensemble with target w and features X, with the constraint of a minimum of k units per class in each decision tree (DT) leaf • Per leaf K in each DT, calculate the mean difference in Y between treatment and control units to get $\tau_{ite,i} = N^{-1} \sum_{j=1}^{N} [Y_{j1} - Y_{j0}]$ for $i \in K$ and $j \in K$ (Figure: tree on Cooler 1/0 splitting on Rating = Bronze/Silver vs Rating = Gold; within a leaf, $E[Y \mid w = 1] - E[Y \mid w = 0]$ = €3000 − €2000 = €1000) * Athey, S., & Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), 7353-7360.
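A single-tree sketch of this procedure is shown below. Several simplifications are assumptions of the example rather than part of the slide: one classification tree stands in for the ensemble, scikit-learn has no "minimum k units per class per leaf" option so leaves that fail the check are simply skipped, and the data are simulated. The split into a tree-building half and an estimation half is how honesty is understood in Athey and Imbens; the slide itself does not spell it out.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 6000
X = rng.binomial(1, 0.5, size=(n, 2)).astype(float)  # hypothetical binary covariates
w = rng.binomial(1, 0.25 + 0.5 * X[:, 0])            # treatment depends on X[:, 0]
# True uplift: +3 when X[:, 0] == 0, +5 when X[:, 0] == 1
y = 20 + 5 * X[:, 0] + (3 + 2 * X[:, 0]) * w + rng.normal(size=n)

# One half of the sample builds the tree, the other half estimates the effects
build = rng.random(n) < 0.5
est = ~build

# Classification tree on the treatment: leaves group units with similar propensity
tree = DecisionTreeClassifier(min_samples_leaf=200, random_state=0)
tree.fit(X[build], w[build])
leaf = tree.apply(X)  # leaf id for every unit

k = 50  # minimum treated and control units required per leaf
ite = np.full(n, np.nan)
for leaf_id in np.unique(leaf):
    in_leaf = leaf == leaf_id
    treated = in_leaf & est & (w == 1)
    control = in_leaf & est & (w == 0)
    if treated.sum() >= k and control.sum() >= k:
        # Local effect: mean outcome of treated minus mean outcome of controls
        ite[in_leaf] = y[treated].mean() - y[control].mean()

print("mean ITE, X0 = 1:", np.nanmean(ite[X[:, 0] == 1]))
print("mean ITE, X0 = 0:", np.nanmean(ite[X[:, 0] == 0]))
```

Units in leaves without enough treated or control observations keep a missing estimate, which makes the lack of overlap visible instead of hiding it.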
  • 36. Estimating ITE using Counterfactual Regression* Procedure • Learn a representation Φ of X → split samples according to w → regress Y0 and Y1 on the representation separately • Regularize Φ using an IPM (integral probability metric), which is the distance between the distribution of X in w=1 and of X in w=0 • This gives a joint objective: minimize predictive error while guaranteeing a balanced representation of X * Shalit, U., Johansson, F., & Sontag, D. (2016). Estimating individual treatment effect: generalization bounds and algorithms. arXiv preprint arXiv:1606.03976.
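To make the joint objective concrete, the sketch below only evaluates it for fixed parameters; it does not train a network. Everything here is an assumption for illustration: a one-layer tanh map stands in for Φ, the two outcome heads are linear, the IPM is a linear maximum mean discrepancy (the squared distance between the mean representations of both groups, one simple member of the IPM family), and the sample re-weighting of the original paper is omitted.

```python
import numpy as np


def linear_mmd2(phi_t, phi_c):
    """Squared distance between mean representations of treated and control."""
    return float(np.sum((phi_t.mean(axis=0) - phi_c.mean(axis=0)) ** 2))


def cfr_objective(X, w, y, W_phi, beta0, beta1, alpha=1.0):
    """Factual prediction error plus alpha times the imbalance of the representation."""
    phi = np.tanh(X @ W_phi)                           # representation Phi(X)
    pred = np.where(w == 1, phi @ beta1, phi @ beta0)  # separate head per treatment arm
    factual_loss = float(np.mean((y - pred) ** 2))
    imbalance = linear_mmd2(phi[w == 1], phi[w == 0])
    return factual_loss + alpha * imbalance, factual_loss, imbalance


# Synthetic data (hypothetical): treated units are shifted in X, so groups are imbalanced
rng = np.random.default_rng(0)
n, d, h = 500, 3, 4
w = rng.binomial(1, 0.5, size=n)
X = rng.normal(size=(n, d)) + 0.8 * w[:, None]
y = X[:, 0] + 2.0 * w + rng.normal(size=n)

# Random, untrained parameters, just to evaluate the objective once
W_phi = rng.normal(size=(d, h))
beta0, beta1 = rng.normal(size=h), rng.normal(size=h)

total, factual, imbalance = cfr_objective(X, w, y, W_phi, beta0, beta1, alpha=1.0)
print(f"total={total:.3f} factual={factual:.3f} imbalance={imbalance:.3f}")
```

In a real implementation this quantity would be minimized over `W_phi`, `beta0` and `beta1` with a gradient-based optimizer, and `alpha` would control the trade-off between fit and balance.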

Editor's notes

  1. “HeatWinner” → hybrid heat pump, installed alongside the boiler. Takes over part of the heating demand from the boiler → save on gas. Goal of the pilot / experiment: calculate average gas savings.
  2. Conditional mean: the conditional mean expresses the average of one variable as a function of some other variables. More formally, the mean of y conditional on x is the mean of y for given values of x; in other words, it is $E(y \mid x)$. Conditional-independence assumption: this requires that the common variables that affect treatment assignment and treatment-specific outcomes be observable. The dependence between treatment assignment and treatment-specific outcomes can be removed by conditioning on these observable variables. Conditional independence (strong ignorability): the distribution of the potential outcomes, $(y_0, y_1)$, is the same across levels of the treatment variable, T, once we condition on confounding covariates X. Conditional mean independence (weak ignorability): the mean of the potential outcomes, $(y_0, y_1)$, is the same across levels of the treatment variable, T, once we condition on confounding covariates X.
  3. Experiment fails (almost) all standard assumptions. Each of the “faults” can be corrected. Measure months, need year → extrapolate with model. Bias in test group → match with equally biased control using propensity. Outcome: average effect over test group. If you want the effect over the entire population, more corrections are needed. Since all rental houses are dropped from the experiment, we cannot say anything about rental households without making additional assumptions.
  7. Decision trees consecutively slice feature space into leaves with minimal target variance. Tree ensembles (Random Forest, Gradient Boosting) improve generalization to new data. Suitable for making predictions on individual units.
  8. By estimating classification trees on the treatment, the method effectively matches units on propensity score. When setting a minimum of k units of each class per leaf, E(Y1 – Y0) can be calculated locally.
  9. Custom neural network architectures can constrain how X is distributed over treatment and control. Experimental results are very strong, especially on the IHDP synthetic dataset.