DoWhy: An end-to-end library for causal inference
Amit Sharma (@amt_shrma), Emre Kiciman (@emrek)
Microsoft Research
A big thanks to Adam Kelleher, Tanmay Kulkarni and many other open-source contributors!
https://github.com/microsoft/dowhy
Prediction vs. causation
Prediction:
Assume: $P_{\text{train}}(W, X, Y) = P_{\text{test}}(W, X, Y)$
Estimate: $\min L(\hat{y}, y)$
Evaluate: Cross-validation
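For contrast with the causal setting, the prediction workflow on this slide can be sketched in a few lines: fit by minimizing squared loss, then score on held-out folds. The linear model, the synthetic data, and k = 5 are illustrative assumptions, not part of DoWhy.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x, i.e. minimize the squared loss L(y_hat, y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def cross_validated_mse(xs, ys, k=5, seed=0):
    """Average held-out squared loss over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    losses = []
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        losses.append(sum((ys[i] - (a + b * xs[i])) ** 2 for i in test) / len(test))
    return sum(losses) / k

# Illustrative data: y = 2x + noise with unit noise variance.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(500)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
mse = cross_validated_mse(xs, ys)
print(f"cross-validated MSE: {mse:.2f}")  # should be near the noise variance, 1.0
```

The held-out loss is a trustworthy score only because training and test folds come from the same distribution, which is exactly the assumption that causal questions break.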
Fundamental problem with causal inference
• Causal inference concerns estimation under data distributions that differ from the training distribution:
• What if x is changed to a different value?
• How do the results change for a different sample of people?
• What if a particular algorithm is changed in a system?
• Often, no data is available for that distribution.
• Cross-validation is not possible.
• β is not observed, unlike y.
• y is observed for the training domain, but not for a new domain.
Causal inference means estimation under data distributions that differ from the training distribution, and often no data is available for that distribution. This raises two challenges:
1. Assumptions
2. Evaluation
1. Assumptions drive causal inference
• Causal inference methods depend on untestable assumptions.
• Even with large-scale data, the final estimate can be heavily sensitive to those assumptions.
• It is important to communicate those assumptions transparently.
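A minimal simulation, assuming linear relationships and a made-up data-generating process, shows how much the assumed graph matters: the same data give an effect of about 2.0 if we assume no confounding, and about 1.0 if we assume a variable w confounds x and y. Adjustment uses Frisch-Waugh-Lovell residualization; the helper names are hypothetical.

```python
import random

def slope(xs, ys):
    """OLS slope of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

def residualize(vs, ws):
    """Remove the linear effect of ws from vs."""
    b = slope(ws, vs)
    mv, mw = sum(vs) / len(vs), sum(ws) / len(ws)
    return [v - mv - b * (w - mw) for v, w in zip(vs, ws)]

# Hypothetical data: w is a common cause of x and y; the true effect of x on y is 1.0.
rng = random.Random(0)
n = 20000
w = [rng.gauss(0, 1) for _ in range(n)]
x = [wi + rng.gauss(0, 1) for wi in w]
y = [1.0 * xi + 2.0 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# Assumption A: nothing confounds x and y, so regress y on x alone.
est_assume_no_confounding = slope(x, y)
# Assumption B: w confounds x and y, so adjust for w.
est_adjust_for_w = slope(residualize(x, w), residualize(y, w))
print(f"{est_assume_no_confounding:.2f} vs {est_adjust_for_w:.2f}")  # about 2.0 vs 1.0
```

The data alone do not say which number to report; the choice follows from the assumed causal structure.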
[Figure: example causal graph with nodes Recommendation System, Demand for items, Item 1, and Item 2]
2. Causal estimates are hard to validate
• Cannot compare two causal estimates on the same dataset.
• Q: Is algorithm B better than the production algorithm A?
• Cannot tell without doing an A/B test.
• Comparing two estimates from two different datasets is even harder.
• Everyone prefers their own favorite method.
• Need objective metrics to validate causal estimates.
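An A/B test can answer the question on this slide because random assignment removes confounding. A minimal sketch with hypothetical conversion rates (10% under A, 12% under B), estimating the difference in means and its standard error:

```python
import math
import random

def difference_in_means(control, treatment):
    """Estimated effect (treatment - control) and its standard error."""
    n0, n1 = len(control), len(treatment)
    m0, m1 = sum(control) / n0, sum(treatment) / n1
    v0 = sum((o - m0) ** 2 for o in control) / (n0 - 1)
    v1 = sum((o - m1) ** 2 for o in treatment) / (n1 - 1)
    return m1 - m0, math.sqrt(v0 / n0 + v1 / n1)

# Hypothetical experiment: conversion rate 10% under algorithm A, 12% under B.
rng = random.Random(42)
group_a, group_b = [], []
for _ in range(40000):
    if rng.random() < 0.5:  # coin-flip assignment rules out confounding
        group_a.append(1 if rng.random() < 0.10 else 0)
    else:
        group_b.append(1 if rng.random() < 0.12 else 0)

effect, se = difference_in_means(group_a, group_b)
print(f"B - A = {effect:.3f} (95% CI +/- {1.96 * se:.3f})")
```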
The effect of online advertising on sales is 20% (std error = 5).
• What assumptions went into the analysis?
• How would it change if one of the assumptions were incorrect?
• Is it robust to seasonal shifts in behavior?
• What is the expected error in this estimate?
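Reading the standard error as 5 percentage points and assuming approximate normality (both are assumptions), the usual 95% interval is:

```latex
20\% \pm 1.96 \times 5 = [10.2\%,\ 29.8\%]
```

This interval quantifies sampling error only; it says nothing about whether the assumptions behind the 20% figure hold, which is what the other questions ask.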
We built DoWhy to put assumptions front and center in any causal analysis:
- Transparent declaration of assumptions
- Evaluation of those assumptions, to the extent possible
An end-to-end platform for doing causal inference
Input Data <cause, outcome, other variables> + Domain Knowledge → Formulate correct estimand → Estimate causal effect → Check robustness → Causal effect
Existing libraries: CausalImpact, tmle, causaleffect,…
DoWhy
Input Data <cause, outcome, other variables> + Domain Knowledge (a causal graph over Cause, Outcome, v1, v2, v3, v5, w) → Causal effect
• Formulate correct estimand: check the data against properties implied by the model.
• Estimate causal effect: use a suitable method to estimate the effect.
• Check robustness: refute the obtained estimate through multiple tests.
Making Assumptions Transparent
Testing those assumptions
Code demo
https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy_confounder_example.ipynb
DoWhy encodes the four steps of causal reasoning
1. Modeling: Create a causal graph to encode assumptions
2. Identification: Formulate what to estimate
3. Estimation: Compute the estimate
4. Refutation: Validate the assumptions
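To make the modeling step concrete, here is a hypothetical graph loosely based on the deck's example (the exact edges are an assumption), stored as a node-to-parents dictionary. Every pair of nodes without an edge encodes an explicit assumption of no direct effect; the graph also implicitly assumes no unobserved common causes outside it. DoWhy itself takes the graph as input (for example as a GML or DOT string); this plain-Python sketch does not use DoWhy.

```python
from itertools import combinations

# Hypothetical graph (node -> list of parents), loosely following the slides.
PARENTS = {
    "w": [], "v1": [], "v2": [], "v3": [], "v5": [],
    "Cause": ["w", "v1", "v2"],
    "Outcome": ["Cause", "v1", "v2", "v3", "v5"],
}

def edges(parents):
    """All directed edges as (parent, child) pairs."""
    return {(p, child) for child, ps in parents.items() for p in ps}

def missing_edges(parents):
    """Node pairs with no edge in either direction: each encodes 'no direct causal effect'."""
    e = edges(parents)
    return [(a, b) for a, b in combinations(sorted(parents), 2)
            if (a, b) not in e and (b, a) not in e]

for a, b in missing_edges(PARENTS):
    print(f"assumption: no direct effect between {a} and {b}")
```

Missing edges, not present ones, carry most of the identifying power, so listing them is a cheap way to review what a graph commits to.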
I. Identification: Formulate correct estimand
1. Constructs a causal Bayesian network from user-provided knowledge.
• Checks whether the data satisfy the Bayesian network's assumptions.
2. Tries out different techniques for identifying a causal effect and checks which ones are feasible.
• Back-door criterion [Pearl 2000]
• Instrumental variable [Wright 1928, Angrist and Pischke 1991]
3. Provides “what to estimate”: a target estimand for the causal effect.
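One way to check data against the graph is to test a conditional independence it implies. A minimal sketch, assuming a hypothetical linear chain w → Cause → Outcome and using partial correlation as a simple stand-in for a proper conditional-independence test (this is not DoWhy's implementation):

```python
import math
import random

def corr(a, b):
    """Pearson correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def partial_corr(x, y, z):
    """Correlation of x and y after linearly removing a single variable z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def simulate(n, direct_w_to_outcome, seed):
    """Hypothetical linear data for w -> Cause -> Outcome, optionally with an extra w -> Outcome edge."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]
    cause = [wi + rng.gauss(0, 1) for wi in w]
    outcome = [2.0 * ci + direct_w_to_outcome * wi + rng.gauss(0, 1) for ci, wi in zip(cause, w)]
    return w, cause, outcome

# The graph w -> Cause -> Outcome implies: w is independent of Outcome given Cause.
w, c, o = simulate(20000, direct_w_to_outcome=0.0, seed=0)
pc_consistent = partial_corr(w, o, c)   # near 0: data agree with the graph
w2, c2, o2 = simulate(20000, direct_w_to_outcome=1.0, seed=1)
pc_violated = partial_corr(w2, o2, c2)  # about 0.58 in theory: the graph is contradicted
print(f"{pc_consistent:.3f} {pc_violated:.3f}")
```

A near-zero partial correlation does not prove the graph right; it only fails to contradict one of its implications.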
II. Estimation: Compute the causal effect
Uses well-known techniques for causal inference. Based on the estimand from the formulation step, DoWhy implements multiple methods, including:
• Stratification
• Propensity score matching
• Inverse propensity weighting
• Natural experiments
• Conditional treatment effect estimators from the microsoft/EconML library
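As an illustration of one listed method, inverse propensity weighting, here is a from-scratch sketch (not DoWhy's implementation) with a single hypothetical binary confounder z. The propensity is estimated as the share of treated units within each value of z, which assumes every stratum contains both treated and untreated units (positivity).

```python
import random

def ipw_effect(t, y, z):
    """IPW estimate of the average treatment effect, with the propensity
    P(t=1 | z) estimated as the share treated within each value of z."""
    counts = {}
    for ti, zi in zip(t, z):
        treated, total = counts.get(zi, (0, 0))
        counts[zi] = (treated + ti, total + 1)
    e = {zi: treated / total for zi, (treated, total) in counts.items()}
    n = len(t)
    mean_treated = sum(ti * yi / e[zi] for ti, yi, zi in zip(t, y, z)) / n
    mean_control = sum((1 - ti) * yi / (1 - e[zi]) for ti, yi, zi in zip(t, y, z)) / n
    return mean_treated - mean_control

# Hypothetical data: binary confounder z raises both the treatment odds and the outcome.
rng = random.Random(7)
z = [1 if rng.random() < 0.5 else 0 for _ in range(20000)]
t = [1 if rng.random() < (0.8 if zi else 0.2) else 0 for zi in z]
y = [2.0 * ti + 3.0 * zi + rng.gauss(0, 1) for ti, zi in zip(t, z)]  # true effect: 2.0

naive = (sum(yi for ti, yi in zip(t, y) if ti) / sum(t)
         - sum(yi for ti, yi in zip(t, y) if not ti) / (len(t) - sum(t)))
print(f"naive: {naive:.2f}, IPW: {ipw_effect(t, y, z):.2f}")  # naive is biased upward (about 3.8)
```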
[Figure: example causal graph over Cause, Outcome, v1, v2, v3, v5, w]
I. Formulate estimand: find variables that “d-separate” cause and outcome.
II. Estimate causal effect: estimate it as the observed effect conditioned on the back-door variables.
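Whether a set of variables blocks every back-door path can be checked mechanically. The sketch below implements the back-door criterion on top of a standard d-separation test (moralize the ancestral graph, delete the conditioning set, check connectivity). The graph edges are a hypothetical reading of the slide's figure, and the function names are illustrative, not DoWhy's API.

```python
def ancestors(parents, nodes):
    """The given nodes plus all of their ancestors."""
    found, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

def d_separated(parents, xs, ys, zs):
    """Test whether xs and ys are d-separated by zs, via the moralized ancestral graph."""
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = parents.get(child, [])
        for i, p in enumerate(ps):
            nbrs[p].add(child)
            nbrs[child].add(p)
            for q in ps[i + 1:]:  # "marry" parents of a common child
                nbrs[p].add(q)
                nbrs[q].add(p)
    seen, stack = set(xs), list(xs)
    while stack:
        for v in nbrs[stack.pop()]:
            if v not in seen and v not in zs:
                seen.add(v)
                stack.append(v)
    return not (seen & set(ys))

def descendants(parents, node):
    """All nodes reachable from `node` along directed edges."""
    found, frontier = set(), [node]
    while frontier:
        cur = frontier.pop()
        for child, ps in parents.items():
            if cur in ps and child not in found:
                found.add(child)
                frontier.append(child)
    return found

def satisfies_backdoor(parents, cause, outcome, adjust):
    """Back-door criterion: no descendant of `cause` in `adjust`, and `adjust`
    d-separates cause and outcome once the edges out of `cause` are removed."""
    if set(adjust) & descendants(parents, cause):
        return False
    cut = {c: [p for p in ps if p != cause] for c, ps in parents.items()}
    return d_separated(cut, {cause}, {outcome}, set(adjust))

# Hypothetical graph in the spirit of the slides (node -> parents).
G = {
    "w": [], "v1": [], "v2": [], "v3": [], "v5": [],
    "Cause": ["w", "v1", "v2"],
    "Outcome": ["Cause", "v1", "v2", "v3", "v5"],
}
print(satisfies_backdoor(G, "Cause", "Outcome", {"v1", "v2"}))  # True
print(satisfies_backdoor(G, "Cause", "Outcome", {"v1"}))        # False: the path via v2 stays open
```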
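Input: causal graph over Cause, Out, v1, v2, v3, v4, v5, w
Identification (back-door): $\left(Out \perp\!\!\!\perp C \mid v_1, v_2\right)_{G \setminus (C \rightarrow Out)}$
Computer science, do-calculus (Pearl 2001): $P(Out \mid do(c)) = \sum_{v_i} P(Out \mid c, v_i)\, P(v_i)$
Statistics, potential outcomes (Rubin 1984): $E[Out \mid c = 1, v_i] - E[Out \mid c = 0, v_i]$
where $v_i$ ranges over the values of the back-door variables $(v_1, v_2)$.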
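The adjustment formula above translates directly into a stratification estimator when the back-door variables are discrete. A minimal sketch on hypothetical data with binary v1, v2 and a true effect of 1.5; passing an empty adjustment set reduces it to the naive difference in means.

```python
import random
from collections import defaultdict

def backdoor_adjusted_effect(rows, cause, outcome, backdoor):
    """Stratification estimate of
    sum_v (E[Out | c=1, v] - E[Out | c=0, v]) * P(v),
    where v ranges over observed values of the back-door variables."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for r in rows:
        strata[tuple(r[b] for b in backdoor)][r[cause]].append(r[outcome])
    effect = 0.0
    for groups in strata.values():
        if not groups[0] or not groups[1]:
            raise ValueError("a stratum has no treated or no untreated rows")
        p_v = (len(groups[0]) + len(groups[1])) / len(rows)
        effect += p_v * (sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0]))
    return effect

# Hypothetical data: binary back-door variables v1, v2; true effect of Cause is 1.5.
rng = random.Random(3)
rows = []
for _ in range(20000):
    v1, v2 = rng.randint(0, 1), rng.randint(0, 1)
    c = 1 if rng.random() < 0.2 + 0.3 * v1 + 0.3 * v2 else 0
    rows.append({"v1": v1, "v2": v2, "Cause": c,
                 "Outcome": 1.5 * c + 2.0 * v1 + 1.0 * v2 + rng.gauss(0, 1)})

naive = backdoor_adjusted_effect(rows, "Cause", "Outcome", [])
adjusted = backdoor_adjusted_effect(rows, "Cause", "Outcome", ["v1", "v2"])
print(f"naive {naive:.2f}, adjusted {adjusted:.2f}")  # about 2.4 vs 1.5
```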
What if the user forgot to add an important variable to the graph, or did not even know about a confounder?
III. Refutation/Validation: Test robustness of the obtained estimate
[Figure: the original causal graph (Cause, Outcome, v1, v2, v3, v5, w) beside the same graph with an extra confounder U; both are estimated from the same input data]
Many “automatic” validation tests: Dummy Outcome test, Placebo test, Subsample test, Add-unobserved-confounder test.
IIIa. Adding New Confounders
Add a variable U that causes both Cause and Outcome.
1. U is randomly generated.
• Rerun the analysis and expect no change in the causal effect.
2. U is generated to have a correlation ρ with Cause and Outcome.
• Assess sensitivity: how fast does the new causal estimate go to zero?
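Test 1 above can be sketched as follows, assuming a single binary back-door variable v and a stratification estimator. DoWhy offers a comparable refuter (`refute_estimate` with `method_name="random_common_cause"`), but this is not its implementation, and the binary U is an illustrative simplification.

```python
import random
from collections import defaultdict

def adjusted_effect(rows, adjust):
    """Stratified effect of 'c' on 'y', adjusting for the columns listed in `adjust`."""
    strata = defaultdict(lambda: ([], []))
    for r in rows:
        strata[tuple(r[k] for k in adjust)][r["c"]].append(r["y"])
    return sum((len(g0) + len(g1)) / len(rows) * (sum(g1) / len(g1) - sum(g0) / len(g0))
               for g0, g1 in strata.values())

# Hypothetical data: binary back-door variable v, true effect of c on y is 1.0.
rng = random.Random(5)
rows = []
for _ in range(20000):
    v = rng.randint(0, 1)
    c = 1 if rng.random() < 0.3 + 0.4 * v else 0
    rows.append({"v": v, "c": c, "y": 1.0 * c + 2.0 * v + rng.gauss(0, 1)})

original = adjusted_effect(rows, ["v"])
# Refutation: add a randomly generated "common cause" u and adjust for it too.
for r in rows:
    r["u"] = rng.randint(0, 1)
refuted = adjusted_effect(rows, ["v", "u"])
print(f"original {original:.3f}, with random common cause {refuted:.3f}")
```

A large gap between the two numbers would suggest the estimator is fragile rather than that the causal effect changed, since u is pure noise by construction.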
[Figure: causal graph with U added as a common cause of Cause and Outcome, alongside v1, v2, v3, v5, w]
$X = X' + U$
$Y = Y' + U$
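For test 2, a linear version of these equations gives a closed-form sensitivity curve. This is a hand-derived sketch, not DoWhy's add_unobserved_common_cause implementation. Assume Var(U) = 1, X = X' + κU, Y = Y' + κU with Y' = βX + noise, and U independent of X' and of the noise. Then the naive slope is b = β + κ²/Var(X), so a hypothesized strength κ gives the corrected estimate b − κ²/Var(X), which reaches zero at κ = √(b·Var(X)).

```python
import math
import random

def naive_slope_and_var(xs, ys):
    """OLS slope of ys on xs and the sample variance of xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return cov_xy / var_x, var_x

def corrected_estimate(naive, var_x, kappa):
    """Estimate after removing the bias of an unobserved U (Var(U) = 1) with
    X = X' + kappa*U and Y = Y' + kappa*U, where Y' = beta*X + noise."""
    return naive - kappa ** 2 / var_x

def breakdown_kappa(naive, var_x):
    """Confounder strength at which a positive estimate is driven to zero."""
    return math.sqrt(max(naive, 0.0) * var_x)

# Hypothetical data generated with kappa = 1 and a true effect beta = 1.0.
rng = random.Random(11)
u = [rng.gauss(0, 1) for _ in range(20000)]
x = [rng.gauss(0, 1) + ui for ui in u]                          # X = X' + U
y = [1.0 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]   # Y = Y' + U

naive, var_x = naive_slope_and_var(x, y)
for kappa in (0.0, 0.5, 1.0, 1.5):
    print(f"kappa={kappa}: corrected estimate {corrected_estimate(naive, var_x, kappa):.2f}")
print(f"estimate reaches zero at kappa = {breakdown_kappa(naive, var_x):.2f}")
```

If a plausible confounder strength is well below the breakdown value, the conclusion is fairly robust; if it is comparable, the sign of the effect is in doubt.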
IIIb. Placebo (“A/A”) test
Simulate a world where Cause does not affect Outcome: replace Cause by a randomly generated variable in the dataset.
• Rerun the analysis and expect the causal effect to go to zero.
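A minimal placebo sketch on hypothetical data: permute the cause column (one way to randomly generate a placebo while keeping the treated share fixed) and rerun the same stratified estimator. DoWhy's counterpart is the placebo_treatment_refuter; this code only illustrates the idea.

```python
import random
from collections import defaultdict

def adjusted_effect(rows, cause_key="c"):
    """Effect of rows[cause_key] on 'y', stratified on the back-door variable 'v'."""
    strata = defaultdict(lambda: ([], []))
    for r in rows:
        strata[r["v"]][r[cause_key]].append(r["y"])
    return sum((len(g0) + len(g1)) / len(rows) * (sum(g1) / len(g1) - sum(g0) / len(g0))
               for g0, g1 in strata.values())

# Hypothetical data: binary back-door variable v, true effect of c on y is 1.0.
rng = random.Random(8)
rows = []
for _ in range(20000):
    v = rng.randint(0, 1)
    c = 1 if rng.random() < 0.3 + 0.4 * v else 0
    rows.append({"v": v, "c": c, "y": 1.0 * c + 2.0 * v + rng.gauss(0, 1)})

original = adjusted_effect(rows)
# Placebo: a random permutation of the cause breaks any link to the outcome.
placebo = [r["c"] for r in rows]
rng.shuffle(placebo)
for r, p in zip(rows, placebo):
    r["placebo"] = p
placebo_effect = adjusted_effect(rows, cause_key="placebo")
print(f"original {original:.3f}, placebo {placebo_effect:.3f}")  # placebo should be near 0
```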
IIIc. Subsampling test
Refutation can also test statistical robustness, e.g., by removing a random subset of the data.
• Rerun the analysis and expect no change in the causal effect.
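A minimal subsampling sketch under the same kind of hypothetical setup: re-estimate on ten random 80% subsets and compare with the full-data estimate. The 80% fraction and ten repetitions are arbitrary choices; DoWhy exposes a similar check as data_subset_refuter.

```python
import random
from collections import defaultdict

def adjusted_effect(rows):
    """Effect of 'c' on 'y', stratified on the back-door variable 'v'."""
    strata = defaultdict(lambda: ([], []))
    for r in rows:
        strata[r["v"]][r["c"]].append(r["y"])
    return sum((len(g0) + len(g1)) / len(rows) * (sum(g1) / len(g1) - sum(g0) / len(g0))
               for g0, g1 in strata.values())

# Hypothetical data: binary back-door variable v, true effect of c on y is 1.0.
rng = random.Random(9)
rows = []
for _ in range(20000):
    v = rng.randint(0, 1)
    c = 1 if rng.random() < 0.3 + 0.4 * v else 0
    rows.append({"v": v, "c": c, "y": 1.0 * c + 2.0 * v + rng.gauss(0, 1)})

original = adjusted_effect(rows)
# Refutation: rerun on random 80% subsets and compare with the full-data estimate.
subset_estimates = [adjusted_effect(rng.sample(rows, int(0.8 * len(rows)))) for _ in range(10)]
max_shift = max(abs(e - original) for e in subset_estimates)
print(f"original {original:.3f}; largest shift on a subset {max_shift:.3f}")
```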
Summary: DoWhy, an end-to-end library for causal inference
Test assumptions as far as possible
• Make assumptions explicit through a Bayesian network.
• Test assumptions from observed data [Sharma 2018, arXiv].
Assess sensitivity to untested assumptions
• When tests are inconclusive, assess the sensitivity of the causal estimate to violations of assumptions [Sharma et al. 2018, Annals of Applied Statistics].
Unify best practices from different scientific fields
• Unify different frameworks from computer science and statistics (“graphs and potential outcomes”) [Kiciman & Sharma 2018, KDD Tutorial].
Thank you!
Resources
• KDD 2018 tutorial on causal inference
• https://causalinference.gitlab.io/kdd-tutorial/
• Upcoming book on “Causal Reasoning: Fundamentals and ML Applications”
• https://causalinference.gitlab.io/
• DoWhy
• Code: https://github.com/microsoft/dowhy
• Docs: https://microsoft.github.io/dowhy/
Amit Sharma
Microsoft Research India
@amt_shrma

Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 

Dernier (20)

{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 

DoWhy Python library for causal inference: An End-to-End tool

  • 1. DoWhy: An end-to-end library for causal inference Amit Sharma (@amt_shrma), Emre Kiciman (@emrek) Microsoft Research A big thanks to Adam Kelleher, Tanmay Kulkarni and many other open-source contributors! https://github.com/microsoft/dowhy
  • 2. Prediction vs. causation. Prediction: Assume: $P_{\text{train}}(W, X, Y) = P_{\text{test}}(W, X, Y)$. Estimate: $\min L(\hat{y}, y)$. Evaluate: cross-validation.
  • 3. Fundamental problem with causal inference • Causal inference concerns estimation under data distributions that differ from the training distribution: • What if $x$ is changed to a different value? • How do the results change for a different sample of people? • What if a particular algorithm in a system is changed? • Often, no data is available for that distribution: • Cross-validation is not possible. • $\beta$ is not observed, unlike $y$. • $y$ is observed for the training domain, but not for a new domain.
  • 4. Causal inference concerns estimation under data distributions that differ from the training distribution, and often no data is available for that distribution. This raises two issues: 1. Assumptions 2. Evaluation
  • 5. 1. Assumptions drive causal inference • Causal inference methods depend on untestable assumptions. • Even with large-scale data, the final estimate can be heavily sensitive to those assumptions. • It is important to communicate those assumptions transparently. [Figure: Recommendation System, Item 1, Item 2, Demand for items]
  • 6. 2. Causal estimates are hard to validate • We cannot compare two causal estimates on the same dataset. • Q: Is algorithm B better than the production algorithm A? • We cannot tell without running an A/B test. • Comparing two estimates from two different datasets is even harder. • Everyone prefers their own favorite methods. • We need objective metrics to validate causal estimates.
  • 7. The effect of online advertising on sales is 20% (std error=5) What assumptions went in the analysis? How would it change if one of the assumptions was incorrect? Is it robust to seasonal shifts in behavior? What is the expected error in this estimate?
  • 8. We built DoWhy to make assumptions front-and-center of any causal analysis. - Transparent declaration of assumptions - Evaluation of those assumptions, to the extent possible. An end-to-end platform for doing causal inference.
  • 9. [Diagram: Input data <cause, outcome, other variables> and domain knowledge → Formulate correct estimand → Estimate causal effect → Check robustness → Causal effect. Libraries shown: CausalImpact, tmle, causaleffect, …]
  • 10. • Formulate correct estimand: check data with properties implied by the model. • Estimate causal effect: use a suitable method to estimate the effect. • Check robustness: refute the obtained estimate through multiple tests. [Diagram: input data <cause, outcome, other variables> and domain knowledge (a causal graph over Cause, v1, v2, Outcome, v3, v5, w) → DoWhy → Causal effect]
  • 13. DoWhy encodes the four steps of causal reasoning 1. Modeling: Create a causal graph to encode assumptions 2. Identification: Formulate what to estimate 3. Estimation: Compute the estimate 4. Refutation: Validate the assumptions
  • 14. I. Identification: Formulate correct estimand 1. Constructs a causal Bayesian network from user-provided knowledge. • Checks whether the data satisfies the Bayesian network’s assumptions. 2. Tries out different techniques for identifying a causal effect and checks which ones are feasible. • Back-door criterion [Pearl 2000] • Instrumental variable [Wright 1928, Angrist and Pischke 1991] 3. Provides “what to estimate”: a target estimand for the causal effect.
  • 15. II. Estimation: Compute the causal effect. Uses well-known techniques for causal inference. Based on the estimand from the Formulation step, it implements multiple methods, including: • Stratification • Propensity score matching • Inverse propensity weighting • Natural experiments • Conditional treatment effect estimators from the microsoft/EconML library.
  • 16. [Diagram: input causal graph over Cause, Out, v1, v2, v3, v4, v5, w] I. Formulate estimand: find variables that “d-separate” cause and outcome, i.e. $Out \perp\!\!\!\perp C \mid v_1, v_2$ in $G_{\neg(c \to Out)}$, the graph with the edge $c \to Out$ removed. II. Estimate causal effect: estimate it as the observed effect conditioned on the back-door variables. Computer science, do-calculus (Pearl 2001): $P(Out \mid do(c)) = \sum_{v_i} P(Out \mid c, v_i)\, P(v_i)$. Statistics, potential outcomes (Rubin 1984): $\mathbb{E}[Out \mid c = 1, v_i] - \mathbb{E}[Out \mid c = 0, v_i]$.
  • 17. What if the user forgot to add an important variable to the graph, or did not even know about a confounder?
  • 18. III. Refutation/Validation: Test robustness of the obtained estimate. [Diagram: the input graph over Cause, v1, v2, Outcome, v3, v5, w, and the same graph with an added variable U] Many “automatic” validation tests: Dummy Outcome test, Placebo test, Subsample test, Add-unobserved-confounder test, …
  • 19. IIIa. Adding New Confounders. Add a variable $U$ that causes both Cause and Outcome. 1. $U$ is randomly generated. • Rerun the analysis; expect no change in the causal effect. 2. $U$ is generated to have a correlation $\rho$ with Cause and Outcome. • Assess sensitivity: how fast does the new causal estimate go to zero? [Diagram: graph over Cause, v1, v2, U, Outcome, v3, v5, w] $X = X' + U$, $Y = Y' + U$
  • 20. IIIb. Placebo (“A/A”) test. Simulate a world where Cause does not affect Outcome: replace Cause by a randomly generated variable in the dataset. • Rerun the analysis; expect the causal effect to go to zero.
  • 21. IIIc. Subsampling test. We can also test statistical robustness, e.g., by removing a random subset of the data. • Rerun the analysis; expect no change in the causal effect.
  • 22. Summary: DoWhy, an end-to-end library for causal inference Test assumptions as far as possible • Make assumptions explicit through a Bayesian network. • Test assumptions from observed data [Sharma 2018, Arxiv]. Assess sensitivity to untested assumptions • When tests are inconclusive, assess sensitivity of causal estimate to violation of assumptions [Sharma et al. 2018, Annals of Applied Statistics]. Unify best practices from different scientific fields • Unify different frameworks from computer science and statistics (“graphs and potential outcomes”) [Kiciman & Sharma 2018, KDD Tutorial].
  • 23. Thank you! Resources • KDD 2018 tutorial on causal inference • https://causalinference.gitlab.io/kdd-tutorial/ • Upcoming book on “Causal Reasoning: Fundamentals and ML Applications” • https://causalinference.gitlab.io/ • DoWhy • Code: https://github.com/microsoft/dowhy • Docs: https://microsoft.github.io/dowhy/ Amit Sharma Microsoft Research India @amt_shrma
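
Slides 2 and 3 contrast prediction with causal estimation. The sketch below is a hypothetical simulation (not from the deck) using only the Python standard library. It assumes y = βx + 2w + noise with β = 1, where an unobserved w also drives x. The fitted line predicts held-out data from the same distribution just as well as training data, yet its slope lands near 2 rather than β, so cross-validation alone cannot reveal the error in β.

```python
import random
import statistics


def simulate(n, rng, beta=1.0):
    """Hypothetical data: w confounds x and y; the causal effect of x on y is beta."""
    xs, ys = [], []
    for _ in range(n):
        w = rng.gauss(0, 1)                         # unobserved confounder
        x = w + rng.gauss(0, 1)                     # cause, partly driven by w
        y = beta * x + 2.0 * w + rng.gauss(0, 1)    # outcome
        xs.append(x)
        ys.append(y)
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def mse(xs, ys, slope, intercept):
    """Mean squared prediction error of the fitted line."""
    return statistics.fmean((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


rng = random.Random(0)
train_x, train_y = simulate(20_000, rng)
test_x, test_y = simulate(20_000, rng)      # P_train = P_test holds by construction
slope, intercept = fit_line(train_x, train_y)
train_mse = mse(train_x, train_y, slope, intercept)
test_mse = mse(test_x, test_y, slope, intercept)
print(f"slope={slope:.2f} (true beta = 1.0)")
print(f"train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```

Under this setup the population slope is cov(x, y)/var(x) = (β·2 + 2·1)/2 = 2, so the predictor is good while the causal coefficient is off by 100%.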
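
Slide 10's first step checks the data against properties implied by the model. One such property is a conditional independence: in a hypothetical chain w → t → y, w and y should be independent given t. The sketch below assumes linear-Gaussian data and uses a partial correlation as a rough check. The 0.05 threshold is an arbitrary illustration, not a formal test, and the code is not DoWhy's implementation.

```python
import math
import random
import statistics


def simulate(n, w_to_y, seed=0):
    """Hypothetical linear data for w -> t -> y, plus an optional direct edge w -> y."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]
    t = [wi + rng.gauss(0, 1) for wi in w]
    y = [ti + w_to_y * wi + rng.gauss(0, 1) for ti, wi in zip(t, w)]
    return w, t, y


def residuals(a, b):
    """Residuals of a after least-squares regression on b (with intercept)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    slope = (sum((bi - mb) * (ai - ma) for ai, bi in zip(a, b))
             / sum((bi - mb) ** 2 for bi in b))
    return [(ai - ma) - slope * (bi - mb) for ai, bi in zip(a, b)]


def corr(a, b):
    """Pearson correlation."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / math.sqrt(sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b))


def partial_corr(a, b, given):
    """Correlation of a and b after linearly removing `given` from both."""
    return corr(residuals(a, given), residuals(b, given))


w, t, y = simulate(20_000, w_to_y=0.0)
pc_chain = partial_corr(w, y, t)        # data generated by the assumed chain
w, t, y = simulate(20_000, w_to_y=1.0, seed=1)
pc_direct = partial_corr(w, y, t)       # data with an extra w -> y edge

for label, pc in (("chain", pc_chain), ("extra edge", pc_direct)):
    verdict = "consistent with" if abs(pc) < 0.05 else "violates"
    print(f"{label}: partial corr(w, y | t) = {pc:.3f} -> {verdict} 'w independent of y given t'")
```

When the data contain a direct w → y edge that the assumed graph omits, the partial correlation moves well away from zero, flagging the graph as inconsistent with the data.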
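
Slide 14's identification step checks which techniques apply, starting with the back-door criterion. A minimal sketch, assuming a hypothetical DAG written as a dict of children lists (not the graph from the slides, whose edges are not recoverable from this transcript): a set Z is a valid back-door set if no member of Z is a descendant of the treatment and Z d-separates treatment and outcome once the treatment's outgoing edges are removed. d-separation is tested with the standard moralized-ancestral-graph method. This is an illustration, not DoWhy's identification code.

```python
from itertools import combinations


def parents_of(graph):
    """Invert a {node: [children]} DAG into {node: set(parents)}."""
    parents = {v: set() for v in graph}
    for u, children in graph.items():
        for c in children:
            parents[c].add(u)
    return parents


def reachable(start, neighbors):
    """All nodes reachable from the `start` nodes (inclusive) via `neighbors`."""
    seen, stack = set(), list(start)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(neighbors[v])
    return seen


def d_separated(graph, x, y, z):
    """True if x and y are d-separated given z (moralized ancestral graph test)."""
    parents = parents_of(graph)
    keep = reachable({x, y} | set(z), parents)          # ancestral set of x, y, z
    adj = {v: set() for v in keep}
    for v in keep:
        ps = parents[v]                                  # all parents are inside `keep`
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for a, b in combinations(ps, 2):                 # "marry" co-parents
            adj[a].add(b)
            adj[b].add(a)
    blocked = set(z)
    open_adj = {v: [n for n in adj[v] if n not in blocked] for v in keep if v not in blocked}
    return y not in reachable({x}, open_adj)


def satisfies_backdoor(graph, treatment, outcome, z):
    """Back-door criterion: no descendants of treatment in z, and z blocks back-door paths."""
    if set(z) & reachable(graph[treatment], graph):
        return False
    cut = {u: ([] if u == treatment else list(cs)) for u, cs in graph.items()}
    return d_separated(cut, treatment, outcome, z)


# Hypothetical DAG: v1, v2 confound Cause -> Outcome; v3 is a mediator; w only affects v1.
graph = {
    "w": ["v1"],
    "v1": ["Cause", "Outcome"],
    "v2": ["Cause", "Outcome"],
    "Cause": ["v3", "Outcome"],
    "v3": ["Outcome"],
    "Outcome": [],
}

candidates = ["v1", "v2", "v3", "w"]
valid = [set(s) for k in range(len(candidates) + 1)
         for s in combinations(candidates, k)
         if satisfies_backdoor(graph, "Cause", "Outcome", s)]
print("valid back-door sets:", valid)
```

In this toy graph any valid set must contain both v1 and v2 and must exclude the mediator v3; adding w is harmless but unnecessary.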
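
Slides 15 and 16 list stratification and inverse propensity weighting as estimators of the back-door estimand. The sketch below compares them with the naive difference in means on hypothetical data with one binary confounder w (P(w=1) = 0.5, P(t=1 | w) = 0.8 or 0.2, true effect 1.0). In DoWhy these estimators are chosen by name through `CausalModel.estimate_effect` (for example `backdoor.propensity_score_weighting`); this code deliberately does not use DoWhy, and propensity score matching is not shown.

```python
import random
from collections import defaultdict


def simulate(n, seed=0, beta=1.0):
    """Hypothetical data: binary confounder w raises both P(treatment) and the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        t = int(rng.random() < (0.8 if w else 0.2))
        y = beta * t + 2.0 * w + rng.gauss(0, 1)
        rows.append((w, t, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """E[Y | T=1] - E[Y | T=0], ignoring the confounder."""
    return mean([y for _, t, y in rows if t == 1]) - mean([y for _, t, y in rows if t == 0])


def stratification(rows):
    """Sum over strata of P(w) * (E[Y | T=1, w] - E[Y | T=0, w])."""
    cells = defaultdict(list)
    for w, t, y in rows:
        cells[(w, t)].append(y)
    strata = {w for w, _, _ in rows}
    return sum(
        (len(cells[(w, 0)]) + len(cells[(w, 1)])) / len(rows)
        * (mean(cells[(w, 1)]) - mean(cells[(w, 0)]))
        for w in strata
    )


def inverse_propensity_weighting(rows):
    """Horvitz-Thompson style IPW with e(w) = P(T=1 | w) estimated per stratum."""
    total, treated = defaultdict(int), defaultdict(int)
    for w, t, _ in rows:
        total[w] += 1
        treated[w] += t
    e = {w: treated[w] / total[w] for w in total}
    n = len(rows)
    mu1 = sum(t * y / e[w] for w, t, y in rows) / n
    mu0 = sum((1 - t) * y / (1 - e[w]) for w, t, y in rows) / n
    return mu1 - mu0


rows = simulate(20_000)
naive = naive_difference(rows)
strat = stratification(rows)
ipw = inverse_propensity_weighting(rows)
print(f"naive={naive:.2f}, stratification={strat:.2f}, IPW={ipw:.2f} (true effect 1.0)")
```

With a single discrete confounder and propensities estimated per stratum, IPW reduces algebraically to stratification, which is why the two agree here; they diverge once the propensity is fitted with a smoother model such as logistic regression. The naive difference is biased upward (about 1 + 2·(0.8 − 0.2) = 2.2 in this setup).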
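
Slide 19 asks how an unobserved confounder would change the estimate. A minimal sketch, assuming a hypothetical randomized linear setting with effect β = 0.5 and reading the slide's X = X' + U, Y = Y' + U literally, with strengths s (treatment) and s_y (outcome). Part 1 adjusts for a purely random U, which should leave the estimate unchanged; part 2 injects U into treatment and outcome and re-runs a regression that cannot see U. The strengths swept are arbitrary, and DoWhy's own `add_unobserved_common_cause` refuter has its own parameterization that this does not reproduce.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of y on x with intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def coef_on_x(y, x, u):
    """Coefficient on x when regressing y on x and u (with intercept), via normal equations."""
    mx, mu, my = statistics.fmean(x), statistics.fmean(u), statistics.fmean(y)
    xc = [a - mx for a in x]
    uc = [a - mu for a in u]
    yc = [a - my for a in y]
    sxx = sum(a * a for a in xc)
    suu = sum(a * a for a in uc)
    sxu = sum(a * b for a, b in zip(xc, uc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    suy = sum(a * b for a, b in zip(uc, yc))
    return (suu * sxy - sxu * suy) / (sxx * suu - sxu ** 2)


def perturbed(x0, y0, u, s, s_y):
    """Re-estimate after X = X' + s*U and Y = Y' + s_y*U, with U hidden from the analysis."""
    x = [a + s * c for a, c in zip(x0, u)]
    y = [b + s_y * c for b, c in zip(y0, u)]
    return ols_slope(x, y)


rng = random.Random(0)
n, beta = 20_000, 0.5
x0 = [rng.gauss(0, 1) for _ in range(n)]               # X': observed cause (randomized)
y0 = [beta * a + rng.gauss(0, 1) for a in x0]          # Y': observed outcome
original = ols_slope(x0, y0)

# 1. Random U: adjusting for pure noise should not move the estimate.
u = [rng.gauss(0, 1) for _ in range(n)]
with_random_u = coef_on_x(y0, x0, u)

# 2. Sweep the strength of a hidden common cause in the same and opposite directions.
strengths = (0.0, 0.5, 1.0, 2.0)
same_sign = {s: perturbed(x0, y0, u, s, s) for s in strengths}
opposite_sign = {s: perturbed(x0, y0, u, s, -s) for s in strengths}

print(f"original={original:.3f}, with random U={with_random_u:.3f}")
for s in strengths:
    print(f"s={s}: same sign={same_sign[s]:.2f}, opposite sign={opposite_sign[s]:.2f}")
```

In this setup the re-estimated slope is (β + s·s_y)/(1 + s²), so a confounder that pushes treatment and outcome in opposite directions drives the estimate through zero once s² exceeds β (here between s = 0.5 and s = 1.0).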
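
The placebo test on slide 20 can be sketched on hypothetical binary-confounder data (true effect 1.0). The placebo here is a random permutation of the observed treatment column, an assumed choice that keeps the treatment's marginal distribution while cutting any link to the outcome. DoWhy's corresponding refuter is `placebo_treatment_refuter`; this sketch does not call it.

```python
import random
from collections import defaultdict


def simulate(n, rng, beta=1.0):
    """Hypothetical data: binary confounder w affects treatment t and outcome y."""
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)                     # confounder
        t = int(rng.random() < (0.8 if w else 0.2))     # treatment
        y = beta * t + 2.0 * w + rng.gauss(0, 1)        # outcome
        rows.append((w, t, y))
    return rows


def stratified_effect(rows):
    """Back-door adjustment over the strata of w."""
    cells = defaultdict(list)
    for w, t, y in rows:
        cells[(w, t)].append(y)
    n = len(rows)
    effect = 0.0
    for w in {w for w, _, _ in rows}:
        treated, control = cells[(w, 1)], cells[(w, 0)]
        effect += (len(treated) + len(control)) / n * (
            sum(treated) / len(treated) - sum(control) / len(control))
    return effect


rng = random.Random(1)
rows = simulate(20_000, rng)
estimate = stratified_effect(rows)

# Placebo: shuffle the treatment column so it can no longer affect the outcome.
placebo_t = [t for _, t, _ in rows]
rng.shuffle(placebo_t)
placebo_rows = [(w, pt, y) for (w, _, y), pt in zip(rows, placebo_t)]
placebo_estimate = stratified_effect(placebo_rows)

print(f"estimate={estimate:.2f}, placebo estimate={placebo_estimate:.2f}")
```

A placebo estimate far from zero would suggest that the estimator is picking up structure unrelated to the treatment.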
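
The subsampling test on slide 21 reruns the same analysis on random subsets. A minimal sketch on hypothetical linear data (w confounds t and y; true effect 1.0), using regression adjustment for w as the estimator. The 80% subset size and 20 repetitions are arbitrary choices. DoWhy's corresponding refuter is `data_subset_refuter`; this sketch is independent of it.

```python
import random
import statistics


def simulate(n, rng, beta=1.0):
    """Hypothetical linear data: w confounds t and y; the effect of t on y is beta."""
    data = []
    for _ in range(n):
        w = rng.gauss(0, 1)
        t = w + rng.gauss(0, 1)
        y = beta * t + 2.0 * w + rng.gauss(0, 1)
        data.append((w, t, y))
    return data


def adjusted_effect(data):
    """Coefficient on t from regressing y on t and w (centered normal equations)."""
    mw = statistics.fmean(r[0] for r in data)
    mt = statistics.fmean(r[1] for r in data)
    my = statistics.fmean(r[2] for r in data)
    stt = sww = stw = sty = swy = 0.0
    for w, t, y in data:
        w, t, y = w - mw, t - mt, y - my
        stt += t * t
        sww += w * w
        stw += t * w
        sty += t * y
        swy += w * y
    return (sww * sty - stw * swy) / (stt * sww - stw ** 2)


rng = random.Random(2)
data = simulate(20_000, rng)
full = adjusted_effect(data)

# Re-estimate on 20 random 80% subsets of the rows.
sub_estimates = [adjusted_effect(rng.sample(data, k=int(0.8 * len(data)))) for _ in range(20)]
max_shift = max(abs(e - full) for e in sub_estimates)
print(f"full={full:.3f}, subsample mean={statistics.fmean(sub_estimates):.3f}, "
      f"max shift={max_shift:.3f}")
```

Large shifts across subsets would indicate that the estimate hinges on a small part of the data rather than on the assumed causal structure.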

Speaker notes

  1. So this tutorial is about how to get better at it. Suppose a simple world: if you believe everything relevant is captured, go for prediction and you should be fine. If not, you need to understand the causal factors. And, by the way, we will also learn that regression is one of the worst methods, because the world is almost never linear. One reason is that while we have gotten pretty good at processing terabytes of data, causal inference methods haven’t caught up. The example I like to think of is the fundamental difference between prediction and causation, as I describe in a recent paper in Science. Suppose there are two variables and, for simplicity, you believe that the true model of the world is $y = \beta x + \eta$, but we know nothing about the error $\eta$; it may not even be independent of $x$. Prediction is what big data is often used for: if we have a big dataset with two variables $X$ and $Y$, we can simply feed it to a machine learning algorithm to get reasonably accurate predictions of $y$. However, the most interesting questions are often causal (does $X$ cause $Y$?), and here it is not entirely clear how to use big data.
  2. So let’s first look at how someone at Office, or a biomedical scientist, would have run a causal analysis before DoWhy. Finally, *think of* how to check the robustness of the estimate.
  3. What DoWhy does is implement the hard parts of all three steps, leading to an easy interface. The user still has to provide domain knowledge; that’s a harder problem we don’t solve yet, but we want to.
  4. Once it has done that, the second step is straightforward.
  5. As an example of the first two steps: while both of these are prior work, they are actually parts of two different frameworks that are not used together. DoWhy combines them and
  6. All this is good, but...
  7. Test the sensitivity of the estimate as causal assumptions are violated. Test it like a scientific theory.
  8. And how is DoWhy able to implement the full workflow of causal inference? DoWhy considers assumptions its first-class citizens. A user is encouraged to think more about their domain assumptions than about the methods. And at the backend, DoWhy uses our recent research to test those assumptions as far as possible. Notes: causal inference methods depend critically on assumptions; the literature is vast and contradictory; assumptions are often not explicit, masked in “statistical assumptions”; what happens when a method’s assumptions fail?; causal analysis is restricted to experts in causal inference or statistics.