Causal challenges for AI
David Lopez-Paz
Facebook AI Research
Clever Hans (1907)
(Sturm, 2014)
Outline
What’s wrong with machine learning?
A causal proposal
Searching for causality I: observational data
Searching for causality II: multiple environments
Conclusion
What succeeds in machine learning?
The recent ImageNet winner (Hu et al., 2017) achieves a super-human top-5 error of 2.2%.
(Further examples from Kartik Audhkhasi; Wikipedia, 2018; Silver et al., 2016)
What are the reasons for these successes?
Machines achieve impressive performance at
− recognizing objects after training on more images than a human can see,
− translating natural languages after training on more bilingual text than a human can read,
− beating humans at Atari after playing more games than any teenager can endure,
− reigning at Go after playing more grandmaster-level games than all of mankind.
Models consume too much data to solve a single task!
(From Léon Bottou)
What fails in machine learning?
(Examples from Pietro Perona; Rosenfeld et al., 2018; Stock and Cisse, 2017; Jamie Kiros; Jabri et al., 2016; Szegedy et al., 2013; IBM system at ICLR 2017)
What are the reasons for these failures?
The big lie in machine learning, as Zoubin Ghahramani calls it:
$P_{\text{train}}(X, Y) = P_{\text{test}}(X, Y)$
− focus on interpolation
− out-of-distribution catastrophes
− over-justification of “minimizing the average error”
− emphasize the common, forget the rare
− reckless learning
Horses (like Clever Hans) cheat our statistical estimation problems by exploiting unexpected features
Outline
What’s wrong with machine learning?
A causal proposal
Searching for causality I: observational data
Searching for causality II: multiple environments
Conclusion
This talk in one slide
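Predict Y from (X, Z). Process generating labeled training data:
$X \leftarrow \mathcal{N}(0, 1)$,
$Y \leftarrow X + \mathcal{N}(0, 1)$,
$Z \leftarrow Y + \mathcal{N}(0, 1)$.
Least-squares solution: $Y_{\text{LS}} = \frac{X}{2} + \frac{Z}{2}$
Causal solution: $Y_{\text{Cau}} = X$
Predict Y from (X, Z). Process generating unlabeled testing data:
$X \leftarrow \mathcal{N}(0, 1)$,
$Y \leftarrow X + \mathcal{N}(0, 1)$,
$Z \leftarrow Y + \mathcal{N}(0, 10)$.
Least-squares solution breaks at testing time!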
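The failure can be reproduced numerically. Below is a minimal NumPy check, assuming that the second argument of N(·, ·) is a variance (so the test-time noise on Z has variance 10); the helpers `sample` and `mse` are illustrative names of ours.

```python
import numpy as np

def sample(n, z_noise_var, rng):
    """Draw (X, Y, Z) from X -> Y -> Z; the second argument of N(0, .) is read as a variance."""
    x = rng.normal(0.0, 1.0, n)
    y = x + rng.normal(0.0, 1.0, n)
    z = y + rng.normal(0.0, np.sqrt(z_noise_var), n)
    return x, y, z

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

rng = np.random.default_rng(0)
x, y, z = sample(100_000, z_noise_var=1.0, rng=rng)          # training data
w_ls, *_ = np.linalg.lstsq(np.column_stack([x, z]), y, rcond=None)
print("least-squares weights:", np.round(w_ls, 2))

x_t, y_t, z_t = sample(100_000, z_noise_var=10.0, rng=rng)   # shifted test data
ls_train = mse(np.column_stack([x, z]) @ w_ls, y)
ls_test = mse(np.column_stack([x_t, z_t]) @ w_ls, y_t)
causal_train = mse(x, y)      # causal predictor Y_Cau = X
causal_test = mse(x_t, y_t)
print(f"least squares: train {ls_train:.2f}, test {ls_test:.2f}")
print(f"causal:        train {causal_train:.2f}, test {causal_test:.2f}")
```

In population terms, the least-squares predictor has error N_Y/2 − N_Z/2, so its mean squared error is 1/4 + 1/4 = 0.5 on training data but 1/4 + 10/4 = 2.75 at test time, while the causal predictor stays at 1 in both.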
Getting around the big lie of machine learning
Horses absorb all training correlations recklessly, including confounders and spurious patterns
∼
If $P_{\text{train}} \neq P_{\text{test}}$, which correlations should we learn and which should we ignore?
Reichenbach’s Principle of Common Cause
Correlations between X and Y arise from one of three causal structures:
X → Y, X ← Y, X ← Z → Y
What happens to Y when someone manipulates X? Why is Y = 2?
(Reichenbach, 1956) formalizes the claim “dependence does not imply causation”
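A small simulation of the third structure, X ← Z → Y, shows why dependence alone does not answer the manipulation question. This is a sketch with hypothetical linear mechanisms of our choosing: X and Y are strongly correlated in observational data, yet setting X by intervention leaves the distribution of Y unchanged.

```python
import numpy as np

def sample_common_cause(n, rng, do_x=None):
    """X <- Z -> Y with hypothetical linear mechanisms; do_x sets X by intervention."""
    z = rng.normal(size=n)
    x = z + 0.5 * rng.normal(size=n) if do_x is None else np.full(n, float(do_x))
    y = z + 0.5 * rng.normal(size=n)   # Y listens only to Z, never to X
    return x, y

rng = np.random.default_rng(0)
x, y = sample_common_cause(100_000, rng)
corr = np.corrcoef(x, y)[0, 1]
print("observational corr(X, Y):", round(float(corr), 2))

# Conditioning (X observed near 2) versus intervening (X forced to 2).
cond_mean = float(y[np.abs(x - 2.0) < 0.1].mean())
_, y_do = sample_common_cause(100_000, rng, do_x=2.0)
print("E[Y | X near 2]:", round(cond_mean, 2))
print("E[Y | do(X=2)]:", round(float(y_do.mean()), 2))
```

With these coefficients, corr(X, Y) = 1/1.25 = 0.8 and E[Y | X = 2] = 1.6, whereas E[Y | do(X = 2)] = 0.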
∼
We are interested in causal correlations (from features to target)
Predicting open umbrellas from rain is more stable than predicting rain from open umbrellas
Focus on causal correlations for invariance?
(Woodward, 2005)
Focus on causal correlations for truth?
(Pearl, 2018)
The causal explanation predicts the outcome of real experiments in the world
∼
We will now explore two ways to discover causality in data using data alone
Outline
What’s wrong with machine learning?
A causal proposal
Searching for causality I: observational data
Searching for causality II: multiple environments
Conclusion
What does causation look like?
(Hertzsprung–Russell diagrams, 1911; Messerli, 2012)
What does causation look like?
[Figure: scatter plots of V against U and of U against V, both axes from −1 to 1]
Effect = f(Cause) + Noise
Cause independent from Noise
(Peters et al., 2014)
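The additive-noise criterion suggests a simple recipe: regress each variable on the other and keep the direction whose residuals look independent of the input. A minimal sketch, assuming polynomial regression and an HSIC dependence statistic with Gaussian kernels of unit bandwidth on standardized data; these choices, and the names `hsic` and `anm_score`, are ours rather than the exact procedure of Peters et al. (2014).

```python
import numpy as np

def gaussian_gram(v):
    """Gaussian-kernel Gram matrix of a standardized 1-D sample (unit bandwidth, our choice)."""
    v = (v - v.mean()) / v.std()
    return np.exp(-0.5 * (v[:, None] - v[None, :]) ** 2)

def hsic(a, b):
    """Biased HSIC estimate: larger values indicate stronger dependence between a and b."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(gaussian_gram(a) @ H @ gaussian_gram(b) @ H)) / n ** 2

def anm_score(cause, effect, degree=5):
    """Fit effect = f(cause) + noise by polynomial regression; score residual dependence."""
    coeffs = np.polyfit(cause, effect, degree)
    residual = effect - np.polyval(coeffs, cause)
    return hsic(cause, residual)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 400)                  # cause
y = x ** 3 + 0.2 * rng.uniform(-1.0, 1.0, 400)   # effect with additive, independent noise
forward, backward = anm_score(x, y), anm_score(y, x)
print("inferred:", "x -> y" if forward < backward else "y -> x")
```

In the backward direction the spread of x given y changes strongly with y (wide near y = 0, narrow near y = ±1), so no additive-noise fit can make the residuals independent of y.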
What does causation look like?
[Figure: a deterministic map from X (0 to 1) to Y (−3 to 3), with marginals P(X) and P(Y)]
Effect = f(Cause)
p(Cause) independent from f′
(Daniusis et al., 2010)
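One estimator built on this idea is the slope-based IGCI score: after rescaling both variables to [0, 1], average log |Δy/Δx| over consecutive sorted points and prefer the direction with the smaller score. A minimal sketch, assuming a deterministic, monotone toy mechanism y = x³ of our choosing; `igci_slope_score` is an illustrative name.

```python
import numpy as np

def igci_slope_score(a, b):
    """Average log-slope of b against a, both rescaled to [0, 1] (uniform reference measure)."""
    a = (a - a.min()) / (a.max() - a.min())
    b = (b - b.min()) / (b.max() - b.min())
    order = np.argsort(a)
    da = np.diff(a[order])
    db = np.diff(b[order])
    keep = (da != 0) & (db != 0)   # skip ties, where the log-slope is undefined
    return float(np.mean(np.log(np.abs(db[keep] / da[keep]))))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1000)   # cause
y = x ** 3                        # effect: f chosen independently of p(x)
c_xy = igci_slope_score(x, y)
c_yx = igci_slope_score(y, x)
print("inferred:", "x -> y" if c_xy < c_yx else "y -> x")
```

For this mechanism the forward score approximates E[log f′(X)] = log 3 − 2 ≈ −0.9, which is negative by Jensen's inequality whenever f is a nonlinear bijection of [0, 1]; the backward score is exactly its negative.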
What does causation look like?
[Figure: grid of scatter plots from a cause-effect pairs benchmark, each labeled with its true direction, x → y or x ← y]
(Mooij et al., 2014)
NCC: learning causation footprints
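[Architecture: each point $(x_{ij}, y_{ij})$ of the sample $\{(x_{ij}, y_{ij})\}_{j=1}^{m_i}$ is featurized separately by embedding layers; the embeddings are averaged, $\frac{1}{m_i}\sum_{j=1}^{m_i}(\cdot)$; classifier layers then output $\hat{P}(X_i \to Y_i)$]
(Lopez-Paz et al., 2017)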
Trained using synthetic data!
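The architecture can be sketched as a forward pass: embed each point separately, average the embeddings over the sample, then classify. A minimal NumPy sketch with untrained random weights and layer sizes chosen purely for illustration (`init_params`, `ncc_forward`, and `hidden` are hypothetical names, not the published model; training on synthetic pairs is omitted). The averaging step makes the output invariant to the order of the points.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def init_params(rng, hidden=32):
    """Hypothetical sizes: 2 -> hidden -> hidden embedding, hidden -> hidden -> 1 classifier."""
    shapes = [(2, hidden), (hidden, hidden), (hidden, hidden), (hidden, 1)]
    return [(rng.normal(0, 1 / np.sqrt(i), (i, o)), np.zeros(o)) for i, o in shapes]

def ncc_forward(points, params):
    """points: (m_i, 2) array of (x_ij, y_ij); returns an estimate of P(X_i -> Y_i)."""
    (w1, b1), (w2, b2), (w3, b3), (w4, b4) = params
    h = relu(relu(points @ w1 + b1) @ w2 + b2)  # embedding layers, one point at a time
    s = h.mean(axis=0)                           # average over the m_i points
    logit = relu(s @ w3 + b3) @ w4 + b4          # classifier layers
    return float(1.0 / (1.0 + np.exp(-logit[0])))

rng = np.random.default_rng(0)
params = init_params(rng)
x = rng.normal(size=100)
sample = np.column_stack([x, np.tanh(x) + 0.1 * rng.normal(size=100)])
p = ncc_forward(sample, params)
p_shuffled = ncc_forward(rng.permutation(sample), params)
print(round(p, 4), round(p_shuffled, 4))
```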
NCC is the state-of-the-art
[Figure: classification accuracy (%) against decision rate (%) for NCC compared with RCC, ANM, and IGCI]
NCC discovers causation in images
Features inside bounding boxes are caused by the presence of objects (wheel)
Features outside bounding boxes cause the presence of objects (road)
[Figure: object-feature ratio]
(Lopez-Paz et al., 2017)
NCC discovers causation in language
Discovering causal relations between concepts, such as “smoking → cancer”, from word2vec vectors and text statistics
[Figure: test accuracy (0.4 to 0.9) of baseline, distribution-based, and feature-based methods built on word counts, PMI, precedence, entropy, and word2vec statistics]
(Rojas-Carulla et al., 2017)
New hopes for unsupervised learning?
There are unexpected causal signals in unsupervised data!
These allow us to gain causal intuitions from data, reducing the need for experimentation
What metrics/divergences best extract these causal signals, while discarding the rest?
We want simple models for a complex world (IKEA instructions)
− Against the usual hope of consistency (P = Q as n → ∞)
First results
Cause-effect discovery ≈ choosing the simplest model (Stegle et al., 2010) using a divergence
− GAN divergences distinguish between cause and effect (Lopez-Paz and Oquab, 2016)
− Discriminator((Cause, Generator(Cause, Noise)), (Cause, Effect)) is harder than Discriminator((Generator(Effect, Noise), Effect), (Cause, Effect))
− These ideas extend to multiple variables (Goudet et al., 2017; Kalainathan et al., 2018)
− Each divergence has important geometry implications (Bottou et al., 2018)
− Hyperbolic divergences recover complex causal hierarchies (Klimovskaia et al., 2018)
[Figure: points p1 to p5 embedded in Euclidean space and in the Poincaré ball, preserving pairwise distances]
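Hyperbolic embeddings of this kind rely on the distance of the Poincaré ball, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). A minimal sketch of that formula (`poincare_distance` is our name); distances grow without bound near the boundary, which leaves room to embed tree-like hierarchies.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit (Poincaré) ball."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    sq_u, sq_v = np.dot(u, u), np.dot(v, v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    gap = np.dot(u - v, u - v)
    return float(np.arccosh(1.0 + 2.0 * gap / ((1.0 - sq_u) * (1.0 - sq_v))))

origin = np.zeros(2)
near_center = poincare_distance(origin, [0.5, 0.0])             # Euclidean gap 0.5
near_boundary = poincare_distance([0.9, 0.0], [0.95, 0.0])      # Euclidean gap 0.05
print(round(near_center, 3), round(near_boundary, 3))
```

A Euclidean gap of 0.05 near the boundary is worth about as much hyperbolic distance as a gap of 0.5 at the center (roughly 0.72 versus ln 3 ≈ 1.10).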
First conclusion
There are causal signals in unsupervised data ready to be leveraged in novel ways
Outline
What’s wrong with machine learning?
A causal proposal
Searching for causality I: observational data
Searching for causality II: multiple environments
Conclusion
Moving beyond the big lie
$P_{\text{train}}(X, Y) \neq P_{\text{test}}(X, Y)$
Then, what remains invariant between train and test data?
∼
We assume that Ptrain and Ptest produce data about the same phenomena under different
experimental conditions, circumstances, or environments
∼
To succeed at the test environment, we observe multiple training environments and
− learn what is invariant across environments
− discard what is specific to each environment
∼
There is a causal justification for proceeding this way!
Functional causal models
A common tool to describe causal structures is the Functional Causal Model (FCM)
[Graph: X1 → X2, X1 → X3, X1 → X4, X3 → X2, X2 → Y, X3 → Y]
X1 ← f1(N1)
X2 ← f2(X1, X3, N2)
X3 ← f3(X1, N3) // X1 causes X3
X4 ← f4(X1, N4)
Y ← fy(X2, X3, Ny)
Ni ∼ P(N)
FCMs are compositional and allow counterfactual reasoning
FCMs are generative: sampling their equations produces the observational distribution P(X, Y)
We can also intervene on the FCM equations to produce interventional distributions $\tilde{P}(X, Y)$!
∼
Each intervention produces one environment (distribution) of the phenomenon (FCM) of interest!
Functional causal models
One FCM = multiple interventions/distributions/environments
$P^1_{\text{train}}(X, Y)$ is sampled from the FCM under the intervention X3 = 1.5:
X1 = f1(N1)
X2 = f2(X1, X3, N2)
X3 = 1.5 (intervention)
X4 = f4(X1, N4)
Y = fy(X2, X3, Ny)
Ni ∼ P(N)
Functional causal models
One FCM = multiple interventions/distributions/environments
$P^2_{\text{train}}(X, Y)$ is sampled from the FCM under the intervention X1 ∼ N(0, 1):
X1 ∼ N(0, 1) (intervention)
X2 = f2(X1, X3, N2)
X3 = f3(X1, N3)
X4 = f4(X1, N4)
Y = fy(X2, X3, Ny)
Ni ∼ P(N)
Functional causal models
One FCM = multiple interventions/distributions/environments
$P^3_{\text{train}}(X, Y)$ is sampled from the FCM under an intervention that adds U(−10, 10) noise to X2:
X1 = f1(N1)
X2 = f2(X1, X3, N2) + U(−10, 10) (intervention)
X3 = f3(X1, N3)
X4 = f4(X1, N4)
Y = fy(X2, X3, Ny)
Ni ∼ P(N)
Functional causal models
X1 = f1(N1)
X2 = f2(X1, X3, N2)
X3 = f3(X1, N3)
X4 = f4(X1, N4)
Y = fy(X2, X3, Ny)
Ni ∼ P(N)
If mechanisms are autonomous, and no intervention disturbs the conditional expectation in the target's causal equation, then:
− the causal conditional expectation E(Y | X2, X3) remains invariant,
− the non-causal conditional expectation E(Y | X) may vary wildly!
This reveals the link between invariances across environments and causal structures
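The invariance claim can be checked on a toy instance of the FCM above. A minimal sketch, assuming linear mechanisms with coefficients of our choosing (the slides leave f1, ..., f4 and fy unspecified) and the three interventions listed above; regressing Y on its causal parents (X2, X3) recovers the same coefficients, 1.5 and 2.0, in every environment where they are identifiable. The names `sample_fcm` and `parent_coefficients` are illustrative.

```python
import numpy as np

def sample_fcm(env, n, rng):
    """Sample (X1..X4, Y); linear f's are hypothetical, interventions follow the slides."""
    n1, n2, n3, n4, ny = rng.normal(size=(5, n))
    x1 = rng.normal(size=n) if env == "x1_gaussian" else 2.0 * n1   # X1 <- f1(N1)
    x3 = np.full(n, 1.5) if env == "x3_fixed" else 0.8 * x1 + n3    # X3 <- f3(X1, N3)
    x2 = 0.5 * x1 - x3 + n2                                         # X2 <- f2(X1, X3, N2)
    if env == "x2_noisy":
        x2 = x2 + rng.uniform(-10, 10, size=n)                      # + U(-10, 10)
    x4 = x1 + n4                                                    # X4 <- f4(X1, N4)
    y = 1.5 * x2 + 2.0 * x3 + ny                                    # Y <- fy(X2, X3, Ny)
    return np.column_stack([x1, x2, x3, x4]), y

def parent_coefficients(X, y):
    """Least-squares fit of Y on its causal parents (X2, X3) plus an intercept."""
    A = np.column_stack([X[:, 1], X[:, 2], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:2]

rng = np.random.default_rng(0)
# "x3_fixed" is skipped: with X3 held at 1.5 its coefficient is not identifiable.
fits = {env: parent_coefficients(*sample_fcm(env, 20_000, rng))
        for env in ["observational", "x1_gaussian", "x2_noisy"]}
for env, coef in fits.items():
    print(env, np.round(coef, 2))
```

In this particular graph Y has no children, so the drift of non-causal predictors becomes visible only once the features include an effect of Y, as in the X → Y → Z example.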
∼
How can we find invariant causal predictors?
A simple example: X → Y → Z
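For all environments $e > 0$ ($e$ is a noise variance):
$X^e \leftarrow \mathcal{N}(0, e)$,
$Y^e \leftarrow X^e + \mathcal{N}(0, e)$,
$Z^e \leftarrow Y^e + \mathcal{N}(0, 1)$.
The task is to predict $Y^e$ given $(X^e, Z^e)$ for an unknown test environment $e$. We have three options:
$\mathbb{E}[Y^e \mid X^e = x] = x$,
$\mathbb{E}[Y^e \mid Z^e = z] = \frac{2e}{2e + 1} z$,
$\mathbb{E}[Y^e \mid X^e = x, Z^e = z] = \frac{1}{e + 1} x + \frac{e}{e + 1} z$.
The causal predictor based on x is invariant: it is the only one of the three whose coefficients do not depend on $e$.
The state-of-the-art (Ganin et al., 2016; Peters et al., 2016) fails at this simple example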
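The three regressions can be checked by simulation across a few environments. A minimal sketch, assuming (as the formulas require) that e is a noise variance; `sample_env` and `fit` are hypothetical helper names, and the fits omit an intercept because every variable has mean zero.

```python
import numpy as np

def sample_env(e, n, rng):
    x = rng.normal(0.0, np.sqrt(e), n)
    y = x + rng.normal(0.0, np.sqrt(e), n)
    z = y + rng.normal(0.0, 1.0, n)
    return x, y, z

def fit(features, y):
    """Least-squares coefficients of y on the given columns (no intercept: zero means)."""
    coef, *_ = np.linalg.lstsq(np.column_stack(features), y, rcond=None)
    return coef

rng = np.random.default_rng(0)
results = {}
for e in [0.5, 2.0, 10.0]:
    x, y, z = sample_env(e, 50_000, rng)
    results[e] = (fit([x], y), fit([z], y), fit([x, z], y))
    on_x, on_z, on_xz = results[e]
    print(f"e={e:>4}: Y~X {on_x.round(2)}  Y~Z {on_z.round(2)} "
          f"(theory {2 * e / (2 * e + 1):.2f})  Y~(X,Z) {on_xz.round(2)}")
```

Only the coefficient of Y on X stays at 1 as e changes; the other two follow the e-dependent formulas, so a predictor fitted on the training environments drifts on a new one.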
Our proposal
Find a feature representation that leads to the same optimal classifier across environments.
∼
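Let $w^e_\phi$ be the optimal classifier for environment $e$ when using the featurizer $\phi$:
$w^e_\phi = \arg\min_w R_{P^e}(w \circ \phi)$, where $R_{P^e}(f) = \mathbb{E}_{(x, y) \sim P^e}\left[\mathrm{Error}(f(x), y)\right]$.
Measure the classifier discrepancy:
$\|w^e_\phi - w^{e'}_\phi\|_P = \int \left(w^e_\phi(\phi(x)) - w^{e'}_\phi(\phi(x))\right)^2 \, dP(x)$
Let $\bar{w} = \frac{1}{|\mathcal{E}|} \sum_e w^e_\phi$, where $\mathcal{E}$ is the set of training environments. Then, our new learning objective is:
$\arg\min_\phi \sum_e R_{P^e}(\bar{w} \circ \phi) + \lambda \sum_{e, e' \neq e} \|w^e_\phi - w^{e'}_\phi\|_{P^e}$
(Arjovsky et al., 2018)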
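For intuition, the objective can be evaluated on the X → Y → Z example with two hand-picked featurizers. A minimal sketch, assuming squared error, linear featurizers given as weight matrices over (x, z), least-squares classifiers for $w^e_\phi$, two training environments e ∈ {0.5, 2}, and λ = 10; the names are ours. It only compares two candidates and does not search over φ.

```python
import numpy as np

def sample_env(e, n, rng):
    x = rng.normal(0.0, np.sqrt(e), n)
    y = x + rng.normal(0.0, np.sqrt(e), n)
    z = y + rng.normal(0.0, 1.0, n)
    return np.column_stack([x, z]), y

def objective(Phi, envs, lam):
    """Risks of the averaged classifier plus lam times pairwise classifier discrepancies."""
    feats = [X @ Phi for X, _ in envs]
    ws = [np.linalg.lstsq(F, y, rcond=None)[0] for F, (_, y) in zip(feats, envs)]  # w^e_phi
    w_bar = np.mean(ws, axis=0)
    risk = sum(np.mean((F @ w_bar - y) ** 2) for F, (_, y) in zip(feats, envs))
    penalty = sum(np.mean((feats[i] @ (ws[i] - ws[j])) ** 2)   # ||w^i - w^j||_{P^i}
                  for i in range(len(envs)) for j in range(len(envs)) if i != j)
    return float(risk + lam * penalty), float(penalty)

rng = np.random.default_rng(0)
envs = [sample_env(e, 50_000, rng) for e in (0.5, 2.0)]
phi_causal = np.array([[1.0], [0.0]])   # keep x only
phi_all = np.eye(2)                      # keep both x and z
scores = {name: objective(Phi, envs, lam=10.0)
          for name, Phi in [("causal", phi_causal), ("all", phi_all)]}
for name, (total, penalty) in scores.items():
    print(f"{name}: objective {total:.2f}, penalty {penalty:.3f}")
```

In population terms the causal featurizer scores 2.5 with zero penalty, and the full featurizer 1.125 + 10 × 0.5 = 6.125; for λ below 2.75 the ranking flips, so the choice of λ matters.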
An approximation to our proposal
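$C(\phi) = \sum_e R_{P^e}(\bar{w} \circ \phi) + \lambda \sum_{e, e' \neq e} \|w^e_\phi - w^{e'}_\phi\|_{P^e}$
is an intractable bi-level optimization problem, since each $w^e_\phi$ is itself the solution of an optimization problem
We approximate the interactions between the optimization problems using unrolled gradients
∼
1. Initialize $\phi$ and $w^e_\phi$ at random, for all $e$
1.1 Update $w^e_\phi \leftarrow \mathrm{Gradient}(R_{P^e}, w^e_\phi)$ using one step and fixed $\phi$, for all $e$
1.2 Update $m^e_\phi \leftarrow \mathrm{Gradient}(R_{P^e}, w^e_\phi)$ using $k$ steps and fixed $\phi$, for all $e$
1.3 Update $\phi \leftarrow \mathrm{Gradient}(C, m^e_\phi)$ using one step and fixed $m^e_\phi$
2. Return $\left(\frac{1}{|\mathcal{E}|} \sum_e w^e_\phi\right) \circ \phi$, with $\mathcal{E}$ the set of training environments
(Arjovsky et al., 2018)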
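The loop above can be sketched on the X → Y → Z example. This is a minimal NumPy illustration under several assumptions of ours: a scalar feature φ(x, z) = φ₁x + φ₂z, scalar classifiers with squared error, central finite differences in place of automatic differentiation through the k unrolled steps (our reading of step 1.3), and a projection of φ to unit norm, since the classifiers absorb its scale. All names and hyperparameters are hypothetical, and we make no claim that this sketch converges to the causal featurizer.

```python
import numpy as np

def sample_env(e, n, rng):
    x = rng.normal(0.0, np.sqrt(e), n)
    y = x + rng.normal(0.0, np.sqrt(e), n)
    z = y + rng.normal(0.0, 1.0, n)
    return np.column_stack([x, z]), y

def grad_w(w, f, y):
    """d/dw of the squared-error risk mean((w * f - y)^2) for a scalar classifier w."""
    return 2.0 * np.mean((w * f - y) * f)

def unroll(phi, ws, envs, k, lr):
    """Step 1.2: k gradient steps on each w^e with phi fixed, giving m^e."""
    ms = []
    for w, (X, y) in zip(ws, envs):
        f = X @ phi
        for _ in range(k):
            w = w - lr * grad_w(w, f, y)
        ms.append(w)
    return ms

def cost(phi, ms, envs, lam):
    """C(phi) with the unrolled m^e standing in for the optimal classifiers w^e_phi."""
    m_bar = np.mean(ms)
    total = 0.0
    for i, (X, y) in enumerate(envs):
        f = X @ phi
        total += np.mean((m_bar * f - y) ** 2)
        total += lam * sum(np.mean(((ms[i] - m) * f) ** 2) for j, m in enumerate(ms) if j != i)
    return total

def train(envs, steps=200, k=5, lr_w=0.05, lr_phi=0.01, lam=10.0, eps=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    phi = rng.normal(size=2)
    phi /= np.linalg.norm(phi)
    ws = list(rng.normal(size=len(envs)))             # step 1: random initialization
    for _ in range(steps):
        # Step 1.1: one gradient step on each w^e with phi fixed.
        ws = [w - lr_w * grad_w(w, X @ phi, y) for w, (X, y) in zip(ws, envs)]
        # Steps 1.2-1.3: gradient of C through the k unrolled steps, by finite differences.
        grad = np.zeros_like(phi)
        for d in range(phi.size):
            h = np.zeros_like(phi)
            h[d] = eps
            c_plus = cost(phi + h, unroll(phi + h, ws, envs, k, lr_w), envs, lam)
            c_minus = cost(phi - h, unroll(phi - h, ws, envs, k, lr_w), envs, lam)
            grad[d] = (c_plus - c_minus) / (2.0 * eps)
        phi = phi - lr_phi * grad
        phi /= np.linalg.norm(phi)   # our addition: fix the scale, which w absorbs anyway
    return phi, float(np.mean(ws))    # step 2: predictor (x, z) -> w_bar * ((x, z) @ phi)

rng = np.random.default_rng(1)
envs = [sample_env(e, 2_000, rng) for e in (0.5, 2.0)]
phi, w_bar = train(envs)
print("phi:", np.round(phi, 3), "w_bar:", round(w_bar, 3))
```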
First results
Empirical risk minimization: [figure]
Causal risk minimization: [figure]
∼
Implications for fairness? Partitions of one dataset? Theory?
Multiple environments in the big picture
generative learning: training $U^1_1$; test ∅
unsupervised learning: training $U^1_1$; test $U^1_2$
supervised learning: training $L^1_1$; test $U^1_1$
semi-supervised learning: training $L^1_1, U^1_1$; test $U^1_2$
transductive learning: training $L^1_1, U^1_1$; test $U^1_1$
multitask learning: training $L^1_1, L^2_1$; test $U^1_2, U^2_2$
domain adaptation: training $L^1_1, U^2_1$; test $U^2_2$
transfer learning: training $U^1_1, L^2_1$; test $U^2_1$
continual learning: training $L^1_1, \ldots, L^\infty_1$; test $U^1_1, \ldots, U^\infty_1$
multi-environment learning: training $L^1_1, L^2_1$; test $U^3_1, U^4_1$
− $L^i_j$: labeled dataset number j drawn from distribution i
− $U^i_j$: unlabeled dataset number j drawn from distribution i
Second conclusion
Prediction rules based on correlations that are stable across environments are likely to be causal¹
¹ I call this the principle of causal concentration.
Outline
What’s wrong with machine learning?
A causal proposal
Searching for causality I: observational data
Searching for causality II: multiple environments
Conclusion
Finally: from machine learning to artificial intelligence
AIs will be world simulators that will
− align with the causal outcomes in the world,
− perform robustly across diverse environments,
− interrogate composable autonomous mechanisms to extrapolate,
− allow us to imagine multiple futures given uncertainty about a situation,
− enable counterfactual reasoning for extreme generalization
These causal desiderata are out of reach for current machine learning systems. Let’s get to it!
∼
Thanks!
References
Martin Arjovsky, Léon Bottou, and David Lopez-Paz. Learning invariant causal rules across environments. In preparation, 2018.
Léon Bottou, Martin Arjovsky, David Lopez-Paz, and Maxime Oquab. Geometrical insights for implicit generative modeling. In Braverman Readings in Machine Learning. Key Ideas from Inception to Current State. Springer, 2018.
Povilas Daniusis, Dominik Janzing, Joris Mooij, Jakob Zscheischler, Bastian Steudel, Kun Zhang, and Bernhard Schölkopf. Inferring deterministic causal relations. In UAI, 2010.
Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. JMLR, 2016.
O. Goudet, D. Kalainathan, P. Caillou, I. Guyon, D. Lopez-Paz, and M. Sebag. Causal Generative Neural Networks. arXiv, 2017.
Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. arXiv, 2017.
Allan Jabri, Armand Joulin, and Laurens van der Maaten. Revisiting visual question answering baselines. In ECCV, 2016.
D. Kalainathan, O. Goudet, I. Guyon, D. Lopez-Paz, and M. Sebag. SAM: Structural Agnostic Model, Causal Discovery and Penalized Adversarial Learning. arXiv, 2018.
Anna Klimovskaia, Léon Bottou, David Lopez-Paz, and Maximilian Nickel. Poincaré maps recover continuous hierarchies in single-cell data. In preparation, 2018.
David Lopez-Paz and Maxime Oquab. Revisiting classifier two-sample tests. ICLR, 2016.
David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, and Léon Bottou. Discovering causal signals in images. CVPR, 2017.
Franz H. Messerli. Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 2012.
Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf. Distinguishing cause from effect using observational data: methods and benchmarks. JMLR, 2014.
Judea Pearl. Theoretical impediments to machine learning with seven sparks from the causal revolution. arXiv, 2018.
Jonas Peters, Joris M. Mooij, Dominik Janzing, and Bernhard Schölkopf. Causal discovery with continuous additive noise models. JMLR, 2014.
Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society, 2016.
Hans Reichenbach. The direction of time. Dover, 1956.
Mateo Rojas-Carulla, Marco Baroni, and David Lopez-Paz. Causal discovery using proxy variables. In preparation, 2017.
A. Rosenfeld, R. Zemel, and J. K. Tsotsos. The Elephant in the Room. arXiv, 2018.
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.
Oliver Stegle, Dominik Janzing, Kun Zhang, Joris M. Mooij, and Bernhard Schölkopf. Probabilistic latent variable models for distinguishing between cause and effect. In NIPS, 2010.
Pierre Stock and Moustapha Cisse. Convnets and ImageNet beyond accuracy: Explanations, bias detection, adversarial examples and model criticism. arXiv, 2017.
B. L. Sturm. A simple method to determine if a music information retrieval system is a “horse”. IEEE Transactions on Multimedia, 2014.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. ICLR, 2013.
James Woodward. Making things happen: A theory of causal explanation. Oxford University Press, 2005.

Contenu connexe

Dernier

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Dernier (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

En vedette

How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationErica Santiago
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellSaba Software
 
Introduction to C Programming Language
Introduction to C Programming LanguageIntroduction to C Programming Language
Introduction to C Programming LanguageSimplilearn
 

En vedette (20)

How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024

Causal challenges in Artificial Intelligence

  • 1. Causal challenges for AI David Lopez-Paz Facebook AI Research
  • 3. Outline: (1) What’s wrong with machine learning? (2) A causal proposal. (3) Searching for causality I: observational data. (4) Searching for causality II: multiple environments. (5) Conclusion.
  • 4. What succeeds in machine learning? The recent winner (Hu et al., 2017) achieves a super-human (top-5) error rate of 2.2%.
  • 5. What succeeds in machine learning? (From Kartik Audhkhasi)
  • 6. What succeeds in machine learning? (Wikipedia, 2018)
  • 7. What succeeds in machine learning? (Silver et al., 2016)
  • 8. What are the reasons for these successes? Machines achieve impressive performance at − recognizing objects after training on more images than a human can see, − translating natural languages after training on more bilingual text than a human can read, − beating humans at Atari after playing more games than any teenager can endure, − reigning at Go after playing more grandmaster-level games than mankind has ever played. Models consume too much data to solve a single task! (From Léon Bottou)
  • 9. What fails in machine learning? (From Pietro Perona)
  • 10. What fails in machine learning? (From Pietro Perona)
  • 11. What fails in machine learning? (Rosenfeld et al., 2018)
  • 12. What fails in machine learning? (Stock and Cisse, 2017)
  • 13. What fails in machine learning? (From Jamie Kiros)
  • 14. What fails in machine learning? (Jabri et al., 2016)
  • 15. What fails in machine learning? (Szegedy et al., 2013)
  • 16. What fails in machine learning? (IBM system at ICLR 2017)
  • 17. What are the reasons for these failures? The big lie in machine learning (as Zoubin Ghahramani calls it): $P_{\text{train}}(X, Y) = P_{\text{test}}(X, Y)$. − focus on interpolation − out-of-distribution catastrophes − over-justification of “minimizing the average error” − emphasize the common, forget the rare − reckless learning. Horses cheat our statistical estimation problems by using unexpected features.
  • 18. Outline: (1) What’s wrong with machine learning? (2) A causal proposal. (3) Searching for causality I: observational data. (4) Searching for causality II: multiple environments. (5) Conclusion.
  • 19. This talk in one slide. Predict $Y$ from $(X, Z)$. Process generating labeled training data: $X \leftarrow \mathcal{N}(0, 1)$, $Y \leftarrow X + \mathcal{N}(0, 1)$, $Z \leftarrow Y + \mathcal{N}(0, 1)$. Least-squares solution: $\hat{Y}_{\text{LS}} = \frac{X}{2} + \frac{Z}{2}$. Causal solution: $\hat{Y}_{\text{Cau}} = X$. Predict $Y$ from $(X, Z)$. Process generating unlabeled testing data: $X \leftarrow \mathcal{N}(0, 1)$, $Y \leftarrow X + \mathcal{N}(0, 1)$, $Z \leftarrow Y + \mathcal{N}(0, 10)$. The least-squares solution breaks at testing time!
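The slide's claim can be checked with a short simulation. This is a minimal NumPy sketch, assuming $\mathcal{N}(0, v)$ means mean 0 and variance $v$ (the reading under which the closed forms on slide 45 hold); the sample sizes and seed are arbitrary choices, and least squares is fitted without an intercept because every variable has mean zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, z_noise_var):
    """X -> Y -> Z; N(0, v) is read as mean 0, variance v (an assumption)."""
    x = rng.normal(0.0, 1.0, n)
    y = x + rng.normal(0.0, 1.0, n)
    z = y + rng.normal(0.0, np.sqrt(z_noise_var), n)
    return x, y, z

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

x_tr, y_tr, z_tr = sample(100_000, 1.0)    # training process
x_te, y_te, z_te = sample(100_000, 10.0)   # testing process: noisier Z

# Least squares on (X, Z), no intercept.
a_tr = np.column_stack([x_tr, z_tr])
w_ls, *_ = np.linalg.lstsq(a_tr, y_tr, rcond=None)

ls_train = mse(a_tr @ w_ls, y_tr)
ls_test = mse(np.column_stack([x_te, z_te]) @ w_ls, y_te)
cau_train = mse(x_tr, y_tr)  # causal predictor: Y_hat = X
cau_test = mse(x_te, y_te)

print("least-squares weights:", w_ls.round(2))
print(f"least squares: train {ls_train:.2f}, test {ls_test:.2f}")
print(f"causal:        train {cau_train:.2f}, test {cau_test:.2f}")
```

Under these assumptions the population errors are 0.5 (train) and 2.75 (test) for least squares, against 1 in both cases for the causal predictor: least squares wins on the training process and loses as soon as the noise on Z changes.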
  • 20. Getting around the big lie of machine learning. Horses absorb all training correlations recklessly, including confounders and spurious patterns. If $P_{\text{train}} \neq P_{\text{test}}$, which correlations should we learn, and which should we ignore?
  • 21. Reichenbach’s Principle of Common Cause. Correlations between X and Y arise from one of three causal structures: X → Y, X ← Y, or X ← Z → Y. What happens to Y when someone manipulates X? Why is Y = 2? (Reichenbach, 1956) formalizes the claim “dependence does not imply causation”. We are interested in causal correlations (from features to target): predicting open umbrellas from rain is more stable than predicting rain from open umbrellas.
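To make “what happens to Y when someone manipulates X?” concrete, here is a sketch with hypothetical linear-Gaussian versions of the three structures (the coefficients and unit-variance noises are assumptions). All three produce correlated X and Y, but setting X by intervention, do(X = 2), moves Y only when X causes Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def noise():
    return rng.normal(0.0, 1.0, n)

def sample(structure, do_x=None):
    """Hypothetical linear-Gaussian versions of the three structures.

    If do_x is given, X is set to that value, cutting X off from its causes.
    """
    if structure == "X -> Y":
        x = noise() if do_x is None else np.full(n, do_x)
        y = x + noise()
    elif structure == "X <- Y":
        y = noise()
        x = y + noise() if do_x is None else np.full(n, do_x)
    else:  # "X <- Z -> Y": common cause Z
        z = noise()
        x = z + noise() if do_x is None else np.full(n, do_x)
        y = z + noise()
    return x, y

results = {}
for s in ("X -> Y", "X <- Y", "X <- Z -> Y"):
    x, y = sample(s)
    _, y_do = sample(s, do_x=2.0)
    results[s] = (np.corrcoef(x, y)[0, 1], y_do.mean())
    print(f"{s}: corr(X, Y) = {results[s][0]:.2f}, "
          f"E[Y | do(X = 2)] = {results[s][1]:.2f}")
```

With these mechanisms the observational correlations are about 0.71, 0.71 and 0.5, yet only the first structure gives E[Y | do(X = 2)] ≈ 2; in the other two, Y stays centred at 0.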
  • 22. Focus on causal correlations for invariance? (Woodward, 2005)
  • 23. Focus on causal correlations for truth? (Pearl, 2018) The causal explanation predicts the outcome of real experiments in the world. We will now explore two ways to discover causality using data alone.
  • 24. Outline: (1) What’s wrong with machine learning? (2) A causal proposal. (3) Searching for causality I: observational data. (4) Searching for causality II: multiple environments. (5) Conclusion.
  • 25. What does causation look like? (Hertzsprung–Russell diagrams, 1911)
  • 26. What does causation look like? (Messerli, 2012)
  • 27. What does causation look like? [Figure: scatter plots of V against U and of U against V.] Effect = f(Cause) + Noise, with Cause independent from Noise (Peters et al., 2014).
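The additive-noise idea can be turned into a direction test: regress each variable on the other and prefer the direction whose residuals look independent of the input. This is an assumption-laden sketch rather than the exact method of Peters et al. (2014): it uses polynomial regression instead of a nonparametric regressor, a biased HSIC estimate with a Gaussian kernel (median-distance bandwidth) as the dependence measure, and a made-up cause-effect pair y = x³ + noise.

```python
import numpy as np

def rbf_gram(v):
    """Gaussian-kernel Gram matrix; bandwidth = median pairwise distance."""
    d = np.abs(v[:, None] - v[None, :])
    sigma = np.median(d[d > 0])
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def hsic(a, b):
    """Biased HSIC estimate trace(K H L H) / n^2; larger means more dependent."""
    n = len(a)
    h = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    return float(np.trace(rbf_gram(a) @ h @ rbf_gram(b) @ h)) / n ** 2

def anm_score(cause, effect, degree=5):
    """Fit effect = f(cause) + noise with a polynomial f, then measure
    the dependence between the cause and the residual noise."""
    coeffs = np.polyfit(cause, effect, degree)
    residual = effect - np.polyval(coeffs, cause)
    return hsic(cause, residual)

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 500)          # hypothetical cause
y = x ** 3 + rng.normal(0.0, 1.0, 500)   # hypothetical effect

score_xy = anm_score(x, y)  # hypothesis X -> Y
score_yx = anm_score(y, x)  # hypothesis Y -> X
print("inferred:", "X -> Y" if score_xy < score_yx else "Y -> X")
```

In the backward direction the spread of x given y depends strongly on y, so the residuals stay dependent on the input. Comparing raw HSIC values across directions is a simplification; a more careful version would compare p-values of an independence test.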
  • 28. What does causation look like? [Figure: X against Y, shown with the densities P(X) and P(Y).] Effect = f(Cause), with p(Cause) independent from f′ (Daniusis et al., 2010).
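The slope-based estimator associated with this idea (Daniusis et al., 2010) turns it into a score. The sketch below is one common form, assuming both variables are rescaled to [0, 1] (a uniform reference measure) and a noise-free, hypothetical mechanism y = x³ with uniform x: it averages log |Δy/Δx| over consecutive points sorted by the candidate cause, and the direction with the lower score is preferred.

```python
import numpy as np

def normalize(v):
    """Rescale to [0, 1] (uniform reference measure)."""
    return (v - v.min()) / (v.max() - v.min())

def igci_slope(a, b):
    """Slope-based score for the hypothesis a -> b (lower = more plausible)."""
    a, b = normalize(a), normalize(b)
    order = np.argsort(a)
    da, db = np.diff(a[order]), np.diff(b[order])
    keep = (da != 0) & (db != 0)  # skip ties, where log|slope| is undefined
    return float(np.mean(np.log(np.abs(db[keep] / da[keep]))))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1000)  # hypothetical cause
y = x ** 3                       # deterministic, nonlinear mechanism

c_xy = igci_slope(x, y)  # score for X -> Y
c_yx = igci_slope(y, x)  # score for Y -> X
print(f"C(X->Y) = {c_xy:.2f}, C(Y->X) = {c_yx:.2f}")
```

For this mechanism the population value of the forward score is E[log 3x²] = log 3 − 2 ≈ −0.90, and the backward score is its negative, so X → Y is preferred.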
  • 29. What does causation look like? [Figure: a grid of cause-effect pairs, each labeled with its ground-truth direction, x → y or x ← y.] (Mooij et al., 2014)
  • 30. NCC: learning causation footprints. [Figure: NCC architecture. Each point $(x_{ij}, y_{ij})$ of the sample $\{(x_{ij}, y_{ij})\}_{j=1}^{m_i}$ is featurized separately by embedding layers; the embeddings are averaged, $\frac{1}{m_i}\sum_{j=1}^{m_i}(\cdot)$, and classifier layers output $\hat{P}(X_i \to Y_i)$.] (Lopez-Paz et al., 2017) Trained using synthetic data!
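The architecture can be sketched as a forward pass in NumPy. Layer sizes, ReLU activations and the random, untrained weights are hypothetical choices; in NCC the weights are learned from synthetic cause-effect pairs, which this sketch does not do. What it does show is that averaging per-point embeddings makes the output independent of the order of the points.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (untrained) weights for an MLP with the given layer sizes."""
    return [(rng.normal(0.0, 0.5, (m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def mlp(params, h):
    """Affine layers with ReLU between them (no activation after the last)."""
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

embedding = init_mlp([2, 32, 32])   # applied to each (x_ij, y_ij) separately
classifier = init_mlp([32, 32, 1])  # applied to the averaged embedding

def ncc(points):
    """points: array of shape (m_i, 2). Returns a score in (0, 1) read as P(X_i -> Y_i)."""
    pooled = mlp(embedding, points).mean(axis=0)  # (1/m_i) * sum over j
    logit = mlp(classifier, pooled[None, :])[0, 0]
    return float(1.0 / (1.0 + np.exp(-logit)))

sample = rng.normal(size=(100, 2))
p = ncc(sample)
p_shuffled = ncc(rng.permutation(sample))  # same points, different order
print(p, p_shuffled)
```

Training would fit both MLPs with a binary cross-entropy loss on labeled synthetic samples; the forward pass above is the part that fixes the shape of the model.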
  • 31. NCC is the state-of-the-art. [Figure: classification accuracy (%) against decision rate (%), comparing with RCC, ANM and IGCI.]
  • 32. NCC is the state-of-the-art
  • 33. NCC discovers causation in images. Features inside bounding boxes are caused by the presence of objects (e.g. wheel); features outside bounding boxes cause the presence of objects (e.g. road). [Figure axis: object-feature ratio.] (Lopez-Paz et al., 2017)
  • 34. NCC discovers causation in language, e.g. the relation “smoking → cancer” between concepts, using their word2vec vectors. [Figure: test accuracy (0.4 to 0.9) of baseline, distribution-based and feature-based methods built on counts, PMI, precedence and word2vec statistics.] (Rojas-Carulla et al., 2017)
  • 35. New hopes for unsupervised learning? There are unexpected causal signals in unsupervised data! These allow us to gain causal intuitions from data, reducing the need for experimentation. Which metrics/divergences best extract these causal signals while discarding the rest? We want simple models for a complex world (IKEA instructions), against the usual hope of consistency ($P = Q$ as $n \to \infty$).
  • 36. First results. Cause-effect discovery ≈ choosing the simplest model (Stegle et al., 2010) using a divergence − GAN divergences distinguish between cause and effect (Lopez-Paz and Oquab, 2016) − discriminating (Cause, Generator(Cause, Noise)) from (Cause, Effect) is harder than discriminating (Generator(Effect, Noise), Effect) from (Cause, Effect) − these ideas extend to multiple variables (Goudet et al., 2017; Kalainathan et al., 2018) − each divergence has important geometric implications (Bottou et al., 2018) − hyperbolic divergences recover complex causal hierarchies (Klimovskaia et al., 2018). [Figure: points p1 to p5 embedded in Euclidean space and in the Poincaré ball, preserving pairwise distances.]
  • 37. First conclusion There are causal signals in unsupervised data ready to be leveraged in novel ways
  • 38. Outline: (1) What’s wrong with machine learning? (2) A causal proposal. (3) Searching for causality I: observational data. (4) Searching for causality II: multiple environments. (5) Conclusion.
  • 39. Moving beyond the big lie: $P_{\text{train}}(X, Y) \neq P_{\text{test}}(X, Y)$. Then, what remains invariant between train and test data? We assume that $P_{\text{train}}$ and $P_{\text{test}}$ produce data about the same phenomena under different experimental conditions, circumstances, or environments. To succeed at the test environment, we observe multiple training environments and − learn what is invariant across environments − discard what is specific to each environment. There is a causal justification for proceeding this way!
  • 40. Functional causal models. A common tool to describe causal structures is the Functional Causal Model (FCM). [Figure: causal graph over X1, X2, X3, X4 and Y.] X1 ← f1(N1), X2 ← f2(X1, X3, N2), X3 ← f3(X1, N3) (X1 causes X3), X4 ← f4(X1, N4), Y ← fy(X2, X3, Ny), with Ni ∼ P(N). FCMs are compositional and allow counterfactual reasoning. FCMs are generative: running their equations produces the observational distribution $P(X, Y)$. We can also intervene on the FCM equations to produce interventional distributions $\tilde{P}(X, Y)$! Each intervention produces one environment (distribution) of the phenomena (FCM) of interest!
  • 41. Functional causal models. One FCM = multiple interventions/distributions/environments. Environment $P^1_{\text{train}}(X, Y)$, intervening on X3: X1 = f1(N1), X2 = f2(X1, X3, N2), X3 = 1.5, X4 = f4(X1, N4), Y = fy(X2, X3, Ny), Ni ∼ P(N).
  • 42. Functional causal models. One FCM = multiple interventions/distributions/environments. Environment $P^2_{\text{train}}(X, Y)$, intervening on X1: X1 ∼ N(0, 1), X2 = f2(X1, X3, N2), X3 = f3(X1, N3), X4 = f4(X1, N4), Y = fy(X2, X3, Ny), Ni ∼ P(N).
  • 43. Functional causal models. One FCM = multiple interventions/distributions/environments. Environment $P^3_{\text{train}}(X, Y)$, intervening on X2: X1 = f1(N1), X2 = f2(X1, X3, N2) + U(−10, 10), X3 = f3(X1, N3), X4 = f4(X1, N4), Y = fy(X2, X3, Ny), Ni ∼ P(N).
  • 44. Functional causal models. X1 = f1(N1), X2 = f2(X1, X3, N2), X3 = f3(X1, N3), X4 = f4(X1, N4), Y = fy(X2, X3, Ny), Ni ∼ P(N). If mechanisms are autonomous, and no intervention disturbs the target’s causal equation: − the causal conditional expectation E(Y | X2, X3) remains invariant − a non-causal conditional expectation, such as E(Y | X4), may vary wildly! This reveals the link between invariances across environments and causal structures. How can we find invariant causal predictors?
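  • 45. A simple example: X → Y → Z. For all environments $e > 0$ (where $\mathcal{N}(0, e)$ denotes a Gaussian with variance $e$): $X^e \leftarrow \mathcal{N}(0, e)$, $Y^e \leftarrow X^e + \mathcal{N}(0, e)$, $Z^e \leftarrow Y^e + \mathcal{N}(0, 1)$. The task is to predict $Y^e$ given $(X^e, Z^e)$ for an unknown test environment $e$. We have three options: $\mathbb{E}[Y^e \mid X^e = x] = x$, $\mathbb{E}[Y^e \mid Z^e = z] = \frac{2e}{2e + 1} z$, $\mathbb{E}[Y^e \mid X^e = x, Z^e = z] = \frac{1}{e + 1} x + \frac{e}{e + 1} z$. The causal predictor based on $x$ is invariant! The state of the art (Ganin et al., 2016; Peters et al., 2016) fails at this simple example.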
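The three closed forms can be checked by simulation. This sketch fits least squares without intercept in a few environments, with $e$ values chosen arbitrarily, and reads $\mathcal{N}(0, v)$ as mean 0 and variance $v$. Only the coefficient on $x$ alone stays at 1 for every $e$.

```python
import numpy as np

rng = np.random.default_rng(0)

def environment(e, n=200_000):
    """Sample (X^e, Y^e, Z^e), reading N(0, v) as mean 0 and variance v."""
    x = rng.normal(0.0, np.sqrt(e), n)
    y = x + rng.normal(0.0, np.sqrt(e), n)
    z = y + rng.normal(0.0, 1.0, n)
    return x, y, z

def ols(columns, target):
    """Least-squares coefficients without an intercept (all variables have mean zero)."""
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), target, rcond=None)
    return coef

estimates = {}
for e in (0.5, 1.0, 2.0, 5.0):
    x, y, z = environment(e)
    estimates[e] = (ols([x], y)[0], ols([z], y)[0], ols([x, z], y))
    print(f"e={e}: on x {estimates[e][0]:.3f} (closed form 1), "
          f"on z {estimates[e][1]:.3f} (closed form {2 * e / (2 * e + 1):.3f}), "
          f"on (x, z) {estimates[e][2].round(3)} "
          f"(closed form [{1 / (e + 1):.3f} {e / (e + 1):.3f}])")
```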
  • 46. Our proposal. Find a feature representation that leads to the same optimal classifier across environments. Let $w^e_\phi$ be the optimal classifier for environment $e$ when using the featurizer $\phi$: $w^e_\phi = \arg\min_w R^{P^e}(w \circ \phi)$, where $R^{P^e}(f) = \mathbb{E}_{(x, y) \sim P^e}[\text{Error}(f(x), y)]$. Measure the classifier discrepancy: $\|w^e_\phi - w^{e'}_\phi\|_P = \int \left(w^e_\phi(\phi(x)) - w^{e'}_\phi(\phi(x))\right)^2 \, dP(x)$. Let $\bar{w} = \frac{1}{|E|} \sum_e w^e_\phi$ be the average classifier over the set $E$ of training environments. Then, our new learning objective is: $\arg\min_\phi \sum_e R^{P^e}(\bar{w} \circ \phi) + \lambda \sum_{e, e' \neq e} \|w^e_\phi - w^{e'}_\phi\|_{P^e}$ (Arjovsky et al., 2018)
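For the X → Y → Z example of slide 45, the objective can be evaluated directly if we assume (hypothetically) squared error, linear classifiers $w^e_\phi$ fitted by least squares, and a linear featurizer $\phi$ given as a matrix, with the integrals replaced by sample averages. The sketch compares a featurizer that keeps only X with one that keeps (X, Z), on two made-up environments e = 0.5 and e = 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def environment(e, n=50_000):
    """Slide-45 environment e (N(0, v) read as variance v). Returns ([X, Z], Y)."""
    x = rng.normal(0.0, np.sqrt(e), n)
    y = x + rng.normal(0.0, np.sqrt(e), n)
    z = y + rng.normal(0.0, 1.0, n)
    return np.column_stack([x, z]), y

def objective(phi, envs, lam):
    """Return (C(phi), penalty) with least-squares heads w^e_phi and squared error."""
    feats = [(data @ phi, y) for data, y in envs]
    heads = [np.linalg.lstsq(f, y, rcond=None)[0] for f, y in feats]  # w^e_phi
    w_bar = np.mean(heads, axis=0)
    risk = sum(np.mean((f @ w_bar - y) ** 2) for f, y in feats)
    penalty = 0.0
    for i, (f, _) in enumerate(feats):  # ||w^e - w^e'|| measured under P^e
        for j, w_other in enumerate(heads):
            if j != i:
                penalty += np.mean((f @ heads[i] - f @ w_other) ** 2)
    return risk + lam * penalty, penalty

envs = [environment(e) for e in (0.5, 2.0)]
phi_x = np.array([[1.0], [0.0]])  # keep X only (the causal feature)
phi_xz = np.eye(2)                # keep (X, Z)

for lam in (1.0, 10.0):
    c_x, _ = objective(phi_x, envs, lam)
    c_xz, _ = objective(phi_xz, envs, lam)
    print(f"lambda={lam}: C(X only)={c_x:.2f}, C(X, Z)={c_xz:.2f}")
```

Keeping (X, Z) has the lower risk term here (about 1.1 versus 2.5) but a discrepancy penalty of about 0.5, while keeping only X has essentially no penalty. So keeping only X wins at λ = 10 but not at λ = 1, which shows how λ trades average risk against invariance.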
  • 47. An approximation to our proposal. The objective $C(\phi) = \sum_e R^{P^e}(\bar{w} \circ \phi) + \lambda \sum_{e, e' \neq e} \|w^e_\phi - w^{e'}_\phi\|_{P^e}$ is an intractable bi-level optimization problem, since each $w^e_\phi$ is itself the solution of an optimization problem. We approximate the interactions between the optimization problems using unrolled gradients:
    1. Initialize $\phi$ and $w^e_\phi$ at random, for all $e$.
    1.1 Update $w^e_\phi \leftarrow \text{Gradient}(R^{P^e}, w^e_\phi)$ using one step and fixed $\phi$, for all $e$.
    1.2 Update $m^e_\phi \leftarrow \text{Gradient}(R^{P^e}, w^e_\phi)$ using $k$ steps and fixed $\phi$, for all $e$.
    1.3 Update $\phi \leftarrow \text{Gradient}(C, m^e_\phi)$ using one step and fixed $m^e_\phi$.
    2. Return $\left(\frac{1}{|E|} \sum_e w^e_\phi\right) \circ \phi$.
    (Arjovsky et al., 2018)
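A first-order sketch of these steps, under assumptions that are not on the slide: a linear featurizer $\phi$ (a d × k matrix), linear heads, squared error, data from the slide-45 example, and step 1.3 implemented literally by holding the look-ahead heads $m^e_\phi$ fixed, so no gradient flows back through the k head updates (a fully unrolled version would also differentiate through them). The learning rate, k, λ and iteration count are arbitrary, and the sketch makes no claim about what the procedure converges to.

```python
import numpy as np

rng = np.random.default_rng(0)

def environment(e, n=5_000):
    """Slide-45 environment e (N(0, v) read as variance v). Returns ([X, Z], Y)."""
    x = rng.normal(0.0, np.sqrt(e), n)
    y = x + rng.normal(0.0, np.sqrt(e), n)
    z = y + rng.normal(0.0, 1.0, n)
    return np.column_stack([x, z]), y

def grad_head(data, y, phi, w):
    """Gradient of R^e(w o phi) = mean((data @ phi @ w - y)^2) with respect to w."""
    f = data @ phi
    return 2.0 * f.T @ (f @ w - y) / len(y)

def grad_phi(envs, phi, m, lam):
    """Gradient of C with respect to phi, holding the heads m^e fixed (step 1.3)."""
    w_bar = np.mean(m, axis=0)
    g = np.zeros_like(phi)
    for i, (data, y) in enumerate(envs):
        r = data @ phi @ w_bar - y  # risk term of environment i
        g += 2.0 * np.outer(data.T @ r, w_bar) / len(y)
        for j in range(len(envs)):
            if j != i:  # discrepancy penalty measured under P^i
                diff = m[i] - m[j]
                s = data @ phi @ diff
                g += lam * 2.0 * np.outer(data.T @ s, diff) / len(y)
    return g

def train(envs, feat_dim=1, lam=1.0, lr=0.005, k=5, iters=300):
    phi = rng.normal(0.0, 1.0, (envs[0][0].shape[1], feat_dim))  # step 1
    w = [rng.normal(0.0, 0.1, feat_dim) for _ in envs]
    for _ in range(iters):
        # 1.1: one gradient step on every head, phi fixed
        w = [we - lr * grad_head(d, y, phi, we) for we, (d, y) in zip(w, envs)]
        # 1.2: look-ahead heads m^e after k more steps, phi fixed
        m = []
        for we, (d, y) in zip(w, envs):
            me = we.copy()
            for _ in range(k):
                me = me - lr * grad_head(d, y, phi, me)
            m.append(me)
        # 1.3: one gradient step on phi with the m^e held fixed
        phi = phi - lr * grad_phi(envs, phi, m, lam)
    return phi, np.mean(w, axis=0)  # step 2: predictor x -> x @ phi @ w_bar

envs = [environment(e) for e in (0.5, 2.0)]
phi, w_bar = train(envs)
print("effective weights on (X, Z):", (phi @ w_bar).round(2))
```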
  • 48. First results. [Figure panels: “Empirical risk minimization” and “Causal risk minimization”.] Implications for fairness? Partitions of one dataset? Theory?
  • 49. Multiple environments in the big picture. Setup: training data → test data.
    − generative learning: $U^1_1$ → ∅
    − unsupervised learning: $U^1_1$ → $U^1_2$
    − supervised learning: $L^1_1$ → $U^1_1$
    − semi-supervised learning: $L^1_1, U^1_1$ → $U^1_2$
    − transductive learning: $L^1_1, U^1_1$ → $U^1_1$
    − multitask learning: $L^1_1, L^2_1$ → $U^1_2, U^2_2$
    − domain adaptation: $L^1_1, U^2_1$ → $U^2_2$
    − transfer learning: $U^1_1, L^2_1$ → $U^2_1$
    − continual learning: $L^1_1, \ldots, L^\infty_1$ → $U^1_1, \ldots, U^\infty_1$
    − multi-environment learning: $L^1_1, L^2_1$ → $U^3_1, U^4_1$
    Here $L^i_j$ is labeled dataset number $j$ drawn from distribution $i$, and $U^i_j$ is unlabeled dataset number $j$ drawn from distribution $i$.
  • 50. Second conclusion. Prediction rules based on stable correlations across environments are likely to be causal.¹ (¹ I call this the principle of causal concentration.)
  • 51. Outline: (1) What’s wrong with machine learning? (2) A causal proposal. (3) Searching for causality I: observational data. (4) Searching for causality II: multiple environments. (5) Conclusion.
  • 52. Finally: from machine learning to artificial intelligence. AIs will be world simulators that will − align with the causal outcomes in the world, − perform robustly across diverse environments, − interrogate composable autonomous mechanisms to extrapolate, − allow us to imagine multiple futures given uncertainty about a situation, − enable counterfactual reasoning for extreme generalization. These causal desiderata are out of reach for current machine learning systems. Let’s get to it! Thanks!
  • 53. References I
    Martin Arjovsky, Léon Bottou, and David Lopez-Paz. Learning invariant causal rules across environments. In preparation, 2018.
    Léon Bottou, Martin Arjovsky, David Lopez-Paz, and Maxime Oquab. Geometrical insights for implicit generative modeling. In Braverman Readings in Machine Learning. Key Ideas from Inception to Current State. Springer, 2018.
    Povilas Daniusis, Dominik Janzing, Joris Mooij, Jakob Zscheischler, Bastian Steudel, Kun Zhang, and Bernhard Schölkopf. Inferring deterministic causal relations. In UAI, 2010.
    Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. JMLR, 2016.
    O. Goudet, D. Kalainathan, P. Caillou, I. Guyon, D. Lopez-Paz, and M. Sebag. Causal Generative Neural Networks. arXiv, 2017.
    Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. arXiv, 2017.
    Allan Jabri, Armand Joulin, and Laurens van der Maaten. Revisiting visual question answering baselines. In ECCV, 2016.
    D. Kalainathan, O. Goudet, I. Guyon, D. Lopez-Paz, and M. Sebag. SAM: Structural Agnostic Model, Causal Discovery and Penalized Adversarial Learning. arXiv, 2018.
    Anna Klimovskaia, Léon Bottou, David Lopez-Paz, and Maximilian Nickel. Poincaré maps recover continuous hierarchies in single-cell data. In preparation, 2018.
    David Lopez-Paz and Maxime Oquab. Revisiting classifier two-sample tests. ICLR, 2016.
    David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, and Léon Bottou. Discovering causal signals in images. CVPR, 2017.
    Franz H. Messerli. Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 2012.
    Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf. Distinguishing cause from effect using observational data: methods and benchmarks. JMLR, 2014.
    Judea Pearl. Theoretical impediments to machine learning with seven sparks from the causal revolution. arXiv, 2018.
    Jonas Peters, Joris M. Mooij, Dominik Janzing, and Bernhard Schölkopf. Causal discovery with continuous additive noise models. JMLR, 2014.
    Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society, 2016.
    Hans Reichenbach. The direction of time. Dover, 1956.
    Mateo Rojas-Carulla, Marco Baroni, and David Lopez-Paz. Causal discovery using proxy variables. In preparation, 2017.
    A. Rosenfeld, R. Zemel, and J. K. Tsotsos. The Elephant in the Room. arXiv, 2018.
  • 54. References II
    David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.
    Oliver Stegle, Dominik Janzing, Kun Zhang, Joris M. Mooij, and Bernhard Schölkopf. Probabilistic latent variable models for distinguishing between cause and effect. In NIPS, 2010.
    Pierre Stock and Moustapha Cisse. Convnets and ImageNet beyond accuracy: Explanations, bias detection, adversarial examples and model criticism. arXiv, 2017.
    B. L. Sturm. A simple method to determine if a music information retrieval system is a “horse”. IEEE Transactions on Multimedia, 2014.
    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. ICLR, 2013.
    James Woodward. Making things happen: A theory of causal explanation. Oxford University Press, 2005.