Iussp2005 Presentation1
1. Statistical Modelling and Causality Federica Russo*, Michel Mouchart**, Michel Ghins*, Guillaume Wunsch*** * Institut Supérieur de Philosophie, Université Catholique de Louvain ** Institut de Statistique, Université Catholique de Louvain *** Institut de Démographie, Université Catholique de Louvain
Editor's notes
Causality has been debated in both philosophy and science for a long time. However, the discussions have been held quite independently. An objective of this paper is to restore a fruitful dialogue between philosophy and science. Here we address the question: to what extent can a statistical model say something about causal relations among variables? We attempt an answer by analyzing a special class of statistical models, i.e. structural models. Take-home message: from a statistical viewpoint, causality can be operationally defined in terms of exogeneity. A closer epistemological analysis of structural models reveals the fundamental role of assumptions, background knowledge and the hypothetico-deductive (H-D) methodology.
Main message: We espouse a moderate version of scientific realism, according to which models grant cognitive access to (at least some) unobservable aspects of reality. Modelling consists in abstracting, constructing a simplified representation of a complex reality. The purpose of the model is NOT to be true, BUT to be useful.
The aim is to acquire causal knowledge; to do that, we try to make sense of observations. However, collecting data is problematic in several respects: what we decide to observe depends on the research questions; data may be erroneous (voluntary or involuntary errors); time ordering; the definition of abstract concepts; …
I’m not going to teach statistics; I’ll assume familiarity with the technicalities and just emphasize a couple of conceptual issues. We consider statistical models to be stochastic representations of the world. The error term represents what is not explained by the model. Data can be analyzed as if they were a realization of a distribution belonging to a family of distributions M. A model is also made of assumptions.
Statistical inference is concerned with two aspects: (i) induction: drawing conclusions about what has not been observed from what has been observed; (ii) learning-by-observing: learning about some aspect of interest, accumulating information as observations accumulate. Structural models make statistical inference operational and meaningful. "Structural" means a representation of the world that is stable under a large class of interventions. Structural modelling means capturing an underlying (i.e. causal) structure of the world.
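To make "stable under a large class of interventions" concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than part of the presentation: the linear model y = 2x + noise, the function names `simulate` and `ols_slope`, and the chosen means and standard deviations. An intervention changes the distribution of x, and the estimated conditional parameter (the slope) stays essentially the same.

```python
import random

def simulate(mean_x, sd_x, n=20000, beta=2.0, seed=0):
    """Draw (x, y) pairs from a hypothetical structural model y = beta * x + noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(mean_x, sd_x) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Before the intervention: x is centred at 0 with unit spread (assumed values).
slope_before = ols_slope(*simulate(mean_x=0.0, sd_x=1.0, seed=1))

# After the intervention: the marginal distribution of x is shifted and widened,
# but the mechanism generating y from x is left untouched.
slope_after = ols_slope(*simulate(mean_x=5.0, sd_x=3.0, seed=2))

print(f"slope before intervention: {slope_before:.3f}")
print(f"slope after intervention:  {slope_after:.3f}")
```

Both estimates should land close to the assumed value of 2, which is the sense in which the conditional part of this toy model is invariant to changes in how x is distributed.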
Informally and very briefly: an exogenous variable is a variable for which the mechanism explaining it gives no information on the mechanism of interest. An endogenous variable is a variable for which the mechanism explaining it is of interest, and the exogenous variables participate in the explanation of the mechanism of the endogenous variable. A causal variable is an exogenous variable in a structural conditional model.
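One common way to formalize this informal definition is through a marginal-conditional decomposition of the joint distribution. The notation below is an assumption introduced for illustration (Y for the endogenous variable, Z for the exogenous one, θ₁ and θ₂ for the parameters); the notes themselves do not fix any symbols.

```latex
p_{Y,Z}(y, z \mid \theta)
  = p_{Z}(z \mid \theta_{1}) \; p_{Y \mid Z}(y \mid z, \theta_{2}),
\qquad
\theta = (\theta_{1}, \theta_{2}) \in \Theta_{1} \times \Theta_{2}
```

Under this reading, Z is exogenous for the parameter of interest θ₂ when θ₁ and θ₂ are variation-free, i.e. knowing the parameter of the marginal mechanism places no restriction on the parameter of the conditional mechanism.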
Consequences: causality is internal to the model; exogeneity is an operational concept of causality. However, other features of structural models also grant causality. We distinguish temporal and atemporal aspects. These are assumptions about: determinism, recursiveness, covariate sufficiency, no confounding, invariance, causal asymmetry, direction of time, …
Causality is internal to a model, BUT we do not deduce causes from correlations. A hypothetico-deductive methodology is employed when we have at our disposal enough well-confirmed theories and background knowledge to formulate a prior causal hypothesis. Structural models are hypothetico-deductive models, for which empirical testing is performed in two stages: (i) prior theorizing of out-of-sample information, including in particular the selection of the variables deemed to be of interest, the formulation of a causal hypothesis (also called the conceptual hypothesis), etc.; (ii) iteratively: a. building the statistical model; b. testing the adequacy between the model and the data, to accept or reject the empirical validity of the causal hypothesis.
Why the problem arises: causal conclusions drawn from statistical models concern populations as well as individuals, although probability distributions and their parameters are typically defined relative to the population. Populations are made of individuals, so we distinguish two levels of causation: population-level and individual-level. Mention example: smoking and lung cancer. Methodological issue: what the causal (i.e. exogenous) variables are and whether it is possible at all to provide a sufficient list, and what mechanisms operate among the variables deemed to be causal. These two tasks are difficult to achieve because of the heterogeneity of individuals in the same population. Practical issue: can a physician decide whether or not to prescribe a treatment on the basis of a causal model? Behind the practical issue hides the epistemological one: the physician’s decision depends on the relationship between causality at the population level and at the individual level. What we discover about the average relation between smoking and lung cancer, i.e. at the population level, can guide causal attribution in the case of Harry through a simple tool of probabilistic reasoning, namely Bayes’ theorem. In fact, Bayes’ theorem allows us to calculate the posterior probability of the cause for a given individual, provided that the population risk is interpreted as a prior probability for this individual.
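The Bayesian step can be sketched in a few lines. This is only an illustration: the function name `posterior_probability` and all three numbers are hypothetical placeholders, not epidemiological estimates and not figures from the presentation. The population-level risk plays the role of the prior, and evidence about the individual updates it.

```python
def posterior_probability(prior, likelihood_if_cause, likelihood_if_no_cause):
    """Bayes' theorem for a binary hypothesis.

    prior: population-level probability that the cause operates,
        interpreted as the prior for the individual.
    likelihood_if_cause: probability of the individual's evidence if the cause operates.
    likelihood_if_no_cause: probability of the same evidence if it does not.
    """
    numerator = prior * likelihood_if_cause
    evidence = numerator + (1.0 - prior) * likelihood_if_no_cause
    return numerator / evidence

# Hypothetical values chosen only to show the arithmetic.
population_risk = 0.10    # prior taken from the population-level model
p_evidence_cause = 0.80   # assumed likelihood of Harry's evidence under the causal hypothesis
p_evidence_other = 0.20   # assumed likelihood of the same evidence otherwise

harry_posterior = posterior_probability(population_risk, p_evidence_cause, p_evidence_other)
print(f"posterior probability for Harry: {harry_posterior:.4f}")
```

With these placeholder inputs the arithmetic is 0.08 / (0.08 + 0.18), roughly 0.31: the individual-level probability moves away from the population prior once evidence specific to Harry is taken into account.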
Structural models are characterized by parameters that are stable over a large class of interventions; in the marginal-conditional decomposition, the conditional part describes the data-generating process; this part is structural, i.e. causal. Causality is here defined in terms of exogeneity. However, we have to go beyond the operational concept of exogeneity: a more complex and rich concept of causality requires acknowledging the role of assumptions, of background knowledge and of the H-D methodology. Within these structural models we are allowed to formulate causal statements, i.e. causality is internal to the model. Causality is relative to a structural model, but this is not to deny causality in the world. Rather, this is to emphasize that causal knowledge depends on structural models that mediate epistemic access to causal relations.