Philosophical Interventions in the
Statistics Wars
Deborah G. Mayo
Virginia Tech
Philosophy in Science: Can Philosophers of
Science Contribute to Science?
PSA 2021 November 13, 2-4 pm
1
“A Statistical Scientist Meets a
Philosopher of Science”
Sir David Cox: “Deborah, in some fields
foundations do not seem very important, but we
both think foundations of statistical inference are
important; why do you think that is?”
Mayo: “…in statistics …we invariably cross into
philosophical questions about empirical knowledge
and inductive inference.” (Cox and Mayo 2011)
Some call statistics “applied philosophy of Science”
(Kempthorne 1976)
2
Statistics → Philosophy of science
Most of my interactions with statistics have been
drawing out insights from stat:
(1) To solve philosophical problems about
inductive inference, evidence, experiment;
(2) To answer knotty metamethodological
questions: When (if ever) is it legitimate to use
the ‘same’ data to construct and test a
hypothesis?
3
Philosophy of Science → Statistics
• In the last decade I’m more likely to be intervening
in stat—in the sense of this session: PinS
• A central job for philosophers of science: minister
to conceptual and logical problems of sciences
• Especially when widely used methods (e.g.,
statistical significance tests) are said to be causing
a crisis (and should be “abandoned” or “retired”)
4
Long-standing philosophical
controversy on probability
Frequentist (error statisticians): to control and
assess the relative frequency of misinterpretations
of data—error probabilities
(e.g., P-values, confidence intervals, randomization,
resampling)
Bayesians (and other probabilists): to assign
comparative degrees of belief or support in claims
(e.g., Bayes factors, Bayesian posterior probabilities)
5
• Wars between frequentists and Bayesians
have been contentious; everyone wants to
believe we are long past them.
• Long-standing battles still simmer below the
surface
6
My first type of intervention:
• Illuminate the debates, within and between rival stat
tribes, in relation to today’s problems
• What’s behind the drumbeat that there’s a statistical
crisis in science?
• High powered methods enable arriving at well-
fitting models and impressive looking effects even
if they’re not warranted.
• I set sail with a simple tool: if little or nothing has
been done to rule out flaws in inferring a claim, we
do not have evidence for it.
7
A claim is warranted to the extent
it passes severely
8
• We have evidence for a claim only to the
extent that it has been subjected to and
passes a test that would probably have found
it flawed or specifiably false, just if it is
• This probability is the stringency or severity
with which it has passed the test
Second type of intervention:
Statistical inference as severe testing
• Reformulate frequentist error statistical tools
• Probability arises (in scientific inference) to assess and
control how capable methods are at uncovering and
avoiding erroneous interpretations of data (Probativism)
• Excavation tool: Holds for any kind of inference; you
needn’t accept this philosophy to use it to get beyond
today’s statistical wars and scrutinize reforms
9
Third type of intervention: scrutinize
proposed reforms growing out of the
“replication crisis”
• Several proposed reforms are welcome:
preregistration, avoidance of cookbook statistics,
calls for more replication research
• Others are quite radical, and even obstruct practices
known to improve on replication.
10
Consider statistical significance tests
(frequentist)
Significance tests (R.A. Fisher) are a small part of an
error statistical methodology
“…to test the conformity of the particular data under
analysis with H0 in some respect….”
…the p-value: the probability of getting an even
larger value than t_obs, assuming background variability
or noise (Mayo and Cox 2006, 81)
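To make the computation concrete, here is a minimal sketch with hypothetical numbers (not from the slides): for a z test of a normal mean with known σ, the one-sided P-value is Pr(Z ≥ z_obs) computed under H0. The function name and parameter values are mine.

```python
# Minimal sketch (hypothetical numbers): one-sided P-value for a z test
# of H0: mu = 0 against mu > 0, with known sigma and sample size n.
from math import erf, sqrt

def p_value_one_sided(xbar, mu0=0.0, sigma=1.0, n=25):
    """Pr(Z >= z_obs), computed under H0 (mu = mu0)."""
    z_obs = (xbar - mu0) / (sigma / sqrt(n))
    # standard normal survival function via the error function
    return 0.5 * (1 - erf(z_obs / sqrt(2)))

print(p_value_one_sided(xbar=0.4))  # z_obs = 2.0, P-value ~ 0.023
```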
11
Testing reasoning, as I see it
• If even larger differences than t_obs occur fairly frequently
under H0 (i.e., P-value is not small), there’s scarcely
evidence of incompatibility with H0
• Small P-value indicates some underlying discrepancy
from H0 because very probably (1–P) you would
have seen a smaller difference than t_obs were H0 true.
• Even if the small P-value is valid, it isn’t evidence of a
scientific conclusion H*
Stat-Sub (statistical-substantive) fallacy: H1 => H*
12
Neyman-Pearson (N-P) put
Fisherian tests on firmer footing
(1933):
Introduces explicit alternative hypotheses, e.g.,
H0: μ ≤ 0 vs. H1: μ > 0
• Constrains tests by requiring control of both the Type I error
(erroneously rejecting H0) and the Type II error (erroneously
failing to reject H0), along with power
(Neyman also developed confidence interval estimation
at the same time)
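For illustration only, a sketch with hypothetical numbers (not part of the slides) of how the Type I error, Type II error, and power of this one-sided test are computed for a known-σ z test; the function and the alternative μ1 = 0.5 are my own choices.

```python
# Sketch (hypothetical numbers): error probabilities for the one-sided
# z test of H0: mu <= 0 vs. H1: mu > 0, known sigma, fixed sample size n.
from math import erf, sqrt

def std_normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def test_characteristics(mu1=0.5, sigma=1.0, n=25):
    alpha = 0.05
    z_alpha = 1.645          # Pr(Z >= 1.645) ~ 0.05: the Type I error bound
    se = sigma / sqrt(n)
    beta = std_normal_cdf(z_alpha - mu1 / se)   # Type II error at mu = mu1
    return {"type I error": alpha,
            "type II error at mu1": round(beta, 3),
            "power at mu1": round(1 - beta, 3)}

print(test_characteristics())  # power ~ 0.8 against mu1 = 0.5 when n = 25
```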
13
N-P tests as tools for optimal
performance:
• Their success in optimal control of error
probabilities gives a new paradigm for statistics
• Also encouraged viewing tests as “accept/reject”
rules more apt for industrial quality control, or
high throughput screening, than science
• Fisher, later in life, criticized N-P for turning “his”
tests into acceptance sampling tools —I learned
later it was mostly in-fighting
14
• Can we keep the best from Fisherian and N-P tests
without an ‘inconsistent hybrid’ (Gigerenzer)?
• This fueled my second intervention (Mayo 1991, 1996)
later developed with econometrician Aris Spanos in
2000 and statistician David Cox in 2003
• “Our goal is to identify a key principle of evidence by
which hypothetical error probabilities may be used for
inductive inference.” (Mayo and Cox 2006)
• Mathematically, Fisherian and N-P tests are nearly identical—it is
an interpretation or philosophy that is needed
15
Both Fisher & N-P: it’s easy to lie with
biasing selection effects
• Sufficient finagling—cherry-picking, significance
seeking, multiple testing, post-data subgroups, trying
and trying again—may practically guarantee a
preferred claim H gets support, even if it’s unwarranted
by evidence
• Such a test fails a minimal requirement for a stringent
or severe test (P-value is invalidated)
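As a rough illustration of how such finagling invalidates the nominal P-value (a hypothetical simulation, not from the slides): if one searches across 20 independent true-null comparisons and reports only the smallest P-value, the chance of reporting something below 0.05 is about 1 − 0.95^20 ≈ 0.64, not 0.05.

```python
# Hypothetical simulation: 20 independent tests of true null hypotheses;
# if only the smallest nominal P-value is reported, how often is it < 0.05?
import random
from math import erf, sqrt

def one_sided_p(z):
    return 0.5 * (1 - erf(z / sqrt(2)))

random.seed(1)
trials, n_tests, hits = 10_000, 20, 0
for _ in range(trials):
    smallest = min(one_sided_p(random.gauss(0, 1)) for _ in range(n_tests))
    hits += smallest < 0.05
print(hits / trials)  # roughly 1 - 0.95**20 ~ 0.64, far above the nominal 0.05
```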
16
Key to solving a central problem
• Why is reliable performance relevant for a specific
inference?
• Ask yourself: what bothers you about selective
reporting, cherry picking, stopping when the data
look good, P-hacking?
17
18
• Not a problem about long-run performance—
• It’s that we can’t say the test did its job in the
case at hand: to give “a first line of defense against
being fooled by randomness” (Benjamini 2016)
Inferential construal of error
probabilities
• Use error probabilities to assess capabilities of tools to
probe various flaws (Probativism)
• They are what Popper calls “methodological
probabilities”
• “Severe Testing as a Basic Concept in a Neyman-Pearson
Philosophy of Induction” (Mayo and Spanos 2006)
• “Frequentist statistics as a theory of inductive inference” (Mayo
and Cox 2006)
19
20
Popper vs logics of induction/
confirmation
Severity was Popper’s term, and the debate between
Popperian falsificationism and inductive logics of
confirmation/support parallels the debates in statistics.
Popper: claim C is “corroborated” to the extent C
passes a severe test (one that probably would have
detected C’s falsity, if false).
Comparative logic of support
• Ian Hacking (1965) “Law of Likelihood”:
data x support hypothesis H0 less well than H1 if
Pr(x;H0) < Pr(x;H1)
A problem is:
• Any hypothesis that perfectly fits the data is
maximally likely (even if data-dredged)
• “there always is such a rival hypothesis viz., that
things just had to turn out the way they actually
did” (Barnard 1972, 129)
21
Error probabilities are
“one level above” a fit measure:
• Pr(H0 is less well supported than H1; H0 ) is high
for some H1 or other
“to fix a limit between ‘small’ and ‘large’ values of
[the likelihood ratio] we must know how often such
values appear when we deal with a true
hypothesis.” (Pearson and Neyman 1967, 106)
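A small sketch of both points (hypothetical, not from the slides) for normal data: the data-dredged alternative H1: μ = x̄ always fits at least as well as H0: μ = 0, and the Pearson-Neyman question is answered by simulating how often the ratio in its favor exceeds a cutoff when H0 is in fact true. The cutoff of 8 and the sample size are arbitrary choices of mine.

```python
# Hypothetical sketch: x1,...,xn ~ N(mu, 1). Compare the likelihood under
# H0: mu = 0 with the likelihood under the best-fitting (data-dredged)
# alternative H1: mu = xbar. The ratio never favors H0; the simulation
# reports how often it favors H1 by a factor > 8 even though H0 is true.
import random
from math import exp

def log_lik_ratio(xs):
    # log[ Pr(x; mu = xbar) / Pr(x; mu = 0) ] = n * xbar**2 / 2 (unit variance)
    n = len(xs)
    xbar = sum(xs) / n
    return n * xbar ** 2 / 2

random.seed(2)
trials, n = 10_000, 25
ratios = [exp(log_lik_ratio([random.gauss(0, 1) for _ in range(n)]))
          for _ in range(trials)]
print(min(ratios))                          # always >= 1: H1 never loses
print(sum(r > 8 for r in ratios) / trials)  # frequency of a large ratio under a true H0
```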
22
“There is No Such Thing as a Logic
of Statistical Inference”
• Hacking retracts his Law of Likelihood (LL) (1972,
1980)
• And retracts his earlier rejection of the view that Neyman–
Pearson statistics is inferential.
“I now believe that Neyman, Peirce, and
Braithwaite were on the right lines to follow in the
analysis of inductive arguments”
(Hacking 1980, 141)
23
Likelihood Principle: what counts
as evidence?
A pervasive view is that all the evidence is
contained in the ratio of likelihoods:
Pr(x;H0) / Pr(x;H1)  (the likelihood principle, LP)
On the LP (followed by strict Bayesians):
“Sampling distributions, significance levels,
power, all depend on something more [than
the likelihood function]–something that is
irrelevant in Bayesian inference–namely the
sample space” (Lindley 1971, 436)
24
Bayesians Howson and Urbach
• They say a significance test is precluded from
giving judgments about empirical support
• “[it] depends not only on the outcome that a trial
produced, but also on the outcomes that it could
have produced but did not. …determined by certain
private intentions of the experimenters, embodying
their stopping rule.” (1993, p. 212)
• Whether error probabilities matter turns on your
methodology being able to pick up on them.
25
26
In testing the mean of a
standard normal distribution
• So the frequentist needs to know the stopping rule
For a (strict) Bayesian:
“It seems very strange that a frequentist could not
analyze a given set of data…if the stopping rule is
not given….Data should be able to speak for itself.”
(Berger and Wolpert, The Likelihood Principle 1988,
78)
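To see why the stopping rule matters to the frequentist but not on the LP, here is a hypothetical sketch of “try and try again” when testing the mean of a standard normal distribution (my own numbers and function names): keep sampling until the nominal two-sided 5% cutoff is crossed, or until a maximum n is reached. The nominal level at each look is 0.05, but the actual probability of rejecting a true H0 is far higher, and it approaches 1 as the maximum n grows; the likelihood function does not register the stopping rule.

```python
# Hypothetical sketch of optional stopping: sample X ~ N(0, 1), so H0: mu = 0
# is true, and stop as soon as |z| >= 1.96 (nominal two-sided 5% level),
# or when n_max observations have been taken.
import random
from math import sqrt

def rejects_under_optional_stopping(n_max=500):
    total, n = 0.0, 0
    while n < n_max:
        total += random.gauss(0, 1)
        n += 1
        if abs(total / sqrt(n)) >= 1.96:   # z statistic for known sigma = 1
            return True
    return False

random.seed(3)
trials = 2_000
rate = sum(rejects_under_optional_stopping() for _ in range(trials)) / trials
print(rate)  # well above 0.05, despite each look using the nominal 5% cutoff
```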
27
Radiation oncologists look to phil
science: “Why do we disagree about
clinical trials?” (ASTRO 2021)
In a case we considered, Bayesian researchers:
“The [regulatory] requirement of type I error control for
Bayesian adaptive designs causes them to lose many
of their philosophical advantages, such as compliance
with the likelihood principle [which does not require
adjusting]” (Ryan et al. 2020).
They admit “the type I error was inflated in the [trials]
…without adjustments to account for multiplicity”.
• No wonder they disagree, and it turns partly on the
likelihood principle (LP).
28
Bayesians may block implausible
inferences
• With a low prior degree of belief in H (e.g., in a real
effect), the Bayesian can block inferring H
• Can work in some cases
Concerns
• An additional source of flexibility: priors as well as
biasing selection effects
• Doesn’t show what the researchers did wrong—
it’s the multiple testing and data-dredging
• The believability of data-dredged hypotheses is
what makes them so seductive
• Claims can be highly probable (in any sense) while
poorly probed
30
Family feuds within the Bayesian
school: default, objective priors:
• Most Bayesian practitioners (last decade) look for
non-subjective prior probabilities
• “Default” priors are supposed to prevent prior beliefs
from influencing the posteriors; the data dominate
31
How should we interpret them?
“By definition, ‘non-subjective’ prior distributions are
not intended to describe personal beliefs, and in
most cases, they are not even proper probability
distributions …” (Bernardo 1997, pp. 159–60)
• No agreement on rival systems for default/non-
subjective priors
(invariance, maximum entropy, maximizing missing
information, matching (Kass and Wasserman 1996))
32
There may be ways to combine Bayesian
and error statistical accounts
(Gelman: Falsificationist Bayesian; Shalizi: error statistician)
“[C]rucial parts of Bayesian data analysis, … can be
understood as ‘error probes’ in Mayo’s sense”
“[W]hat we are advocating, then, is what Cox and Hinkley
(1974) call ‘pure significance testing’, in which certain of
the model’s implications are compared directly to the
data.” (Gelman and Shalizi 2013, 10, 20).
• Gelman was at a session on significance testing controversies at the 2016
PSA with Gigerenzer and Glymour
• One can’t also champion “abandoning statistical significance”
33
Now we get to scrutinizing proposed
reforms
34
No Threshold view: Don’t say
‘significance’, don’t use P-value
thresholds
• In 2019, the executive director of the American Statistical
Association (ASA), Ron Wasserstein, and two co-authors
announce: "declarations of ‘statistical
significance’ be abandoned"
• Don’t say “significance”, don’t use P-value thresholds
(e.g., .05, .01, .005)
• John Ioannidis invited me and Andrew Gelman to write
opposing editorials on the “no threshold view”
(European Journal of Clinical Investigation)
Mine was “P-value thresholds: forfeit at your peril”
35
• To be fair, many who signed on to the “no threshold view”
think that by removing P-value thresholds, researchers lose an
incentive to data dredge, multiple test, and otherwise
exploit researcher flexibility
• I argue banning the use of P-value thresholds in
interpreting data does not diminish but rather exacerbates
data-dredging
36
• In a world without predesignated thresholds, it would
be hard to hold the data dredgers accountable for
reporting a nominally small P-value through
ransacking, data dredging, trying and trying again.
• What distinguishes genuine P-values from invalid
ones is that they meet a prespecified error probability.
• No thresholds, no tests.
• We agree the actual P-value should be reported (as
all the founders of tests recommended)
37
38
Problems are avoided by reformulating
tests with a discrepancy γ from H0
• Instead of a binary cut-off (significant or not), the
particular outcome is used to infer discrepancies that
are or are not warranted
• In a nutshell: one tests several discrepancies from a
test hypothesis and infers those well or poorly
warranted
• E.g., with non-significant results, we set an upper
bound (e.g., any discrepancy from H0 is less than γ), as
in the sketch below
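A minimal numerical sketch of this reformulation, with hypothetical numbers and assuming a known-σ one-sided z test (not from the slides): for a non-significant result, compute for each γ the probability that the test would have produced a larger observed difference were the discrepancy as large as γ. High values warrant the upper bound “discrepancy less than γ”; low values do not. The function name is mine.

```python
# Hypothetical sketch: non-significant one-sided z test of H0: mu <= 0.
# For the claim "discrepancy from H0 is less than gamma", compute
# Pr(observing a larger difference than xbar; mu = gamma): if high, the
# bound is well warranted; if low, it is not.
from math import erf, sqrt

def std_normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def warrant_for_upper_bound(xbar, gamma, sigma=1.0, n=25):
    se = sigma / sqrt(n)
    return 1 - std_normal_cdf((xbar - gamma) / se)

xbar = 0.2   # z_obs = 1.0 with sigma = 1, n = 25: not significant at 0.05
for gamma in (0.1, 0.3, 0.5, 0.8):
    print(gamma, round(warrant_for_upper_bound(xbar, gamma), 3))
# low for gamma = 0.1 (such a small discrepancy is not ruled out),
# high for gamma = 0.8 (a discrepancy that large very probably would have
# produced a larger observed difference)
```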
Final Remarks: to intervene in
statistics battles you need to ask:
• How do they use probability?
(probabilism, performance, probativism (severe testing))
• What’s their notion of evidence?
(error probability principle, likelihood principle)
39
Intervening in today’s stat policy
reforms requires chutzpah
• Things have gotten so political that sometimes an
outsider’s status can help in acrimonious battles
with thought leaders in statistics.
• To give an update: a Task Force of 14 statisticians
was appointed by the ASA President in 2019 “to
address concerns that [the no threshold view] might
be mistakenly interpreted as official ASA policy”
(Benjamini 2021)
40
• “the use of P -values and significance testing, properly
applied and interpreted, are important tools that should not
be abandoned” (Benjamini et al. 2021)
• Instead, we need to confront the fact that basic stat
concepts are more confused than ever (in medicine,
economics, law, psychology, climate science, social
science etc.)
• I was glad to see the morning's session* organized by
members of the 2019 Summer Seminar in Phil Stat (which Aris
Spanos and I ran)
• I hope more philosophers of science enter the 2-way street
*Current Debates on Statistical Modeling and Inference
41
Phil Sci ⇄ Stat Sci
42
43
(FEV) Frequentist Principle of
Evidence: Mayo and Cox (2006)
(SEV): Mayo 1991, 1996, 2018; Mayo
and Spanos (2006)
FEV/SEV Small P-value: indicates a discrepancy γ from
H0 only if there is a high probability the test would
have resulted in a larger P-value were a discrepancy
as large as γ absent.
FEV/SEV Moderate or large P-value: indicates the
absence of a discrepancy γ from H0 only if there is
a high probability the test would have given a
worse fit with H0 (i.e., a smaller P-value) were a
discrepancy γ present.
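In symbols, using my own notation (a reconstruction in the spirit of Mayo and Cox 2006 and Mayo and Spanos 2006, not a verbatim statement): for a one-sided test of H0: μ ≤ μ0 with test statistic d(X) and observed value d(x0), the two clauses above can be rendered roughly as:

```latex
% Small P-value: infer a discrepancy of at least \gamma only if
\mathrm{SEV}(\mu > \mu_0 + \gamma) = \Pr\!\left(d(X) \le d(x_0);\ \mu = \mu_0 + \gamma\right) \text{ is high.}

% Moderate/large P-value: infer the discrepancy is no greater than \gamma only if
\mathrm{SEV}(\mu \le \mu_0 + \gamma) = \Pr\!\left(d(X) > d(x_0);\ \mu = \mu_0 + \gamma\right) \text{ is high.}
```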
44
References
• Barnard, G. (1972). The logic of statistical inference (Review of “The Logic of Statistical Inference” by Ian
Hacking). British Journal for the Philosophy of Science 23(2), 123–32.
• Benjamini, Y., De Veaux, R., Efron, B., et al. (2021). The ASA President’s task force statement on
statistical significance and replicability. The Annals of Applied Statistics. (Online June 20, 2021.)
• Berger, J. O. and Wolpert, R. (1988). The Likelihood Principle, 2nd ed. Vol. 6 Lecture Notes-Monograph
Series. Hayward, CA: Institute of Mathematical Statistics.
• Cox, D. R., and Mayo, D. G. (2010). “Objectivity and Conditionality in Frequentist Inference.” In Error and
Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality
of Science, edited by Deborah G. Mayo and Aris Spanos, 276–304. Cambridge: Cambridge University
Press.
• Cox, D. and Mayo, D. (2011). “A Statistical Scientist Meets a Philosopher of Science: A Conversation
between Sir David Cox and Deborah Mayo”, in Rationality, Markets and Morals (RMM) 2, 103–14.
• Fisher, R. A. (1935a). The Design of Experiments. Oxford: Oxford University Press.
• Gelman, A. and Shalizi, C. (2013). Philosophy and the Practice of Bayesian Statistics and Rejoinder,
British Journal of Mathematical and Statistical Psychology 66(1), 8–38; 76–80.
• Giere, R. (1976). Empirical probability, objective statistical methods, and scientific inquiry. In Foundations
of probability theory, statistical inference and statistical theories of science, vol. 2, edited by W. L. Harper
and C. A. Hooker, 63–101. Dordrecht, The Netherlands: D. Reidel.
• Hacking, I. (1965). Logic of Statistical Inference. Cambridge: Cambridge University Press.
• Hacking, I. (1972). Likelihood. British Journal for the Philosophy of Science 23, 132–37.
• Hacking, I. (1980). The theory of probable inference: Neyman, Peirce and Braithwaite. In Mellor, D.
(ed.), Science, Belief and Behavior: Essays in Honour of R. B. Braithwaite, Cambridge: Cambridge
University Press, pp. 141–60.
45
• Harper, W. L., and C. A. Hooker, eds. (1976). Foundations of probability theory, statistical inference and
statistical theories of science. Vol. 2. Dordrecht, The Netherlands: D. Reidel.
• Howson, C. & Urbach, P. (1993). Scientific Reasoning: The Bayesian Approach. LaSalle, IL: Open Court.
• Kass, R. & Wasserman, L. (1996). The Selection of Prior Distributions by Formal Rules. Journal of the
American Statistical Association 91, 1343–70.
• Kempthorne, O. (1976). Statistics and the Philosophers, in Harper, W. and Hooker, C. (eds.), Foundations of
Probability Theory, Statistical Inference and Statistical Theories of Science, Volume II. 273–314. Boston, MA:
D. Reidel.
• Lindley, D. V. (1971). The Estimation of Many Parameters in Godambe, V. and Sprott, D. (eds.), Foundations
of Statistical Inference 435–455. Toronto: Holt, Rinehart and Winston.
• Mayo, D. (1991). Novel Evidence and Severe Tests. Philosophy of Science 58(4), 523–52.
• Mayo, D. (1996). Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press.
• Mayo, D. (2014). On the Birnbaum Argument for the Strong Likelihood Principle (with discussion), Statistical
Science 29(2), 227–39; 261–6.
• Mayo, D. (2016). Don’t Throw Out the Error Control Baby with the Bad Statistics Bathwater: A Commentary
on Wasserstein, R. L. and Lazar, N. A. 2016, “The ASA’s Statement on p-Values: Context, Process, and
Purpose. The American Statistician 70(2) (supplemental materials).
• Mayo, D. (2018). Statistical inference as severe testing: How to get beyond the statistics wars. Cambridge:
Cambridge University Press.
• Mayo, D. (forthcoming). The Statistics Wars and Intellectual Conflicts of Interest (editorial). Conservation
Biology.
46
• Mayo, D. & Cox, D. (2006). Frequentist statistics as a theory of inductive inference. In Rojo, J. (ed.),
Optimality: The Second Erich L. Lehmann Symposium, Lecture Notes-Monograph series, Institute of
Mathematical Statistics (IMS), 49, pp. 77–97. (Reprinted 2010 in Mayo, D. and Spanos, A. (eds.), pp. 247–
75.)
• Mayo, D. & Hand, D. (under review). Statistical Significance Tests: Practicing damaging science, or damaging
scientific practice? In Kao, M., Shech, E., & Mayo, D. Synthese (Special Issue: Recent Issues in Philosophy
of Statistics: Evidence, Testing, and Applications ).
• Mayo, D. & Spanos, A. (2006). Severe testing as a basic concept in a Neyman–Pearson philosophy of
induction. British Journal for the Philosophy of Science 57(2), 323–57.
• Mayo, D. G., and A. Spanos (2011). “Error Statistics.” In Philosophy of Statistics, edited by Prasanta S.
Bandyopadhyay and Malcolm R. Forster, 7:152–198. Handbook of the Philosophy of Science. The
Netherlands: Elsevier.
• Musgrave, A. (1974). ‘Logical versus Historical Theories of Confirmation’, The British Journal for the
Philosophy of Science 25(1), 1–23.
• Neyman J. & Pearson, E. (1967). On the problem of the most efficient tests of statistical hypotheses. In Joint
statistical papers, 140-85 (Berkeley: University of California Press). First published in Philosophical
Transactions of the Royal Society (A)(1933):231, 289-337.
• Popper, K. (1959). The Logic of Scientific Discovery. London, New York: Routledge.
• Simmons, J., Nelson, L., & Simonsohn, U. (2012). A 21 word solution. Dialogue: The Official Newsletter of
the Society for Personality and Social Psychology 26(2), 4–7.
• Wasserstein, R. & Lazar, N. (2016). The ASA’s statement on p-values: Context, process and purpose (and
supplemental materials). The American Statistician, 70(2), 129-133.
• Wasserstein, R., Schirm, A,. & Lazar, N. (2019). Moving to a world beyond “p < 0.05” (Editorial). The
American Statistician 73(S1), 1–19. https://doi.org/10.1080/00031305.2019.1583913
47
Contenu connexe

Tendances

D.G. Mayo Slides LSE PH500 Meeting #1
D.G. Mayo Slides LSE PH500 Meeting #1D.G. Mayo Slides LSE PH500 Meeting #1
D.G. Mayo Slides LSE PH500 Meeting #1jemille6
 
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist PerformanceProbing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performancejemille6
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyjemille6
 
Mayo: Day #2 slides
Mayo: Day #2 slidesMayo: Day #2 slides
Mayo: Day #2 slidesjemille6
 
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...jemille6
 
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in ScienceD. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in Sciencejemille6
 
Severe Testing: The Key to Error Correction
Severe Testing: The Key to Error CorrectionSevere Testing: The Key to Error Correction
Severe Testing: The Key to Error Correctionjemille6
 
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...jemille6
 
Controversy Over the Significance Test Controversy
Controversy Over the Significance Test ControversyControversy Over the Significance Test Controversy
Controversy Over the Significance Test Controversyjemille6
 
Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13jemille6
 
Gelman psych crisis_2
Gelman psych crisis_2Gelman psych crisis_2
Gelman psych crisis_2jemille6
 
D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy jemille6
 
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"jemille6
 
Final mayo's aps_talk
Final mayo's aps_talkFinal mayo's aps_talk
Final mayo's aps_talkjemille6
 
Meeting #1 Slides Phil 6334/Econ 6614 SP2019
Meeting #1 Slides Phil 6334/Econ 6614 SP2019Meeting #1 Slides Phil 6334/Econ 6614 SP2019
Meeting #1 Slides Phil 6334/Econ 6614 SP2019jemille6
 
Mayo &amp; parker spsp 2016 june 16
Mayo &amp; parker   spsp 2016 june 16Mayo &amp; parker   spsp 2016 june 16
Mayo &amp; parker spsp 2016 june 16jemille6
 
Philosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of StatisticsPhilosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of Statisticsjemille6
 
Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma jemille6
 
Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively jemille6
 
Feb21 mayobostonpaper
Feb21 mayobostonpaperFeb21 mayobostonpaper
Feb21 mayobostonpaperjemille6
 

Tendances (20)

D.G. Mayo Slides LSE PH500 Meeting #1
D.G. Mayo Slides LSE PH500 Meeting #1D.G. Mayo Slides LSE PH500 Meeting #1
D.G. Mayo Slides LSE PH500 Meeting #1
 
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist PerformanceProbing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severely
 
Mayo: Day #2 slides
Mayo: Day #2 slidesMayo: Day #2 slides
Mayo: Day #2 slides
 
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
 
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in ScienceD. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
 
Severe Testing: The Key to Error Correction
Severe Testing: The Key to Error CorrectionSevere Testing: The Key to Error Correction
Severe Testing: The Key to Error Correction
 
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
 
Controversy Over the Significance Test Controversy
Controversy Over the Significance Test ControversyControversy Over the Significance Test Controversy
Controversy Over the Significance Test Controversy
 
Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13
 
Gelman psych crisis_2
Gelman psych crisis_2Gelman psych crisis_2
Gelman psych crisis_2
 
D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy
 
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"
 
Final mayo's aps_talk
Final mayo's aps_talkFinal mayo's aps_talk
Final mayo's aps_talk
 
Meeting #1 Slides Phil 6334/Econ 6614 SP2019
Meeting #1 Slides Phil 6334/Econ 6614 SP2019Meeting #1 Slides Phil 6334/Econ 6614 SP2019
Meeting #1 Slides Phil 6334/Econ 6614 SP2019
 
Mayo &amp; parker spsp 2016 june 16
Mayo &amp; parker   spsp 2016 june 16Mayo &amp; parker   spsp 2016 june 16
Mayo &amp; parker spsp 2016 june 16
 
Philosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of StatisticsPhilosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of Statistics
 
Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma
 
Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively
 
Feb21 mayobostonpaper
Feb21 mayobostonpaperFeb21 mayobostonpaper
Feb21 mayobostonpaper
 

Similaire à Mayod@psa 21(na)

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”jemille6
 
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and FalsificationP-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and Falsificationjemille6
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilismjemille6
 
The Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and CasualtiesThe Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and Casualtiesjemille6
 
D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500jemille6
 
D. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &LearningD. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &Learningjemille6
 
The Statistics Wars and Their Casualties
The Statistics Wars and Their CasualtiesThe Statistics Wars and Their Casualties
The Statistics Wars and Their Casualtiesjemille6
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)jemille6
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)jemille6
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...jemille6
 
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)jemille6
 
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”jemille6
 
Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)jemille6
 
Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)jemille6
 
Phil 6334 Mayo slides Day 1
Phil 6334 Mayo slides Day 1Phil 6334 Mayo slides Day 1
Phil 6334 Mayo slides Day 1jemille6
 
Does preregistration improve the interpretability and credibility of research...
Does preregistration improve the interpretability and credibility of research...Does preregistration improve the interpretability and credibility of research...
Does preregistration improve the interpretability and credibility of research...Mark Rubin
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testingpraveen3030
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfjemille6
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severityjemille6
 
Oom not doom a novel method for improving psychological science, Bradley Woods
Oom not doom   a novel method for improving psychological science, Bradley WoodsOom not doom   a novel method for improving psychological science, Bradley Woods
Oom not doom a novel method for improving psychological science, Bradley WoodsNZ Psychological Society
 

Similaire à Mayod@psa 21(na) (20)

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”
 
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and FalsificationP-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
 
The Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and CasualtiesThe Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and Casualties
 
D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500
 
D. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &LearningD. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &Learning
 
The Statistics Wars and Their Casualties
The Statistics Wars and Their CasualtiesThe Statistics Wars and Their Casualties
The Statistics Wars and Their Casualties
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
 
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
 
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
 
Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)
 
Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)
 
Phil 6334 Mayo slides Day 1
Phil 6334 Mayo slides Day 1Phil 6334 Mayo slides Day 1
Phil 6334 Mayo slides Day 1
 
Does preregistration improve the interpretability and credibility of research...
Does preregistration improve the interpretability and credibility of research...Does preregistration improve the interpretability and credibility of research...
Does preregistration improve the interpretability and credibility of research...
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdf
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severity
 
Oom not doom a novel method for improving psychological science, Bradley Woods
Oom not doom   a novel method for improving psychological science, Bradley WoodsOom not doom   a novel method for improving psychological science, Bradley Woods
Oom not doom a novel method for improving psychological science, Bradley Woods
 

Dernier

Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑Damini Dixit
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICEayushi9330
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 

Dernier (20)

Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 

Mayod@psa 21(na)

  • 1. 0 Philosophical Interventions in the Statistics Wars Deborah G. Mayo Virginia Tech Philosophy in Science: Can Philosophers of Science Contribute to Science? PSA 2021 November 13, 2-4 pm
  • 2. 1 “A Statistical Scientist Meets a Philosopher of Science” Sir David Cox: “Deborah, in some fields foundations do not seem very important, but we both think foundations of statistical inference are important; why do you think that is?” Mayo: “…in statistics …we invariably cross into philosophical questions about empirical knowledge and inductive inference.” (Cox and Mayo 2011) Some call statistics “applied philosophy of Science” (Kempthorne 1976)
  • 3. 2 Statistics  Philosophy of science Most of my interactions with statistics have been drawing out insights from stat: (1) To solve philosophical problems about inductive inference, evidence, experiment; (2) To answer knotty metamethodological questions: When (if ever) is it legitimate to use the ‘same’ data to construct and test a hypothesis?
  • 4. 3 Philosophy of Science  Statistics • In the last decade I’m more likely to be intervening in stat—in the sense of this session: PinS • A central job for philosophers of science: minister to conceptual and logical problems of sciences • Especially when widely used methods (e.g., statistical significance tests) are said to be causing a crisis (and should be “abandoned” or “retired”)
  • 5. 4 Long-standing philosophical controversy on probability Frequentist (error statisticians): to control and assess the relative frequency of misinterpretations of data—error probabilities (e.g., P-values, confidence intervals, randomization, resampling) Bayesians (and other probabilists): to assign comparative degrees of belief or support in claims (e.g., Bayes factors, Bayesian posterior probabilities)
  • 6. 5 • Wars between frequentists and Bayesians have been contentious, everyone wants to believe we are long past them. • Long standing battles still simmer below the surface
  • 7. 6 My first type of intervention: • Illuminate the debates, within and between rival stat tribes, in relation to today’s problems • What’s behind the drumbeat that there’s a statistical crisis in science?
  • 8. • High powered methods enable arriving at well- fitting models and impressive looking effects even if they’re not warranted. • I set sail with a simple tool: if little or nothing has been done to rule out flaws in inferring a claim, we do not have evidence for it. 7
  • 9. A claim is warranted to the extent it passes severely 8 • We have evidence for a claim only to the extent that it has been subjected to and passes a test that would probably have found it flawed or specifiably false, just if it is • This probability is the stringency or severity with which it has passed the test
  • 10. Second type of intervention: Statistical inference as severe testing • Reformulate frequentist error statistical tools • Probability arises (in scientific inference) to assess and control how capable methods are at uncovering and avoiding erroneous interpretations of data (Probativism) • Excavation tool: Holds for any kind of inference; you needn’t accept this philosophy to use it to get beyond today’s statistical wars and scrutinize reforms 9
  • 11. Third type of intervention: scrutinize proposed reforms growing out of the “replication crisis” • Several proposed reforms are welcome: preregistration, avoidance of cookbook statistics, calls for more replication research • Others are quite radical, and even obstruct practices known to improve on replication. 10
  • 12. Consider statistical significance tests (frequentist) Significance tests (R.A. Fisher) are a small part of an error statistical methodology “…to test the conformity of the particular data under analysis with H0 in some respect….” …the p-value: the probability of getting an even larger value of t0bs assuming background variability or noise (Mayo and Cox 2006, 81) 11
  • 13. Testing reasoning, as I see it • If even larger differences than t0bs occur fairly frequently under H0 (i.e., P-value is not small), there’s scarcely evidence of incompatibility with H0 • Small P-value indicates some underlying discrepancy from H0 because very probably (1–P) you would have seen a smaller difference than t0bs were H0 true. • Even if the small P-value is valid, it isn’t evidence of a scientific conclusion H* Stat-Sub fallacy H1 => H* 12
  • 14. Neyman-Pearson (N-P) put Fisherian tests on firmer footing (1933): Introduces alternative hypotheses H0, H1 H0: μ ≤ 0 vs. H1: μ > 0 • Constrains tests by requiring control of both Type I error (erroneously rejecting) and Type II error (erroneously failing to reject) H0, and power (Neyman also developed confidence interval estimation at the same time) 13
  • 15. N-P tests tools for optimal performance: • Their success in optimal control of error probabilities gives a new paradigm for statistics • Also encouraged viewing tests as “accept/reject” rules more apt for industrial quality control, or high throughput screening, than science • Fisher, later in life, criticized N-P for turning “his” tests into acceptance sampling tools —I learned later it was mostly in-fighting 14
  • 16. • Can we keep the best from Fisherian and N-P tests without an ‘inconsistent hybrid” (Gigerenzer)? • This fueled my second intervention (Mayo 1991, 1996) later developed with econometrician Aris Spanos in 2000 and statistician David Cox in 2003 • “Our goal is to identify a key principle of evidence by which hypothetical error probabilities may be used for inductive inference.” (Mayo and Cox 2006) • Mathematically Fisher and N-P are nearly identical—it is an interpretation or philosophy that is needed 15
  • 17. Both Fisher & N-P: it’s easy to lie with biasing selection effects • Sufficient finagling—cherry-picking, significance seeking, multiple testing, post-data subgroups, trying and trying again—may practically guarantee a preferred claim H gets support, even if it’s unwarranted by evidence • Such a test fails a minimal requirement for a stringent or severe test (P-value is invalidated) 16
  • 18. Key to solving a central problem • Why is reliable performance relevant for a specific inference? • Ask yourself: What bothers you with selective reporting, cherry picking, stopping when the data look good, P-hacking. 17
  • 19. 18 • Not a problem about long-run performance— • It’s that we can’t say the test did its job in the case at hand: give “a first line of defense against being fooled by randomness” (Benjamini 2016)
  • 20. Inferential construal of error probabilities • Use error probabilities to assess capabilities of tools to probe various flaws (Probativism) • They are what Popper call’s “methodological probabilities” • “Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction” (Mayo and Spanos 2006) • ”Frequentist theory as an account of inductive inference” (Mayo and Cox 2006) 19
  • 21. 20 Popper vs logics of induction/ confirmation Severity was Popper’s term, and the debate between Popperian falsificationism and inductive logics of confirmation/ support parallel those in statistics. Popper: claim C is “corroborated” to the extent C passes a severe test (one that probably would have detected C’s falsity, if false).
  • 22. Comparative logic of support • Ian Hacking (1965) “Law of Likelihood”: x support hypothesis H0 less well than H1 if, Pr(x;H0) < Pr(x;H1) A problem is: • Any hypothesis that perfectly fits the data is maximally likely (even if data-dredged) • “there always is such a rival hypothesis viz., that things just had to turn out the way they actually did” (Barnard 1972, 129) 21
  • 23. Error probabilities are “one level above” a fit measure: • Pr(H0 is less well supported than H1; H0 ) is high for some H1 or other “to fix a limit between ‘small’ and ‘large’ values of [the likelihood ratio] we must know how often such values appear when we deal with a true hypothesis.” (Pearson and Neyman 1967, 106) 22
  • 24. “There is No Such Thing as a Logic of Statistical Inference” • Hacking retracts his Law of Likelihood (LL), (1972, 1980) • And retracts his earlier rejections that Neyman– Pearson statistics is inferential. “I now believe that Neyman, Peirce, and Braithwaite were on the right lines to follow in the analysis of inductive arguments” (Hacking 1980, 141) 23
  • 25. Likelihood Principle: what counts as evidence? A pervasive view is that all the evidence is contained in the ratio of likelihoods: Pr(x;H0) / Pr(x;H1) likelihood principle (LP) On the LP (followed by strict Bayesians): “Sampling distributions, significance levels, power, all depend on something more [than the likelihood function]–something that is irrelevant in Bayesian inference–namely the sample space” (Lindley 1971, 436) 24
  • 26. Bayesians Howson and Urbach • They say a significance test is precluded from giving judgments about empirical support • “[it] depends not only on the outcome that a trial produced, but also on the outcomes that it could have produced but did not. …determined by certain private intentions of the experimenters, embodying their stopping rule.” (1993 p. 212) • Whether error probabilities matter turns on your methodology being able to pick up on them. 25
  • 27. 26 In testing the mean of a standard normal distribution
  • 28. • So the frequentist needs to know the stopping rule For a (strict) Bayesian: “It seems very strange that a frequentist could not analyze a given set of data…if the stopping rule is not given….Data should be able to speak for itself.” (Berger and Wolpert, The Likelihood Principle 1988, 78) 27
  • 29. Radiation oncologists look to phil science: “Why do we disagree about clinical trials?” (ASTRO 2021) In a case we considered, Bayesian researchers: “The [regulatory] requirement of type I error control for Bayesian adaptive designs causes them to lose many of their philosophical advantages, such as compliance with the likelihood principle [which does not require adjusting]” (Ryan et al. 2020). They admit “the type I error was inflated in the [trials] ..without adjustments to account for multiplicity”. • No wonder they disagree, and it turns partly on the likelihood principle. (LP) 28
  • 30. Bayesians may block implausible inferences • With a low prior degree of belief in H (e.g., that the effect is real), the Bayesian can block inferring H • This can work in some cases
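A toy sketch of the blocking move (my own numbers, purely illustrative): with a low enough prior, even data favoring H by a likelihood ratio of 20 leave the posterior well below one half, so the inference to H is blocked.

```python
# Sketch: how a low prior "blocks" an inference despite data that favor H.
def posterior(prior, likelihood_ratio):
    """Posterior Pr(H | x) from prior Pr(H) and LR = Pr(x; H) / Pr(x; not-H)."""
    odds = (prior / (1 - prior)) * likelihood_ratio
    return odds / (1 + odds)

print(posterior(prior=0.01, likelihood_ratio=20))   # ~0.17: H still improbable, inference blocked
print(posterior(prior=0.50, likelihood_ratio=20))   # ~0.95: same data, different prior, different verdict
```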
  • 31. Concerns • Priors are an additional source of flexibility, on top of biasing selection effects • Blocking with a low prior doesn't show what the researchers did wrong, namely the multiple testing and data-dredging • The believability of data-dredged hypotheses is what makes them so seductive • Claims can be highly probable (in any sense) while poorly probed 30
  • 32. Family feuds within the Bayesian school: default/objective priors • Most Bayesian practitioners (in the last decade) look for non-subjective prior probabilities • "Default" priors are supposed to prevent prior beliefs from influencing the posteriors, letting the data dominate 31
  • 33. How should we interpret them? "By definition, 'non-subjective' prior distributions are not intended to describe personal beliefs, and in most cases, they are not even proper probability distributions…" (Bernardo 1997, pp. 159–60) • No agreement on rival systems for default/non-subjective priors (invariance, maximum entropy, maximizing missing information, matching (Kass and Wasserman 1996)) 32
  • 34. There may be ways to combine Bayesian and error statistical accounts (Gelman: falsificationist Bayesian; Shalizi: error statistician) • "[C]rucial parts of Bayesian data analysis, … can be understood as 'error probes' in Mayo's sense" • "[W]hat we are advocating, then, is what Cox and Hinkley (1974) call 'pure significance testing', in which certain of the model's implications are compared directly to the data." (Gelman and Shalizi 2013, 10, 20) • Gelman was at a session on significance testing controversies at the 2016 PSA with Gigerenzer and Glymour • One can't also champion "abandoning statistical significance" 33
  • 35. Now we get to scrutinizing proposed reforms 34
  • 36. No Threshold view: Don't say 'significance', don't use P-value thresholds • In 2019, the executive director of the American Statistical Association (ASA), Ron Wasserstein (and two co-authors), announced: "declarations of 'statistical significance' be abandoned" • Don't say "significance", don't use P-value thresholds (e.g., .05, .01, .005) • John Ioannidis invited me and Andrew Gelman to write opposing editorials on the "no threshold view" (European Journal of Clinical Investigation). Mine was "P-value thresholds: forfeit at your peril" 35
  • 37. To be fair, many who signed on to the "no threshold view" think that, by removing P-value thresholds, researchers lose an incentive to data-dredge, multiple-test, and otherwise exploit researcher flexibility • I argue that banning the use of P-value thresholds in interpreting data does not diminish but rather exacerbates data-dredging 36
  • 38. In a world without predesignated thresholds, it would be hard to hold data dredgers accountable for reporting a nominally small P-value obtained through ransacking, data dredging, and trying and trying again. • What distinguishes genuine P-values from invalid ones is that they meet a prespecified error probability. • No thresholds, no tests. • We agree the actual P-value should be reported (as all the founders of tests recommended) 37
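A minimal simulation sketch of the accountability point (my own illustration, with an arbitrary choice of 20 searches per "study"): report only the smallest of 20 nominal P-values computed on pure noise, and "P < .05" turns up about 64% of the time, so the reported number no longer meets the error probability it advertises.

```python
# Sketch: the smallest of 20 nominal P-values from null data is not a genuine P-value.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
reps, n_tests = 20_000, 20

z = rng.normal(0.0, 1.0, size=(reps, n_tests))   # z-statistics of 20 tests, all nulls true
p = 2 * norm.sf(np.abs(z))                       # two-sided nominal P-values
smallest = p.min(axis=1)                         # the dredged, best-looking result

print(f"Pr(smallest nominal P < .05; all nulls true) ~ {np.mean(smallest < 0.05):.2f}")
# Roughly 0.64 (= 1 - 0.95**20), not 0.05: without a prespecified error probability,
# the nominal P-value misreports how readily so small a value arises by chance alone.
```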
  • 39. 38 Problems are avoided by reformulating tests with a discrepancy γ from H0 • Instead of a binary cut-off (significant or not), the particular outcome is used to infer discrepancies that are or are not warranted • In a nutshell: one tests several discrepancies from a test hypothesis and infers which are well or poorly warranted • E.g., with non-significant results, we set an upper bound (e.g., any discrepancy from H0 is less than γ)
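A minimal sketch of this reformulation for the non-significant case (my own code and numbers; a standard one-sided test of a normal mean with σ known): given an observed mean that falls short of significance, compute the severity with which claims of the form "the discrepancy from H0 is less than γ" pass.

```python
# Sketch: severity assessment for a non-significant result in a one-sided
# test of H0: mu <= 0 vs. H1: mu > 0, sigma known (illustrative numbers).
import numpy as np
from scipy.stats import norm

sigma, n = 1.0, 100
se = sigma / np.sqrt(n)
xbar_obs = 0.10                    # observed mean: z = 1.0, not significant at 0.05

for gamma in (0.05, 0.1, 0.2, 0.3):
    # SEV(mu < gamma) = Pr(a worse fit with H0, i.e. X-bar > observed mean; mu = gamma)
    sev = norm.sf((xbar_obs - gamma) / se)
    print(f"SEV(mu < {gamma:.2f}) = {sev:.2f}")
# Output: 0.31, 0.50, 0.84, 0.98.  The non-significant result gives good grounds
# for ruling out discrepancies as large as 0.3, and almost none for ruling out 0.1.
```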
  • 40. Final Remarks: to intervene in statistics battles you need to ask: • How do they use probability? (probabilism, performance, probativism (severe testing)) • What’s their notion of evidence? (error probability principle, likelihood principle) 39
  • 41. Intervening in today's stat policy reforms requires chutzpah • Things have gotten so political that sometimes outsider status can help in acrimonious battles with thought leaders in statistics. • To give an update: a Task Force of 14 statisticians was appointed by the ASA President in 2019 "to address concerns that [the no threshold view] might be mistakenly interpreted as official ASA policy" (Benjamini et al. 2021) 40
  • 42. "the use of P-values and significance testing, properly applied and interpreted, are important tools that should not be abandoned" (Benjamini et al. 2021) • Instead, we need to confront the fact that basic statistical concepts are more confused than ever (in medicine, economics, law, psychology, climate science, social science, etc.) • I was glad to see the morning's session* organized by members of the 2019 Summer Seminar in Phil Stat (which Aris Spanos and I ran) • I hope more philosophers of science enter the 2-way street between Phil Sci and Stat Sci *Current Debates on Statistical Modeling and Inference 41
  • 45. (FEV) Frequentist Principle of Evidence: Mayo and Cox (2006); (SEV): Mayo (1991, 1996, 2018), Mayo and Spanos (2006) • FEV/SEV (small P-value): indicates a discrepancy γ from H0 only if there is a high probability the test would have resulted in a larger P-value were a discrepancy as large as γ absent. • FEV/SEV (moderate or large P-value): indicates the absence of a discrepancy γ from H0 only if there is a high probability the test would have given a worse fit with H0 (i.e., a smaller P-value) were a discrepancy γ present. 44
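As a companion to the earlier sketch, the small-P-value direction of FEV/SEV (again my own illustrative code and numbers, same one-sided normal setup): a just-significant result warrants "μ > γ" severely only for small γ.

```python
# Sketch: severity assessment for a statistically significant result in the
# same one-sided test of H0: mu <= 0, sigma known (illustrative numbers).
import numpy as np
from scipy.stats import norm

sigma, n = 1.0, 100
se = sigma / np.sqrt(n)
xbar_obs = 0.20                    # observed mean: z = 2.0, P ~ 0.023

for gamma in (0.0, 0.1, 0.2, 0.3):
    # SEV(mu > gamma) = Pr(a larger P-value, i.e. X-bar <= observed mean; mu = gamma)
    sev = norm.cdf((xbar_obs - gamma) / se)
    print(f"SEV(mu > {gamma:.1f}) = {sev:.2f}")
# Output: 0.98, 0.84, 0.50, 0.16.  There is good evidence of *some* positive
# discrepancy, but poor grounds for one as large as 0.2 or 0.3.
```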
  • 46. References • Barnard, G. (1972). The logic of statistical inference (Review of “The Logic of Statistical Inference” by Ian Hacking). British Journal for the Philosophy of Science 23(2), 123–32. • Benjamini, Y., De Veaux, R., Efron, B., et al. (2021). The ASA President’s task force statement on statistical significance and replicability. The Annals of Applied Statistics. (Online June 20, 2021.) • Berger, J. O. and Wolpert, R. (1988). The Likelihood Principle, 2nd ed. Vol. 6, Lecture Notes-Monograph Series. Hayward, CA: Institute of Mathematical Statistics. • Cox, D. R., and Mayo, D. G. (2010). “Objectivity and Conditionality in Frequentist Inference.” In Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, edited by Deborah G. Mayo and Aris Spanos, 276–304. Cambridge: Cambridge University Press. • Cox, D. and Mayo, D. (2011). “A Statistical Scientist Meets a Philosopher of Science: A Conversation between Sir David Cox and Deborah Mayo”, in Rationality, Markets and Morals (RMM) 2, 103–14. • Fisher, R. A. (1935). The Design of Experiments. Oxford: Oxford University Press. • Gelman, A. and Shalizi, C. (2013). Philosophy and the Practice of Bayesian Statistics and Rejoinder, British Journal of Mathematical and Statistical Psychology 66(1), 8–38; 76–80. • Giere, R. (1976). Empirical probability, objective statistical methods, and scientific inquiry. In Foundations of probability theory, statistical inference and statistical theories of science, vol. 2, edited by W. L. Harper and C. A. Hooker, 63-101. Dordrecht, The Netherlands: D. Reidel. • Hacking, I. (1965). Logic of Statistical Inference. Cambridge: Cambridge University Press. • Hacking, I. (1972). Likelihood. British Journal for the Philosophy of Science 23, 132–37. • Hacking, I. (1980). The theory of probable inference: Neyman, Peirce and Braithwaite. In Mellor, D. (ed.), Science, Belief and Behavior: Essays in Honour of R. B. Braithwaite, Cambridge: Cambridge University Press, pp. 141–60. 45
  • 47. Harper, W. L., and C. A. Hooker, eds. (1976). Foundations of probability theory, statistical inference and statistical theories of science. Vol. 2. Dordrecht, The Netherlands: D. Reidel. • Howson, C. & Urbach, P. (1993). Scientific Reasoning: The Bayesian Approach. LaSalle, IL: Open Court. • Kass, R. & Wasserman, L. (1996). The Selection of Prior Distributions by Formal Rules. Journal of the American Statistical Association 91, 1343–70. • Kempthorne, O. (1976). Statistics and the Philosophers, in Harper, W. and Hooker, C. (eds.), Foundations of Probability Theory, Statistical Inference and Statistical Theories of Science, Volume II, 273–314. Boston, MA: D. Reidel. • Lindley, D. V. (1971). The Estimation of Many Parameters, in Godambe, V. and Sprott, D. (eds.), Foundations of Statistical Inference, 435–455. Toronto: Holt, Rinehart and Winston. • Mayo, D. (1991). Novel Evidence and Severe Tests. Philosophy of Science 58(4), 523–52. • Mayo, D. (1996). Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press. • Mayo, D. (2014). On the Birnbaum Argument for the Strong Likelihood Principle (with discussion), Statistical Science 29(2), 227–39; 261–6. • Mayo, D. (2016). Don’t Throw Out the Error Control Baby with the Bad Statistics Bathwater: A Commentary on Wasserstein, R. L. and Lazar, N. A. 2016, “The ASA’s Statement on p-Values: Context, Process, and Purpose.” The American Statistician 70(2) (supplemental materials). • Mayo, D. (2018). Statistical inference as severe testing: How to get beyond the statistics wars. Cambridge: Cambridge University Press. • Mayo, D. (forthcoming). The Statistics Wars and Intellectual Conflicts of Interest (editorial). Conservation Biology. 46
  • 48. Mayo, D. & Cox, D. (2006). Frequentist statistics as a theory of inductive inference. In Rojo, J. (ed.), Optimality: The Second Erich L. Lehmann Symposium, Lecture Notes-Monograph Series, Institute of Mathematical Statistics (IMS), 49, pp. 77–97. (Reprinted 2010 in Mayo, D. and Spanos, A. (eds.), pp. 247–75.) • Mayo, D. & Hand, D. (under review). Statistical Significance Tests: Practicing damaging science, or damaging scientific practice? In Kao, M., Shech, E., & Mayo, D. (eds.), Synthese (Special Issue: Recent Issues in Philosophy of Statistics: Evidence, Testing, and Applications). • Mayo, D. & Spanos, A. (2006). Severe testing as a basic concept in a Neyman–Pearson philosophy of induction. British Journal for the Philosophy of Science 57(2), 323–57. • Mayo, D. G., and A. Spanos (2011). “Error Statistics.” In Philosophy of Statistics, edited by Prasanta S. Bandyopadhyay and Malcolm R. Forster, 7:152–198. Handbook of the Philosophy of Science. The Netherlands: Elsevier. • Musgrave, A. (1974). ‘Logical versus Historical Theories of Confirmation’, The British Journal for the Philosophy of Science 25(1), 1–23. • Neyman, J. & Pearson, E. (1967). On the problem of the most efficient tests of statistical hypotheses. In Joint statistical papers, 140–85. Berkeley: University of California Press. First published in Philosophical Transactions of the Royal Society (A) (1933) 231, 289–337. • Popper, K. (1959). The Logic of Scientific Discovery. London, New York: Routledge. • Simmons, J., Nelson, L., & Simonsohn, U. (2012). A 21 word solution. Dialogue: The Official Newsletter of the Society for Personality and Social Psychology 26(2), 4–7. • Wasserstein, R. & Lazar, N. (2016). The ASA’s statement on p-values: Context, process and purpose (and supplemental materials). The American Statistician 70(2), 129–133. • Wasserstein, R., Schirm, A., & Lazar, N. (2019). Moving to a world beyond “p < 0.05” (Editorial). The American Statistician 73(S1), 1–19. https://doi.org/10.1080/00031305.2019.1583913 47