Simulating data to gain insights into
power and p-hacking
Dorothy V. M. Bishop
Professor of Developmental Neuropsychology
University of Oxford
@deevybee
Before you get started….
• The early exercises in this lesson use Microsoft Excel,
which most people will have installed
• The later exercises use R and R studio. This is free
software. If you don’t have it, you’ll need to download
it. As this can take time, it’s recommended that you do
that before you go further.
• Please follow instructions on the next slide.
Installing R
• Open an internet browser and go to www.r-project.org.
• Click the "download R" link in the middle of the page under
"Getting Started."
• Click on the link for a CRAN location close to you
• Mac users:
• Click on the "Download R for (Mac) OS X" link at the top of the page.
• Click on the file containing the latest version of R under "Files."
• Save the .pkg file, double-click it to open, and follow the installation
instructions.
• Windows users:
• Click on the "Download R for Windows" link at the top of the page.
• Click on the "install R for the first time" link at the top of the page.
• Click "Download R for Windows" and save the executable file
somewhere on your computer.
• Run the .exe file and follow the installation instructions.
Installing R studio
• R studio is a friendly interface for R. Once it is installed, you
need not open the original R software: instead, you access
R by opening the R studio application
• Go to www.rstudio.com and click on the "Download RStudio"
button.
• Click on "Download RStudio Desktop."
Mac users:
• Click on the version recommended for your system, or the latest
Mac version, save the .dmg file on your computer, double-click it
to open, and then drag and drop it to your applications folder.
Windows users:
• Click on the version recommended for your system, or the latest
Windows version, and save the executable file. Run the .exe file
and follow the installation instructions.
Why invent data?
• If you can anticipate what your data will look like, you
will also anticipate a lot of issues about study design
that you might not have thought of
• Analysing a simulated dataset can clarify which analysis
is optimal and how the analysis works
• Simulating data with an anticipated effect is very
useful for power analysis – deciding what sample size
to use
• Simulating data with no effect (i.e. random noise) gives
unique insights into p-hacking
Ways to simulate data
• For newbies: to get the general idea: Excel
• Far better but involves steeper learning curve: R
• Also (but not covered here) options in SPSS and
Matlab:
• e.g. https://www.youtube.com/watch?v=XBmvYORP5EU
• http://uk.mathworks.com/help/matlab/random-number-generation.html
Basic idea
• Anything you measure can be seen as a
combination of an effect of interest plus random
noise
• The goal of research is to find out
• (a) whether there is an effect of interest
• (b) if yes, how big it is
• Classic hypothesis-testing with p-values focuses
just on (a) – i.e. have we just got noise or
a real effect?
• We can simulate most scenarios by generating
random noise, with or without a consistent added
effect
Basic idea: generate a set of random numbers in Excel
• Open a new workbook
• In cell A1 type random number
• In cell A2 type = rand()
Grab the little
square in the
bottom right of A2
and pull it down to
autofill the cells
below to A8
Random numbers in Excel, ctd
• You have just simulated
some data!
• Are your numbers the
same as mine?
• What happens when
you type rand() in
A9?
Random numbers in Excel, ctd.
• Your numbers will be different to mine – that’s because they
are random.
• The numbers will change whenever you open the worksheet,
or make any change to it.
• Sometimes that’s fine, but for this demo we want to keep
the same numbers. To control when random numbers
update, select Manual in Formula|Calculation Options.
• To update to new numbers use Calculate Now button.
Random numbers in Excel, ctd.
• The rand() function generates random numbers between 0 and 1:
Are these the kind of numbers
we want?
Realistic data usually involves normally distributed numbers
• Nifty way to do this in Excel: treat generated numbers as p-values
• The normsinv() function turns a p-value into a z-score
Z-score
Normally distributed random numbers
Try this:
• Type = normsinv(A2) in
cell B2
• Drag formula down to
cell B8
• Now look at how the
numbers in column A
relate to those in
column B.
NB. In practice, we can generate normally distributed random numbers
(i.e. z-scores) in just one step with formula: = normsinv(rand())
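For comparison, here is a minimal sketch of the same trick in R, the language used later in this lesson. It is not part of the original exercise: runif() stands in for Excel's rand() and qnorm() for normsinv(), and the variable names and seed are invented for illustration.

```r
set.seed(1)        # arbitrary seed, only so the example can be repeated
u <- runif(7)      # like = rand() in A2:A8: uniform numbers between 0 and 1
z <- qnorm(u)      # like = normsinv(A2): treat each number as a p-value, convert to a z-score
print(round(cbind(u, z), 3))

# One-step version, like = normsinv(rand())
z_direct <- qnorm(runif(7))
print(round(z_direct, 3))
```

Values of u below .5 give negative z-scores and values above .5 give positive ones, which is the relationship between columns A and B that the exercise asks you to look for.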
Now we are ready to simulate a study where we have
2 groups to be compared on a t-test
• Pull down the
formula from
columns A
and B to
extend to
A11:B11
• Type a header
‘group’ in C1
• Type 1 in
C2:C6 and 2
in C7:C11
What is formula for t-test in Excel?
Basic rule for life, especially in programming: if you don’t know it,
Google it
TTEST formula in xls:
You specify:
Range 1
Range 2
tails (1 or 2)
type
1 = paired
2 = unpaired equal variance
3 = unpaired unequal variance
Try entering the formula for the t-test in C12
=TTEST(B2:B6,
B7:B11,2,2)
What is the number
that you get?
This formula gives
you a p-value
Now press
‘calculate now’ 20
times, and keep a
tally of how many
p-values are < .05 in
20 simulations
• What has this shown you?
• P-values ‘dance about’ even when data are entirely
random
• On average, one in 20 runs will give p < .05 when null
hypothesis is true – no difference between groups
See Geoff Cumming: Dance of the p-values
https://www.youtube.com/watch?v=5OL1RqHrZQ8
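Pressing 'Calculate now' 20 times gives only a rough tally. As a hedged sketch of what the same experiment looks like when automated in R (the number of simulations, the seed and the variable names are assumptions, not taken from the lesson scripts):

```r
set.seed(2)              # arbitrary seed so the run can be repeated
n_sims <- 10000          # number of simulated studies (assumed value)
n_per_group <- 5         # as in the Excel sheet: 5 scores per group

# Each simulated study: two groups of random z-scores with no true difference
p_values <- replicate(n_sims, {
  group1 <- rnorm(n_per_group)
  group2 <- rnorm(n_per_group)
  # var.equal = TRUE mirrors type 2 (unpaired, equal variance) in Excel's TTEST
  t.test(group1, group2, var.equal = TRUE)$p.value
})

prop_sig <- mean(p_values < .05)   # proportion of false positives
print(prop_sig)                    # expected to be close to .05
```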
Congratulations! You have done your first simulation
We’ll stick with Excel for one more simulation
• So far, we’ve simulated the null hypothesis - random
data. If we find a ‘significant’ difference, we know it’s a
false positive
• Next, we’ll simulate data with a genuine effect.
• It’s easy to do this: we just add a constant to all the
values for group 2
• Since we’re using z-scores, the constant will correspond
to the effect size (expressed as Cohen’s d).
• Let’s try an effect size of .5
• In cell B7, change the formula to = normsinv(A7)+.5
• Drag the formula down to cell B11 and hit ‘Calculate
now’
I’ve added formulae to
show the mean and SD for
the two groups:
= AVERAGE(B2:B6)
= STDEV(B2:B6)
= AVERAGE(B7:B11)
= STDEV(B7:B11)
Your values will differ.
Why isn’t the difference in
means for the two groups
exactly .5?
ANSWER: mean/SD
describe the population;
this is just a sample from
that population
Now add the formula
for the t-test
Is p < .05 ?
It’s pretty unlikely
you will see a
significant result.
Why?
ANSWER: Sample too
small – can’t pick out
signal from noise
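To see how small the chance of a significant result is here, the Excel exercise can be repeated many times in R. This is an illustrative sketch only; the seed, the number of simulations and the variable names are assumptions.

```r
set.seed(3)              # arbitrary seed
n_sims <- 10000          # number of simulated studies (assumed value)
n_per_group <- 5         # 5 scores per group, as in the Excel sheet
effect_size <- 0.5       # constant added to group 2, as in the exercise

p_values <- replicate(n_sims, {
  group1 <- rnorm(n_per_group)
  group2 <- rnorm(n_per_group) + effect_size   # genuine effect of d = .5
  t.test(group1, group2, var.equal = TRUE)$p.value
})

prop_sig <- mean(p_values < .05)   # proportion of runs that detect the true effect
print(prop_sig)
```

With only 5 scores per group, the proportion printed should be well below one half: most runs miss an effect that is really there.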
• The first simulation gave some insights into false positive
rates: it shows how you can get a ‘significant’ result from
random data
• The second simulation illustrates the opposite situation:
showing how often you can fail to get a significant p-value,
even when there is a true effect (false negative)
• This brings us on to the topic of statistical power: the
probability of detecting a real effect with a given sample size
• To build on these insights we need to do lots of simulations,
and for that it’s best to move to R (which hopefully you have
already installed: if not see slides 2-3)
What have we learned so far?
Benefits of simulating data in R
• Can write a script that executes commands to generate data
and then run it automatically many times and store results
• Much faster than Excel, and reproducible
• Can generate different distributions, correlated variables, etc.
• Powerful plotting functions
• A good way of starting to learn R
Downside: Steep initial learning curve
But remember: Google is your friend
Tons of material about R on the internet
Ready? Create a folder to save your work and fire up R studio!
Self-teaching scripts on https://osf.io/skz3j/
Download, save and open this one: Simulation_ex1_intro.R
Source pane: script
Console: try commands out here
Environment: check variables here
First thing to do: Set working directory
• Working directory is where R will default to when reading and
writing stuff
• Easiest way to set it: Go to Session|Set working directory
Note that when you do this, the command to set working directory will pop up on the
console. On my computer I see:
setwd("~/deevybee_repo")
Now we’ll go through the script: it will generate same
type of 2-group data as we’ve done in the 2nd
exercise in Excel
Preliminaries: Install packages. Use Tools|Install Packages
• Remember! A common reason for R code not to work is because you have not
installed a package that you need.
• After installing the package you have to use the library or require
command in your script to load it for this session.
To run the code in lines 41-49…
• Select the lines of code
• Click on the Run button in the top bar
• Check what happens in the console
Running a script line by line is a good way to learn R
Now start simulating data!
• rnorm is an inbuilt R function that generates
random normal deviates
• Note that as well as results you specify being
shown on the console, any variables you create are
now featured in the environment pane
Now run lines 56-68
Think about questions on lines 72-74
• If you’re confused, remember what you’ve been
taught in basic statistics (I hope!) about the
differences between a population and a sample.
• The mean/SD we specify determines
characteristics of the population from which we
are sampling.
See also:
http://deevybee.blogspot.com/2017/12/using-simulations-to-understand.html
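A minimal sketch of the population/sample distinction, assuming a population with mean 0 and SD 1. The variable names (myN, myM, mySD, myvectorA) follow the style used in these slides, but their values here are assumptions and may differ from the script.

```r
set.seed(4)    # arbitrary seed
myN  <- 20     # sample size (assumed value)
myM  <- 0      # population mean
mySD <- 1      # population SD

myvectorA <- rnorm(n = myN, mean = myM, sd = mySD)
print(mean(myvectorA))   # sample mean: near myM, but not exactly equal
print(sd(myvectorA))     # sample SD: near mySD, but not exactly equal

# A very large sample lands much closer to the population values
big_sample <- rnorm(n = 100000, mean = myM, sd = mySD)
print(mean(big_sample))
print(sd(big_sample))
```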
Now we’ll run lines 79-91 to generate data for
another group with different mean
• If our scores are z-scores and the mean for group 1 is zero, then myM2
corresponds to Cohen’s d measure of effect size.
• The final command creates interesting output on the console: results of a
Welch 2-sample t-test (i.e. t-test with correction for unequal variances)
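As a rough sketch of this step (the sample size and the value of myM2 are assumptions), R's t.test() uses the Welch correction by default; setting var.equal = TRUE gives the classic equal-variance test instead.

```r
set.seed(5)    # arbitrary seed
myN  <- 20     # sample size per group (assumed value)
myM2 <- 0.3    # mean for group 2; with z-scores this is Cohen's d (assumed value)

myvectorA <- rnorm(myN, mean = 0, sd = 1)      # group 1
myvectorB <- rnorm(myN, mean = myM2, sd = 1)   # group 2

myt <- t.test(myvectorA, myvectorB)            # default: Welch two-sample t-test
print(myt)

# Classic equal-variance version, for comparison
myt_equal <- t.test(myvectorA, myvectorB, var.equal = TRUE)
print(myt_equal$parameter)   # degrees of freedom = 2 * myN - 2
```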
Advantages of R over Excel
• Can easily regenerate the data from the
script
• Very easy to change one parameter and
generate a new dataset
• We will see shortly how to repeatedly run a
simulation and store results by using a loop
• But first we’ll do some data reformatting
and show a neat way of plotting the results
Making a data frame
• A data frame is a way of storing data that is rather like an Excel worksheet
• You can store observations in rows and variables in columns
• Data frames are versatile and can hold different variable types
• We’ll put our newly created vectors into a data frame, mydf, with columns for
group and score
• We can easily view mydf by clicking on mydf in the Environment tab
Filling the data frame
You can refer to a specific cell in a data frame with the row and column index
e.g. mydf[3, 2] refers to 3rd row and 2nd column. Note square brackets here
You can refer to a whole column by using $ and its name, e.g. mydf$Group
You can also refer to a specific row of a named column, e.g. mydf$Group[3]
Run lines 117-125
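The indexing rules above can be tried out on a small data frame. This is a sketch under assumed values; the column names Group and Score match the later pirateplot command, but the way the script itself builds mydf may differ.

```r
set.seed(6)    # arbitrary seed
myN <- 5       # small sample so the whole data frame is easy to read (assumed value)
myvectorA <- rnorm(myN)
myvectorB <- rnorm(myN, mean = 0.3)

mydf <- data.frame(
  Group = rep(c(1, 2), each = myN),   # 1 1 1 1 1 2 2 2 2 2
  Score = c(myvectorA, myvectorB)
)

print(mydf[3, 2])       # 3rd row, 2nd column
print(mydf$Group)       # whole column, by name
print(mydf$Group[3])    # 3rd row of the Group column
print(mydf$Score[mydf$Group == 2])   # scores for group 2 only
```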
Deconstructing the t-test result
• One reason for making a data frame is that there are many functions in R that
operate on data frames.
• One of these is the pirateplot function from the yarrr package. This creates a
nice kind of plot called a pirate plot, which shows the distribution of individual
data points as well as other summary statistics. We want to make a pirate plot
with a header that shows the t-test result
• Run line 131: myt <- t.test(myvectorA,myvectorB)
The comments explain this more, but basically you can extract bits of the output
in myt using $. If you type in the console:
myt$
A menu pops up showing you which parameters there are.
Now run lines 145-149, which show how you can bolt together bits of output from
the t-test to make a useful header for a plot
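One possible way to bolt the pieces together is sketched below. The header wording and rounding are assumptions and are not copied from the script; statistic, parameter and p.value are standard components of the object that t.test() returns.

```r
set.seed(7)    # arbitrary seed
myvectorA <- rnorm(20)
myvectorB <- rnorm(20, mean = 0.3)
myt <- t.test(myvectorA, myvectorB)

print(names(myt))   # the components you can extract with $

# Paste selected components into a single text string for the plot title
myheader <- paste0(
  "t = ", round(myt$statistic, 2),
  "; df = ", round(myt$parameter, 1),
  "; p = ", round(myt$p.value, 3)
)
print(myheader)
```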
Make a pirate plot
• Run line 151:
• pirateplot(Score~Group, data=mydf, main=myheader, xlab="Group", ylab="Score")
Your plot will be different from this because we are
generating random numbers that vary on each run.
The pirate plot is not a well-known type of graphic;
this is a perfect opportunity to practice Googling to learn
more about it – you should try varying the script to see
how you can affect the graph
Some general points to help you learn R
1. Basic rule for life, especially in programming: if you don’t know
it, Google it
In R, Google your error message
2. Best way to learn is by making mistakes
If you see a line of code you don’t understand, play with it to find
out what it does.
Look at Environment tab, or type name of variable on the console
to check its value
E.g., you want repeating numbers? Type in the console to
compare: rep (1,3) and rep (3,1)
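For reference, a short sketch of what these commands return. The each and times arguments are standard options of rep(); the variable names are made up for the example.

```r
a <- rep(1, 3)   # the value 1 repeated 3 times: 1 1 1
b <- rep(3, 1)   # the value 3 repeated once: 3
print(a)
print(b)

# rep() also accepts vectors, which is handy for making group codes
by_each  <- rep(c(1, 2), each = 5)    # 1 1 1 1 1 2 2 2 2 2
by_times <- rep(c(1, 2), times = 5)   # 1 2 1 2 1 2 1 2 1 2
print(by_each)
print(by_times)
```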
Pause to play with the script.
Make a note of any questions
Simulation_ex1_multioutput.R
This is essentially the same as the previous script, except that:
• The plots are sent to a pdf rather than being output on the Plots pane (see
comments in the script for explanation)
• You run the simulation repeatedly, with two different values for N
The structure of the script is with 2 nested loops:
for (i in 1:2){ #line 15
……… #various commands here
for (j in 1:10){ #line 21
……… #various commands here
}
}
• The first loop runs twice; the second loop, which is nested inside it, runs 10 times.
So overall there are 20 runs
• The value, i, in the first loop controls sample size, which is either myNs[1] or
myNs[2]
• The value, j, in the second loop just acts as a counter, to ensure that there are 10
repetitions
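A stripped-down sketch of the nested-loop structure follows. It stores p-values in a data frame instead of writing plots to pdf, so it is not the script itself; myNs matches the slides, while myES, nruns and results are names assumed for illustration.

```r
set.seed(9)            # arbitrary seed
myNs  <- c(20, 100)    # sample sizes per group
myES  <- 0.3           # effect size (d)
nruns <- 10            # repetitions for each sample size

# Set up an empty results table with one row per run
results <- data.frame(
  N   = rep(myNs, each = nruns),
  run = rep(1:nruns, times = length(myNs)),
  p   = NA
)

for (i in 1:2) {               # outer loop: sample size
  thisN <- myNs[i]
  for (j in 1:nruns) {         # inner loop: counter for repetitions
    myvectorA <- rnorm(thisN)
    myvectorB <- rnorm(thisN, mean = myES)
    thisrow <- (i - 1) * nruns + j       # row of results for this run
    results$p[thisrow] <- t.test(myvectorA, myvectorB)$p.value
  }
}

print(results)
# Proportion of runs with p < .05 for each sample size
print(tapply(results$p < .05, results$N, mean))
```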
Run the whole script!
Click on the Files tab in the bottom right-hand pane, and
you’ll see you have created two new pdf files (you may
need to scroll down to see them):
Look at these files, paying particular attention to the proportion
of runs where p < .05.
[Figure: 10 runs of simulation with N = 20 per group and effect size (d) = .3]
[Figure: 10 runs of simulation with N = 100 per group and effect size (d) = .3]
Points to note
• Smaller samples associated with more variable results.
• With small sample sizes, true but weak effects will usually
not give you a ‘significant’ result (i.e. p < .05).
• In the example here, with effect size of .3, sample of 100
per group only gives a significant result on around 60% of
runs.
• This is the same as saying the power of the study to
detect an effect size of .3 is around 60% (i.e. .60)
• Many statisticians recommend power should be 80% or
more (though will depend on purpose of study).
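For a simple two-group design, the simulated figures can be checked against an analytic calculation. The sketch below uses power.t.test() from base R's stats package; applying it here is a suggestion rather than part of the lesson scripts.

```r
# Power for a two-sample, two-sided t-test with d = .3 (SD = 1, alpha = .05)
pw20  <- power.t.test(n = 20,  delta = 0.3, sd = 1, sig.level = 0.05)
pw100 <- power.t.test(n = 100, delta = 0.3, sd = 1, sig.level = 0.05)
print(pw20$power)
print(pw100$power)

# Leave n out and specify power instead, to solve for the sample size per group
n80 <- power.t.test(delta = 0.3, sd = 1, sig.level = 0.05, power = 0.80)
print(ceiling(n80$n))
```

The analytic power for N = 100 per group should be broadly in line with the 'around 60%' seen in the simulations; ten runs are too few to pin it down more precisely.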
Body of table shows sample size per group
Jacob Cohen worked this all out in 1988
Estimating statistical power for your study
For simple designs, you can use the G*Power package (or Cohen’s
formulae)
For more complex designs, simulation is a better approach:
just run the analysis on simulated data 10,000 times and
then see how frequently your result is ‘significant’ by
whatever criterion you plan to use.
This requires you to have a sense of what your data will look
like, and you have to have an estimate of what is the
smallest effect size that you’d be interested in.
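A hedged sketch of the simulation approach to power is given below. The function sim_power() and its arguments are hypothetical names invented for this example, the t-test stands in for 'whatever criterion you plan to use', and n_sims is set below 10,000 only to keep the example quick.

```r
set.seed(11)   # arbitrary seed

# Hypothetical helper: proportion of simulated studies with p < alpha
sim_power <- function(n_per_group, effect_size, n_sims = 2000, alpha = .05) {
  p_values <- replicate(n_sims, {
    group1 <- rnorm(n_per_group)
    group2 <- rnorm(n_per_group, mean = effect_size)
    t.test(group1, group2)$p.value   # swap in your planned analysis here
  })
  mean(p_values < alpha)
}

# Estimate power for several candidate sample sizes, smallest effect of interest d = .3
candidate_ns <- c(20, 50, 100, 200)
power_by_n <- sapply(candidate_ns, sim_power, effect_size = 0.3)
names(power_by_n) <- candidate_ns
print(round(power_by_n, 2))
```

To plan a study, you would pick the smallest N whose estimated power reaches your target, for example 80%.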
“Small studies continue to be carried out
with little more than a blind hope of
showing the desired effect. Nevertheless,
papers based on such work are submitted
for publication, especially if the results
turn out to be statistically significant.”
(Newcombe, 1987)
Weak statistical power has been, and continues to be a
major cause of problems with replication of findings
Part 2: Simulating null results to illustrate p-hacking
P-hacking and type 1 error (false positives)
Load simulation_ex2_correlations.R
Often studies have multiple variables of interest.
This script shows you how to use the mvrnorm function
from the MASS package to simulate multivariate normal data
It also demonstrates the dangers of p-hacking
First just ensure the necessary packages are installed and
load them using library(): run lines 11-14
Introduction to mvrnorm
In R, if you want to know how to use a function, you can just type help,
e.g. help(mvrnorm)
But often the official help information is technical and unfriendly and you may find
more useful and accessible information and examples by Googling.
The essential arguments for mvrnorm are the sample size (n), mu, which is a vector
of means (one per variable), and Sigma, a matrix showing the correlations between
variables. We’ll ignore the other arguments provided on the Help page for this demo.
To make life easy, we will again create z-scores for our data, so mean will be zero and
SD = 1.
We can set nvar to 7 and then specify mu = rep(0, nvar).
You could just have mu = rep (0,7)
or even mu = c(0,0,0,0,0,0,0)
But a good script avoids hardcoding variables like this: you want to be able to try
running the script with a range of values, and it’s much easier just changing the initial
definition of nvar than retyping all the lines of code that use nvar.
Specifying covariance between variables
You should be familiar with the correlation coefficient, r
If we are using z-score and have r = .5, what is covariance?
Creating the covariance matrix
• One benefit of using z-scores is that the covariance matrix is the same as the
correlation matrix, so if we specify the amount of correlation between variables,
then we can easily make the covariance matrix that we need.
• For simplicity, we’ll just assume that all our simulated variables are
intercorrelated by the same amount, a value we’ll call myCorr.
• So, if we have 7 variables, and myCorr is 0, we will need a matrix like this:
• The script achieves this just by making a matrix where all values = myCorr, and
then overwriting the diagonal with myVar (which we’ve set to 1)
• N.B. The script is set to simulate uncorrelated variables, so off-diagonal values are
0, but you could experiment with other values, by changing myCorr
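The matrix-building step can be sketched as follows. Covariance is r × SD × SD, so with z-scores (SD = 1) it equals r. The names myCorr and myVar come from the slides; nVar and myCov follow the mvrnorm command shown later, and myCov5 is an extra example invented here.

```r
nVar   <- 7    # number of variables
myVar  <- 1    # variance of each variable (z-scores, so SD = 1)
myCorr <- 0    # correlation between every pair of variables

# Fill the whole matrix with myCorr, then overwrite the diagonal with myVar
myCov <- matrix(rep(myCorr, nVar * nVar), nrow = nVar)
diag(myCov) <- rep(myVar, nVar)
print(myCov)

# The same recipe with all variables intercorrelated at r = .5
myCov5 <- matrix(0.5, nrow = nVar, ncol = nVar)
diag(myCov5) <- myVar
print(myCov5)
```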
Running mvrnorm
Before starting, it’s a good idea to clear all variables: R does not do that
automatically, and it can be problematic if you still have values of variables from an
earlier session. To clear them all, click the little broom symbol on the Environment
tab.
Now run all the lines of the script up to and including line 51.
Check the Environment tab, which will show all the variables you have created.
Skip over the command on line 60 for the moment.
That line starts a loop and if you try to run it, the system will hang waiting for a close
curly bracket to match the open curly bracket. (You can get out of that by either
hitting escape or typing a close curly bracket on the console).
For now, just run line 69
mydata<-mvrnorm(n=myN, mu=rep(myM,nVar), Sigma=myCov)
As with the Excel simulation, the script generates a fresh set of numbers
on each run, though we can modify the settings to override this.
(Google ‘setting seed’ in R)
First six rows of mydata look like this:
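A minimal self-contained sketch of this step, which also shows how setting a seed makes the 'random' numbers repeatable. The values of myN, myM and nVar, and the seed, are assumptions; your numbers will differ unless you use the same seed.

```r
library(MASS)          # provides mvrnorm

myN   <- 30            # sample size (assumed value)
myM   <- 0             # mean of every variable (z-scores)
nVar  <- 7             # number of variables
myCov <- diag(nVar)    # identity matrix: variances of 1, correlations of 0

set.seed(13)           # arbitrary seed
mydata <- mvrnorm(n = myN, mu = rep(myM, nVar), Sigma = myCov)
print(dim(mydata))              # myN rows, nVar columns
print(head(round(mydata, 2)))   # first six rows

# Resetting the same seed reproduces exactly the same data
set.seed(13)
mydata_again <- mvrnorm(n = myN, mu = rep(myM, nVar), Sigma = myCov)
print(identical(mydata, mydata_again))
```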
Now we can analyse the simulated data!
Let’s look at correlations between the seven variables
Pick your favourite variables by selecting two numbers
between 1 and 7
Thought experiment: We’ve simulated uncorrelated variables.
In a single run, how likely is it that we’ll see:
• No significant correlations
• Some significant correlations
• A significant correlation (p < .05) between your favourite
variables
Correlation matrix for run 1
Output from simulation of 7 independent variables, where true correlation = 0
N = 30
Red denotes p < .05 ( r > .31 or < -.31);
Correlation matrix for run 2
Output from simulation of 7 independent variables, where true correlation = 0
N = 30
Red denotes p < .05 ( r > .31 or < -.31);
Correlation matrix for run 3
Output from simulation of 7 independent variables, where true correlation = 0
N = 30
Red denotes p < .05 ( r > .31 or < -.31);
There is no relation between variables – why do we have
significant values?
Correlation matrix for run 4
Output from simulation of 7 independent variables, where true correlation = 0
N = 30
Red denotes p < .05 ( r > .31 or < -.31);
On any one run, we are looking at 21 correlations.
So we should use Bonferroni corrected p-value: .05/21 = .002,
corresponds to r = .51
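The arithmetic behind these numbers can be sketched as below. The helper critical_r() is a hypothetical function written for this example; it converts a critical t value to a critical r using r = t / sqrt(t² + df). Treating the 21 tests as independent is an approximation.

```r
nVar  <- 7
N     <- 30
alpha <- .05

n_pairs <- choose(nVar, 2)           # number of distinct pairs of variables
print(n_pairs)

# Chance of at least one p < .05 among n_pairs tests when all nulls are true
p_any <- 1 - (1 - alpha)^n_pairs
print(round(p_any, 2))

alpha_bonf <- alpha / n_pairs        # Bonferroni-corrected threshold
print(round(alpha_bonf, 4))

# Hypothetical helper: smallest |r| that reaches a given alpha with sample size N
critical_r <- function(alpha, N, tails = 2) {
  df <- N - 2
  t_crit <- qt(1 - alpha / tails, df)
  t_crit / sqrt(t_crit^2 + df)
}

print(round(critical_r(alpha, N, tails = 1), 2))
print(round(critical_r(alpha, N, tails = 2), 2))
print(round(critical_r(alpha_bonf, N, tails = 1), 2))
print(round(critical_r(alpha_bonf, N, tails = 2), 2))
```

The cutoffs depend on whether the test is one- or two-tailed. The values quoted on the slides (.31 and .51) appear closer to the one-tailed results; that reading is an inference from this arithmetic, not something the slides state.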
Now try to work through the script yourself
• You can run the script to generate your own table of results (it is
set up just to show the table for the final run).
• The bit of the script for generating tables showing significant p-
values in colour is complex: don’t worry if you don’t understand
it.
• Most important thing is that you should develop competence to
play around with the script and see how the output changes
depending on how you change the sample size, the number of
variables, and the true correlation between variables.
• Use of .05 cutoff makes sense only in relation to an a-priori
hypothesis
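The thought experiment can also be answered by brute force. The sketch below is not the lesson script: sim_p_values() is a hypothetical helper, variables 1 and 2 stand in for 'your favourite variables', and the number of runs is an assumed value.

```r
library(MASS)    # provides mvrnorm
set.seed(15)     # arbitrary seed

# Hypothetical helper: one simulated dataset, returning p-values for all pairs
sim_p_values <- function(nVar = 7, N = 30, myCorr = 0) {
  myCov <- matrix(myCorr, nrow = nVar, ncol = nVar)
  diag(myCov) <- 1
  mydata <- mvrnorm(n = N, mu = rep(0, nVar), Sigma = myCov)
  pairs <- combn(nVar, 2)   # each column is one pair of variables
  apply(pairs, 2, function(idx) {
    cor.test(mydata[, idx[1]], mydata[, idx[2]])$p.value   # two-sided by default
  })
}

n_runs <- 1000
sims <- replicate(n_runs, sim_p_values())   # 21 rows (pairs) by n_runs columns

# Proportion of runs with at least one 'significant' correlation
prop_any <- mean(colSums(sims < .05) > 0)
# Proportion of runs where one pre-specified pair (variables 1 and 2) is significant
prop_favourite <- mean(sims[1, ] < .05)
# Proportion of runs with any correlation passing the Bonferroni threshold
prop_any_bonf <- mean(colSums(sims < .05 / nrow(sims)) > 0)

print(c(any = prop_any, favourite = prop_favourite, any_bonferroni = prop_any_bonf))
```

The pre-specified pair should be significant on about 1 run in 20, whereas 'something significant somewhere' turns up on most runs. That is why the .05 cutoff makes sense only for an a-priori hypothesis.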
Many ways in which ‘hidden multiplicity’ of testing can give false
positive (p < .05) results
• Data dredging from a large set of variables
• Multi-way Anova with many main effects/interactions
• Cramer, A. O. J., et al. (2016). Hidden multiplicity in exploratory multiway ANOVA:
Prevalence and remedies. Psychonomic Bulletin & Review, 23(2), 640-647.
doi:10.3758/s13423-015-0913-5
• Trying various analytic approaches until one ‘works’
• Post-hoc division of data into subgroups
In latter 2 instances, may be hard to estimate appropriate
correction – many binary choices -> multiplicative effects
Key point: p-values can only be interpreted in terms of the context
in which they are computed
1 contrast
Probability of a
‘significant’ p-value
< .05 = .05
Large population
database used to explore
link between ADHD and
handedness
https://figshare.com/articles/The_Garden_of_Forking_Paths/2100379
Demonstration of rapid expansion of comparisons with binary divisions
Focus just on Young
subgroup:
2 contrasts at this level
Probability of a
‘significant’ p-value < .05
= .10
Large population
database used to explore
link between ADHD and
handedness
Focus just on Young on
measure of hand skill:
4 contrasts at this level
Probability of a
‘significant’ p-value < .05
= .19
Large population
database used to explore
link between ADHD and
handedness
Focus just on Young,
Females on
measure of hand skill:
8 contrasts at this level
Probability of a
‘significant’ p-value < .05
= .34
Large population
database used to explore
link between ADHD and
handedness
Focus just on Young,
Urban, Females on
measure of hand skill:
16 contrasts at this level
Probability of a
‘significant’ p-value < .05
= .56
Large population
database used to explore
link between ADHD and
handedness
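The probabilities in this demonstration follow from one formula: with k independent contrasts and no true effects, the chance of at least one p < .05 is 1 − (1 − .05)^k. A short sketch (variable names are illustrative, and independence between contrasts is assumed):

```r
alpha <- .05
n_contrasts <- c(1, 2, 4, 8, 16)   # doubles with each extra binary division

# Probability that at least one contrast gives p < alpha when all nulls are true
p_any <- 1 - (1 - alpha)^n_contrasts

print(data.frame(contrasts = n_contrasts, p_at_least_one = round(p_any, 3)))
```

The results should be close to the .05, .10, .19, .34 and .56 shown on the slides.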
1956
De Groot
Failure to distinguish between
hypothesis-testing and
hypothesis-generating
(exploratory) research
-> misuse of statistical tests
de Groot, A. D. (2014). The meaning of “significance” for
different types of research [translated and annotated by Eric-
Jan Wagenmakers, et al]. Acta Psychologica, 148, 188-194.
doi:http://dx.doi.org/10.1016/j.actpsy.2014.02.001
Further reading
R scripts available on: https://osf.io/view/reproducibility2017/
• Simulation_ex1_intro.R
Suitable for R newbies. Demonstrates ‘dance of the p-values’ in a t-test.
Bonus, you learn to make pirate plots
• Simulation_ex2_correlations
Generate correlation matrices from multivariate normal distribution.
Bonus, you learn to use ‘grid’ to make nicely formatted tabular outputs.
• Simulation_ex3_multiwayAnova.R
Simulate data for a 3-way mixed ANOVA. Demonstrates need to correct
for N factors and interactions when doing exploratory multiway Anova.
• Simulation_ex4_multipleReg.R
Simulate data for multiple regression.
• Simulation_ex5_falsediscovery.R
Simulate data for mixture of null and true effects, to demonstrate that
the probability of the data given the hypothesis is different from the
probability of the hypothesis given the data.
Two simulations from Daniel Lakens’ Coursera Course – with notes!
• 1.1 WhichPvaluesCanYouExpect.R
• 3.2 OptionalStoppingSim.R
Now even
more: See
OSF!

Improve your study with pre-registration
 
Southampton: lecture on TEF
Southampton: lecture on TEFSouthampton: lecture on TEF
Southampton: lecture on TEF
 
Reading list: What’s wrong with our universities
Reading list: What’s wrong with our universitiesReading list: What’s wrong with our universities
Reading list: What’s wrong with our universities
 
IJLCD Winter Lecture 2016-7 : References
IJLCD Winter Lecture 2016-7 : ReferencesIJLCD Winter Lecture 2016-7 : References
IJLCD Winter Lecture 2016-7 : References
 
What's wrong with our Universities, and will the Teaching Excellence Framewor...
What's wrong with our Universities, and will the Teaching Excellence Framewor...What's wrong with our Universities, and will the Teaching Excellence Framewor...
What's wrong with our Universities, and will the Teaching Excellence Framewor...
 
Bishop reproducibility references nov2016
Bishop reproducibility references nov2016Bishop reproducibility references nov2016
Bishop reproducibility references nov2016
 
On the importance of WIPS not being wimps
On the importance of WIPS not being wimpsOn the importance of WIPS not being wimps
On the importance of WIPS not being wimps
 

Dernier

Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
anilsa9823
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 

Dernier (20)

VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 

Simulating data to gain insights into power and p-hacking

  • 1. Simulating data to gain insights into power and p-hacking Dorothy V. M. Bishop Professor of Developmental Neuropsychology University of Oxford @deevybee
  • 2. Before you get started…. • The early exercises in this lesson use Microsoft Excel, which most people will have installed • The later exercises use R and R studio. This is free software. If you don’t have it, you’ll need to download it. As this can take time, it’s recommended that you do that before you go further. • Please follow instructions on the next slide.
  • 3. Installing R • Open an internet browser and go to www.r-project.org. • Click the "download R" link in the middle of the page under "Getting Started." • Click on the link for a CRAN location close to you • Mac users: • Click on the "Download R for (Mac) OS X" link at the top of the page. • Click on the file containing the latest version of R under "Files." • Save the .pkg file, double-click it to open, and follow the installation instructions. • Windows users: • Click on the "Download R for Windows" link at the top of the page. • Click on the "install R for the first time" link at the top of the page. • Click "Download R for Windows" and save the executable file somewhere on your computer. • Run the .exe file and follow the installation instructions.
  • 4. Installing R studio • R studio is a friendly interface for R. Once it is installed, you need not open the original R software: instead, you access R by opening the R studio application • Go to www.rstudio.com and click on the "Download RStudio" button. • Click on "Download RStudio Desktop." Mac users: • Click on the version recommended for your system, or the latest Mac version, save the .dmg file on your computer, double-click it to open, and then drag and drop it to your applications folder. Windows users: • Click on the version recommended for your system, or the latest Windows version, and save the executable file. Run the .exe file and follow the installation instructions.
  • 5. Why invent data? • If you can anticipate what your data will look like, you will also anticipate a lot of issues about study design that you might not have thought of • Analysing a simulated dataset can clarify what is optimal analysis/ how the analysis works • Simulating data with an anticipated effect is very useful for power analysis – deciding what sample size to use • Simulating data with no effect (i.e. random noise) gives unique insights into p-hacking
  • 6. Ways to simulate data • For newbies: to get the general idea: Excel • Far better but involves steeper learning curve: R • Also (but not covered here) options in SPSS and Matlab: • e.g. https://www.youtube.com/watch?v=XBmvYORP5EU • http://uk.mathworks.com/help/matlab/random-number-generation.html
  • 7. Basic idea • Anything you measure can be seen as a combination of an effect of interest plus random noise • The goal of research is to find out • (a) whether there is an effect of interest • (b) if yes, how big it is • Classic hypothesis-testing with p-values focuses just on (a) – i.e. have we just got noise or a real effect? • We can simulate most scenarios by generating random noise, with or without a consistent added effect
  • 8. Basic idea: generate a set of random numbers in Excel • Open a new workbook • In cell A1 type random number • In cell A2 type = rand() Grab the little square in the bottom right of A2 and pull it down to autofill the cells below to A8
  • 9. Random numbers in Excel, ctd • You have just simulated some data! • Are your numbers the same as mine? • What happens when you type rand() in A9?
  • 10. Random numbers in Excel, ctd. • Your numbers will be different to mine – that’s because they are random. • The numbers will change whenever you open the worksheet, or make any change to it. • Sometimes that’s fine, but for this demo we want to keep the same numbers. To control when random numbers update, select Manual in Formula|Calculation Options. • To update to new numbers use Calculate Now button.
  • 11. Random numbers in Excel, ctd. • The rand() function generates random numbers between 0 and 1: Are these the kind of numbers we want?
  • 12. Realistic data usually involves normally distributed numbers • Nifty way to do this in Excel: treat generated numbers as p-values • The normsinv() function turns a p-value into a z-score
  • 13. Normally distributed random numbers Try this: • Type = normsinv(A2) in cell B2 • Drag formula down to cell B8 • Now look at how the numbers in column A relate to those in column B. NB. In practice, we can generate normally distributed random numbers (i.e. z-scores) in just one step with formula: = normsinv(rand())
  • 14. Now we are ready to simulate a study where we have 2 groups to be compared on a t-test • Pull down the formula from columns A and B to extend to A11:B11 • Type a header ‘group’ in C1 • Type 1 in C2:C6 and 2 in C7:C11
  • 15. What is the formula for a t-test in Excel? Basic rule for life, especially in programming: if you don’t know it, Google it. TTEST formula in xls: You specify: • Range 1 • Range 2 • tails (1 or 2) • type (1 = paired; 2 = unpaired, equal variance; 3 = unpaired, unequal variance)
  • 16. Try entering the formula for the t-test in C12 =TTEST(B2:B6, B7:B11,2,2) What is the number that you get? This formula gives you a p-value Now press ‘calculate now’ 20 times, and keep a tally of how many p-values are < .05 in 20 simulations
  • 17. • What has this shown you? • P-values ‘dance about’ even when data are entirely random • On average, one in 20 runs will give p < .05 when null hypothesis is true – no difference between groups See Geoff Cumming: Dance of the p-values https://www.youtube.com/watch?v=5OL1RqHrZQ8 Congratulations! You have done your first simulation
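The tally of 20 manual recalculations can also be automated. A minimal sketch in R (the language used later in this lesson), assuming 5 scores per group as in the Excel sheet; the names n_sims, n_per_group and p_values are illustrative and do not come from the author's scripts:

```r
set.seed(1)                      # fix the random numbers so the run is repeatable
n_sims <- 10000                  # number of simulated 'studies'
n_per_group <- 5                 # same group size as the Excel exercise

# Each run draws two groups from the same population (null hypothesis true)
# and keeps the p-value from an equal-variance t-test, like TTEST(..., 2, 2)
p_values <- replicate(n_sims, {
  group1 <- rnorm(n_per_group)
  group2 <- rnorm(n_per_group)
  t.test(group1, group2, var.equal = TRUE)$p.value
})

# Proportion of runs that are 'significant' although there is no true effect
false_positive_rate <- mean(p_values < .05)
print(false_positive_rate)       # expected to be close to .05
```

With many more runs than 20, the proportion of p-values below .05 settles near one in 20.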
  • 18. We’ll stick with Excel for one more simulation • So far, we’ve simulated the null hypothesis - random data. If we find a ‘significant’ difference, we know it’s a false positive • Next, we’ll simulate data with a genuine effect. • It’s easy to do this: we just add a constant to all the values for group 2 • Since we’re using z-scores, the constant will correspond to the effect size (expressed as Cohen’s d). • Let’s try an effect size of .5 • For cell B7, change the formula to = normsinv(A7)+.5 • Drag the formula down to cell B11 and hit ‘Calculate now’
  • 19. I’ve added formulae to show the mean and SD for the two groups: = AVERAGE(B2:B6) = STDEV(B2:B6) = AVERAGE(B7:B11) = STDEV(B7:B11) Your values will differ. Why isn’t the difference in means for the two groups exactly .5?
  • 20. I’ve added formulae to show the mean and SD for the two groups: = AVERAGE(B2:B6) = STDEV(B2:B6) = AVERAGE(B7:B11) = STDEV(B7:B11) Your values will differ. Why isn’t the difference in means for the two groups exactly .5? ANSWER: mean/SD describe the population; this is just a sample from that population
  • 21. Now add the formula for the t-test Is p < .05 ? It’s pretty unlikely you will see a significant result. Why?
  • 22. Now add the formula for the t-test Is p < .05 ? It’s pretty unlikely you will see a significant result. Why? ANSWER: Sample too small – can’t pick out signal from noise
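To see just how unlikely a significant result is here, the Excel exercise can be repeated many times in R. This is a rough sketch, assuming 5 scores per group and a true effect of .5 as above; the variable names are illustrative only:

```r
set.seed(10)
n_sims <- 10000
n_per_group <- 5       # group size used in the Excel sheet
effect_size <- 0.5     # constant added to group 2 (Cohen's d for z-scores)

p_values <- replicate(n_sims, {
  group1 <- rnorm(n_per_group, mean = 0)
  group2 <- rnorm(n_per_group, mean = effect_size)
  t.test(group1, group2, var.equal = TRUE)$p.value
})

# Proportion of runs that detect the true effect at p < .05
estimated_power <- mean(p_values < .05)
print(estimated_power)
```

The proportion printed is an estimate of statistical power for this design: most runs miss an effect that is really there.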
  • 23. What have we learned so far? • The first simulation gave some insights into false positive rates: it shows how you can get a ‘significant’ result from random data • The second simulation illustrates the opposite situation: showing how often you can fail to get a significant p-value, even when there is a true effect (false negative) • This brings us on to the topic of statistical power: the probability of detecting a real effect with a given sample size • To build on these insights we need to do lots of simulations, and for that it’s best to move to R (which hopefully you have already installed: if not, see slides 2-4)
  • 24. Benefits of simulating data in R • Can write a script that executes commands to generate data and then run it automatically many times and store results • Much faster than Excel, and reproducible • Can generate different distributions, correlated variables, etc. • Powerful plotting functions • A good way of starting to learn R Downside: Steep initial learning curve But remember: Google is your friend Tons of material about R on the internet Ready? Create a folder to save your work and fire up R studio!
  • 25. Self-teaching scripts on https://osf.io/skz3j/ Download, save and open this one: Simulation_ex1_intro.R Source pane: script Console: try commands out here Environment: check variables here
  • 26. First thing to do: Set working directory • Working directory is where R will default to when reading and writing stuff • Easiest way to set it: Go to Session|Set working directory Note that when you do this, the command to set working directory will pop up on the console. On my computer I see: setwd("~/deevybee_repo")
  • 27. Now we’ll go through the script: it will generate same type of 2-group data as we’ve done in the 2nd exercise in Excel Preliminaries: Install packages. Use Tools|Install Packages • Remember! A common reason for R code not to work is because you have not installed a package that you need. • After installing the package you have to use the library or require command in your script to load it for this session.
  • 28. To run the code in lines 41-49… • Select the lines of code • Click on the Run button in the top bar • Check what happens in the console Running a script line by line is a good way to learn R
  • 29. Now start simulating data! • rnorm is an inbuilt R function that generates random normal deviates Now run lines 56-68
  • 30. Now start simulating data! • rnorm is an inbuilt R function that generates random normal deviates • Note that as well as results you specify being shown on the console, any variables you create are now featured in the environment pane Now run lines 56-68
  • 31. Think about questions on lines 72-74 • If you’re confused, remember what you’ve been taught in basic statistics (I hope!) about the differences between a population and a sample. • The mean/SD we specify determines characteristics of the population from which we are sampling. See also: http://deevybee.blogspot.com/2017/12/using-simulations-to-understand.html
  • 32. Now we’ll run lines 79-91 to generate data for another group with different mean • If our scores are z-scores and the mean for group 1 is zero, then myM2 corresponds to Cohen’s d measure of effect size. • The final command creates interesting output on the console: results of a Welch 2-sample t-test (i.e. t-test with correction for unequal variances)
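For readers without the script to hand, the steps on the last few slides can be sketched as follows. The names myvectorA, myvectorB and myM2 appear in this lesson; myN, myM1, mySD and the specific values are assumptions made for illustration and may differ from the actual script:

```r
set.seed(2)
myN  <- 20     # assumed sample size per group
myM1 <- 0      # population mean for group 1 (z-scores)
myM2 <- 0.5    # population mean for group 2; equals Cohen's d when SD is 1
mySD <- 1      # population SD for both groups

# rnorm draws random normal deviates from the specified population
myvectorA <- rnorm(n = myN, mean = myM1, sd = mySD)
myvectorB <- rnorm(n = myN, mean = myM2, sd = mySD)

# Sample statistics will not exactly match the population values
print(c(mean(myvectorA), sd(myvectorA)))
print(c(mean(myvectorB), sd(myvectorB)))

# t.test defaults to var.equal = FALSE, i.e. the Welch two-sample test
myt <- t.test(myvectorA, myvectorB)
print(myt)
```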
  • 33. Advantages of R over Excel • Can easily regenerate the data from the script • Very easy to change one parameter and generate a new dataset • We will see shortly how to repeatedly run a simulation and store results by using a loop • But first we’ll do some data reformatting and show a neat way of plotting the results
  • 34. Making a data frame • A data frame is a way of storing the data that is rather like an Excel worksheet • You can store observations in rows and variables in columns • Data frames are versatile and can hold different variable types • We’ll put our newly created vectors into a data frame, mydf, with columns for group and score • We can easily view mydf by clicking on mydf in the Environment tab
  • 35. Filling the data frame You can refer to a specific cell in a data frame with the row and column index, e.g. mydf[3, 2] refers to 3rd row and 2nd column. Note square brackets here. You can refer to a whole column by using $ and its name, e.g. mydf$Group. You can also refer to a specific row of a named column, e.g. mydf$Group[3] Run lines 117-125
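A small sketch of building and indexing such a data frame. The column names Group and Score follow the pirateplot call used in this lesson; the group size of 20 and the way the vectors are generated are assumptions, and the script may fill the data frame differently:

```r
set.seed(3)
myvectorA <- rnorm(20, mean = 0, sd = 1)
myvectorB <- rnorm(20, mean = 0.5, sd = 1)

# One row per observation, one column per variable
mydf <- data.frame(
  Group = rep(c(1, 2), each = 20),   # 20 ones followed by 20 twos
  Score = c(myvectorA, myvectorB)    # group 1 scores, then group 2 scores
)

mydf[3, 2]        # row 3, column 2: the third score in group 1
mydf$Group[3]     # third entry of the Group column
head(mydf)        # first six rows
```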
  • 36. Deconstructing the t-test result • One reason for making a data frame is that there are many functions in R that operate on data frames. • One of these is the pirateplot function from the yarrr package. This creates a nice kind of plot called a pirate plot, which shows the distribution of individual data points as well as other summary statistics. We want to make a pirate plot with a header that shows the t-test result • Run line 131: myt <- t.test(myvectorA,myvectorB) The comments explain this more, but basically you can extract bits of the output in myt using $. If you type in the console: myt$ A menu pops up showing you which parameters there are. Now run lines 145-149, which show how you can bolt together bits of output from the t-test to make a useful header for a plot
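As an illustration of bolting together pieces of the t-test output, here is one possible way to build a header string. The exact wording of the header in the script is not shown in these slides, so the format below is an assumption:

```r
set.seed(4)
myvectorA <- rnorm(20, mean = 0, sd = 1)
myvectorB <- rnorm(20, mean = 0.5, sd = 1)
myt <- t.test(myvectorA, myvectorB)

names(myt)    # the components you can extract with $

# statistic = t value, parameter = degrees of freedom, p.value = p-value
myheader <- paste0("t = ", round(myt$statistic, 2),
                   "; df = ", round(myt$parameter, 1),
                   "; p = ", round(myt$p.value, 3))
print(myheader)
```

The resulting string can then be passed as the main argument of a plotting function.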
  • 37. Make a pirate plot • Run line 151: • pirateplot(Score~Group,data=mydf,main =myheader, xlab="Group", ylab="Score") Your plot will be different from this because we are generating random numbers that vary on each run. The pirate plot is not a well-known type of graphic; this is a perfect opportunity to practice Googling to learn more about it – you should try varying the script to see how you can affect the graph
  • 38. Some general points to help you learn R 1. Basic rule for life, especially in programming: if you don’t know it, Google it In R, Google your error message 2. Best way to learn is by making mistakes If you see a line of code you don’t understand, play with it to find out what it does. Look at Environment tab, or type name of variable on the console to check its value E.g., you want repeating numbers? Type in the console to compare: rep (1,3) and rep (3,1)
  • 39. Pause to play with the script. Make a note of any questions
  • 40. Simulation_ex1_multioutput.R This is essentially the same as the previous script, except that: • The plots are sent to a pdf rather than being output on the Plots pane (see comments in the script for explanation) • You run the simulation repeatedly, with two different values for N The script is structured as 2 nested loops: for (i in 1:2){ #line 15 ……… #various commands here for (j in 1:10){ #line 21 ……… #various commands here } } • The first loop runs twice; the second loop, which is nested inside it, runs 10 times. So overall there are 20 runs • The value, i, in the first loop controls sample size, which is either myNs[1] or myNs[2] • The value, j, in the second loop just acts as a counter, to ensure that there are 10 repetitions
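A stripped-down sketch of the nested-loop structure, storing p-values in a matrix instead of writing plots to pdf. Only myNs is named in the slides; myES, nruns and myp are hypothetical names, and the values (N of 20 and 100, effect size .3) are taken from the output slides that follow:

```r
set.seed(5)
myNs  <- c(20, 100)   # two sample sizes per group
myES  <- 0.3          # effect size (d)
nruns <- 10           # repetitions at each sample size

# One row per sample size, one column per run
myp <- matrix(NA, nrow = length(myNs), ncol = nruns)

for (i in 1:2) {              # outer loop: controls sample size
  myN <- myNs[i]
  for (j in 1:nruns) {        # inner loop: counter for repetitions
    myvectorA <- rnorm(myN, mean = 0, sd = 1)
    myvectorB <- rnorm(myN, mean = myES, sd = 1)
    myp[i, j] <- t.test(myvectorA, myvectorB)$p.value
  }
}

# Proportion of runs with p < .05 at each sample size
print(rowMeans(myp < .05))
```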
  • 41. Run the whole script! Click on the Files tab in the bottom right-hand pane, and you’ll see you have created two new pdf files (you may need to scroll down to see them): Look at these files, paying particular attention to the proportion of runs where p < .05.
  • 42. 10 runs of simulation with N = 20 per group and effect size (d) = .3 [pirate plots; asterisks mark runs where p < .05]
  • 43. 10 runs of simulation with N = 100 per group and effect size (d) = .3 [pirate plots; asterisks mark runs where p < .05]
  • 44. Points to note • Smaller samples are associated with more variable results. • With small sample sizes, true but weak effects will usually not give you a ‘significant’ result (i.e. p < .05). • In the example here, with effect size of .3, a sample of 100 per group only gives a significant result on around 60% of runs. • This is the same as saying the power of the study to detect an effect size of .3 is around 60% • Many statisticians recommend power should be 80% or more (though this will depend on the purpose of the study).
  • 45. Body of table shows sample size per group. Jacob Cohen worked this all out in 1988
  • 46. Estimating statistical power for your study For simple designs you can use the G*Power package (or Cohen’s formulae). For more complex designs, simulation is a better approach: just run the analysis on simulated data 10,000 times and then see how frequently your result is ‘significant’ by whatever criterion you plan to use. This requires you to have a sense of what your data will look like, and you have to have an estimate of the smallest effect size that you’d be interested in.
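The simulation approach to power can be sketched in a few lines. This example assumes the simple two-group design used earlier (effect size .3, N = 100 per group) so that the simulated estimate can be checked against base R's power.t.test function; the variable names are illustrative:

```r
set.seed(6)
n_sims <- 10000
myN  <- 100     # sample size per group
myES <- 0.3     # smallest effect size of interest (Cohen's d)

p_values <- replicate(n_sims, {
  groupA <- rnorm(myN, mean = 0, sd = 1)
  groupB <- rnorm(myN, mean = myES, sd = 1)
  t.test(groupA, groupB)$p.value
})

# Power = proportion of simulated studies that reach the chosen criterion
simulated_power <- mean(p_values < .05)

# Analytic power for the same two-sample design, for comparison
analytic_power <- power.t.test(n = myN, delta = myES, sd = 1,
                               sig.level = .05)$power

print(c(simulated = simulated_power, analytic = analytic_power))
```

For a design this simple the two figures should agree closely; the value of simulation is that the same recipe still works when no formula is available.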
  • 47. “Small studies continue to be carried out with little more than a blind hope of showing the desired effect. Nevertheless, papers based on such work are submitted for publication, especially if the results turn out to be statistically significant.” (Newcombe, 1987) Weak statistical power has been, and continues to be, a major cause of problems with replication of findings
  • 48. Part 2: Simulating null results to illustrate p-hacking
  • 49. P-hacking and type 1 error (false positives) Load simulation_ex2_correlations.R Often studies have multiple variables of interest. This script shows you how to use the mvrnorm function from the MASS package to simulate multivariate normal data It also demonstrates the dangers of p-hacking First just ensure the necessary packages are installed and load them using library(): run lines 11-14
  • 50. Introduction to mvrnorm In R, if you want to know how to use a function, you can just type help, e.g. help(mvrnorm) But often the official help information is technical and unfriendly and you may find more useful and accessible information and examples by Googling. The essential arguments for mvrnorm are the sample size (n), mu, which is a vector of means (one per variable), and Sigma, a matrix specifying the covariances between variables. We’ll ignore the other arguments provided on the Help page for this demo. To make life easy, we will again create z-scores for our data, so mean will be zero and SD = 1. We can set nVar to 7 and then specify mu = rep(0, nVar). You could just have mu = rep (0,7) or even mu = c(0,0,0,0,0,0,0) But a good script avoids hardcoding values like this: you want to be able to try running the script with a range of values, and it’s much easier just changing the initial definition of nVar than retyping all the lines of code that use nVar.
  • 51. Specifying covariance between variables You should be familiar with the correlation coefficient, r If we are using z-scores and have r = .5, what is the covariance?
  • 52. Creating the covariance matrix • One benefit of using z-scores is that the covariance matrix is the same as the correlation matrix, so if we specify the amount of correlation between variables, then we can easily make the covariance matrix that we need. • For simplicity, we’ll just assume that all our simulated variables are intercorrelated by the same amount, a value we’ll call myCorr. • So, if we have 7 variables, and myCorr is 0, we will need a 7 x 7 matrix with 1s on the diagonal and 0s everywhere else • The script achieves this just by making a matrix where all values = myCorr, and then overwriting the diagonal with myVar (which we’ve set to 1) • N.B. The script is set to simulate uncorrelated variables, so off-diagonal values are 0, but you could experiment with other values, by changing myCorr
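The two-step recipe just described (fill with myCorr, then overwrite the diagonal) might look like this. The names nVar, myCorr, myVar and myCov appear in this lesson; the exact commands in the script may differ:

```r
nVar   <- 7    # number of simulated variables
myVar  <- 1    # variance of each variable (1 for z-scores)
myCorr <- 0    # correlation between every pair of variables

# Step 1: a matrix where every value equals myCorr
myCov <- matrix(rep(myCorr, nVar * nVar), nrow = nVar)

# Step 2: overwrite the diagonal with the variance
diag(myCov) <- rep(myVar, nVar)

print(myCov)
```

Changing myCorr to, say, .5 gives a matrix with .5 in every off-diagonal cell.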
  • 53. Running mvrnorm Before starting, it’s a good idea to clear all variables: R does not do that automatically, and it can be problematic if you still have values of variables from an earlier session. To clear them all, click the little broom symbol on the Environment tab. Now run all the lines of the script up to and including line 51. Check the Environment tab, which will show all the variables you have created. Skip over the command on line 60 for the moment. That line starts a loop and if you try to run it, the system will hang waiting for a close curly bracket to match the open curly bracket. (You can get out of that by either hitting escape or typing a close curly bracket on the console). For now, just run line 69 mydata<-mvrnorm(n=myN, mu=rep(myM,nVar), Sigma=myCov)
  • 54. As with the Excel simulation, the script generates a fresh set of numbers on each run, though we can modify the settings to override this. (Google ‘setting seed’ in R) First six rows of mydata look like this:
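A brief sketch of how setting a seed makes the mvrnorm output repeatable. The sample size of 30 matches the output slides that follow; the seed value and the name mydata2 are arbitrary choices for illustration:

```r
library(MASS)   # provides mvrnorm

myN  <- 30      # sample size
myM  <- 0       # mean of each variable
nVar <- 7       # number of variables
myCov <- diag(nVar)   # identity matrix: uncorrelated z-scores

set.seed(6)     # any integer will do; same seed gives same numbers
mydata <- mvrnorm(n = myN, mu = rep(myM, nVar), Sigma = myCov)
head(mydata)    # first six rows

set.seed(6)     # reset to the same seed and generate again
mydata2 <- mvrnorm(n = myN, mu = rep(myM, nVar), Sigma = myCov)
identical(mydata, mydata2)   # the two datasets match exactly
```

Without the set.seed calls, each run would produce a different dataset.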
  • 55. Now we can analyse the simulated data! Let’s look at correlations between the seven variables Pick your favourite variables by selecting two numbers between 1 and 7 Thought experiment: We’ve simulated uncorrelated variables. In a single run, how likely is it that we’ll see: • No significant correlations • Some significant correlations • A significant correlation (p < .05) between your favourite variables
  • 56. Correlation matrix for run 1 Output from simulation of 7 independent variables, where true correlation = 0; N = 30. Red denotes p < .05 (r > .31 or r < -.31)
  • 57. Correlation matrix for run 2 Output from simulation of 7 independent variables, where true correlation = 0; N = 30. Red denotes p < .05 (r > .31 or r < -.31)
  • 58. Correlation matrix for run 3 Output from simulation of 7 independent variables, where true correlation = 0; N = 30. Red denotes p < .05 (r > .31 or r < -.31). There is no relation between variables – why do we have significant values?
  • 59. Correlation matrix for run 4 Output from simulation of 7 independent variables, where true correlation = 0; N = 30. Red denotes p < .05 (r > .31 or r < -.31). On any one run, we are looking at 21 correlations. So we should use Bonferroni corrected p-value: .05/21 = .002, corresponds to r = .51
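The counting can be done without the colour-coded table. This is a simplified sketch, not the author's script: it uses cor.test from base R on each of the 21 pairs, and the names mypairs, npairs and bonf are hypothetical. Note that cor.test is two-sided by default, so its cutoffs may not match the r values quoted on the slides:

```r
library(MASS)
set.seed(7)
nVar <- 7
myN  <- 30
mydata <- mvrnorm(n = myN, mu = rep(0, nVar), Sigma = diag(nVar))

# All distinct pairs of variables: a 2 x 21 matrix of column indices
mypairs <- combn(nVar, 2)
npairs  <- ncol(mypairs)

# p-value for the correlation between each pair of columns
myp <- apply(mypairs, 2, function(ix) {
  cor.test(mydata[, ix[1]], mydata[, ix[2]])$p.value
})

sum(myp < .05)        # 'significant' correlations with no correction

bonf <- .05 / npairs  # Bonferroni-corrected threshold
sum(myp < bonf)       # correlations surviving the correction
```

Rerunning with different seeds shows how often uncorrected testing throws up at least one spurious correlation.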
  • 60. Now try to work through the script yourself • You can run the script to generate your own table of results (it is set up just to show the table for the final run). • The bit of the script for generating tables showing significant p-values in colour is complex: don’t worry if you don’t understand it. • Most important thing is that you should develop competence to play around with the script and see how the output changes depending on how you change the sample size, the number of variables, and the true correlation between variables.
  • 61. • Use of .05 cutoff makes sense only in relation to an a-priori hypothesis Many ways in which ‘hidden multiplicity’ of testing can give false positive (p < .05) results • Data dredging from a large set of variables • Multi-way Anova with many main effects/interactions • Cramer, A. O. J., et al (2016). Hidden multiplicity in exploratory multiway ANOVA: Prevalence and remedies. Psychonomic Bulletin & Review, 23(2), 640-647. doi:10.3758/s13423-015-0913-5 • Trying various analytic approaches until one ‘works’ • Post-hoc division of data into subgroups In the latter 2 instances, it may be hard to estimate the appropriate correction: many binary choices -> multiplicative effects Key point: p-values can only be interpreted in terms of the context in which they are computed
  • 62. 1 contrast Probability of a ‘significant’ p-value < .05 = .05 Large population database used to explore link between ADHD and handedness https://figshare.com/articles/The_Garden_of_Forking_Paths/2100379 Demonstration of rapid expansion of comparisons with binary divisions
  • 63. Focus just on Young subgroup: 2 contrasts at this level. Probability of a ‘significant’ p-value < .05 = .10.
  • 64. Focus just on Young on measure of hand skill: 4 contrasts at this level. Probability of a ‘significant’ p-value < .05 = .19.
  • 65. Focus just on Young, Females on measure of hand skill: 8 contrasts at this level. Probability of a ‘significant’ p-value < .05 = .34.
  • 66. Focus just on Young, Urban, Females on measure of hand skill: 16 contrasts at this level. Probability of a ‘significant’ p-value < .05 = .56.
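The probabilities on slides 62 to 66 are what you get if each contrast is treated as an independent test at alpha = .05: the chance of at least one ‘significant’ result among k contrasts is 1 - (1 - alpha)^k. A minimal sketch in R, assuming independence between contrasts (real subgroup contrasts may be correlated, which would change the exact figures):

```r
# Family-wise probability of at least one p < alpha among k independent
# contrasts when there is no true effect: 1 - (1 - alpha)^k
alpha <- 0.05
k     <- c(1, 2, 4, 8, 16)   # contrasts double with each binary division
fwer  <- 1 - (1 - alpha)^k

print(data.frame(contrasts = k, p_any_significant = round(fwer, 2)))
# p_any_significant: 0.05 0.10 0.19 0.34 0.56
```

Each extra binary choice doubles the number of contrasts, which is why four seemingly harmless decisions take the false positive rate from .05 to over .5.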
  • 67. Further reading. De Groot (1956): failure to distinguish between hypothesis-testing and hypothesis-generating (exploratory) research leads to misuse of statistical tests. de Groot, A. D. (2014). The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers et al.]. Acta Psychologica, 148, 188-194. doi:10.1016/j.actpsy.2014.02.001
  • 68. R scripts available at: https://osf.io/view/reproducibility2017/
    • Simulation_ex1_intro.R: Suitable for R newbies. Demonstrates ‘dance of the p-values’ in a t-test. Bonus: you learn to make pirate plots.
    • Simulation_ex2_correlations: Generate correlation matrices from a multivariate normal distribution. Bonus: you learn to use ‘grid’ to make nicely formatted tabular outputs.
    • Simulation_ex3_multiwayAnova.R: Simulate data for a 3-way mixed ANOVA. Demonstrates the need to correct for N factors and interactions when doing exploratory multiway ANOVA.
    • Simulation_ex4_multipleReg.R: Simulate data for multiple regression.
    • Simulation_ex5_falsediscovery.R: Simulate data for a mixture of null and true effects, to demonstrate that the probability of the data given the hypothesis is different from the probability of the hypothesis given the data.
    Two simulations from Daniel Lakens’ Coursera course, with notes:
    • 1.1 WhichPvaluesCanYouExpect.R
    • 3.2 OptionalStoppingSim.R
    Now even more: see OSF!