Simulating data to gain insights into
experimental design and statistics
Dorothy V. M. Bishop
Professor of Developmental Neuropsychology
University of Oxford
@deevybee
Before we get started….
• I will show you some exercises for simulating data. It’s
fine if you just want to listen and learn. There are
materials online that you can work through later.
• If you would like to work along with the exercises, that
is also fine, but I won’t be able to answer many
questions. The early exercises in this lesson use
Microsoft Excel, which most people will have installed.
• The later exercises use R and RStudio. If you are
familiar with these and have them installed, feel free to
work along. You will need the packages yarrr, MASS (for
its mvrnorm function) and Hmisc.
How most people do experiments
Have a bright idea → Collect data → Think about how to analyse data → Hit problems: take advice from statistician
A better way to do experiments
Have a bright idea → Simulate data → Think about how to analyse simulated data → If problems: take advice from statistician → Collect real data
Why invent data?
• If you can anticipate what your data will look like, you
will also anticipate a lot of issues about study design
that you might not have thought of
• Analysing a simulated dataset can clarify what the
optimal analysis is and how the analysis works
• Simulating data with an anticipated effect is very
useful for power analysis – deciding what sample size
to use
• Simulating data with no effect (i.e. random noise) gives
unique insights into how easy it is to get a false positive
result through p-hacking
Ways to simulate data
• For newbies, to get the general idea: Excel
• Far better, but with a steeper learning curve: R
• Also (but not covered here) options in SPSS and Matlab, e.g.:
• https://www.youtube.com/watch?v=XBmvYORP5EU
• http://uk.mathworks.com/help/matlab/random-number-generation.html
Basic idea
• Anything you measure can be seen as a
combination of an effect of interest plus random
noise
• The goal of research is to find out
• (a) whether there is an effect of interest
• (b) if yes, how big it is
• Classic hypothesis-testing with p-values focuses just
on (a) – i.e. have we just got noise or
a real effect?
• We can simulate most scenarios by generating
random noise, with or without a consistent added
effect
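To make that concrete, here is the idea in miniature as a small R sketch (R is introduced properly later; the numbers here are purely illustrative):

effect <- 0.5                                  # the consistent added effect (0 under the null)
scores <- effect + rnorm(10, mean = 0, sd = 1) # 10 observed scores = effect + random noise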
Basic idea: generate a set of random numbers in Excel
• Open a new workbook
• In cell A1 type a header, 'random number'
• In cell A2 type = rand()
Grab the little
square in the
bottom right of A2
and pull it down to
autofill the cells
below to A8
Random numbers in Excel, ctd
• You have just simulated
some data!
• Are your numbers the
same as mine?
• What happens when
you type = rand() in
A9?
Random numbers in Excel, ctd.
• Your numbers will be different to mine – that’s because they
are random.
• The numbers will change whenever you open the worksheet,
or make any change to it.
• Sometimes that’s fine, but for this demo we want to keep
the same numbers. To control when random numbers
update, select Manual in Formula|Calculation Options.
• To update to new numbers use Calculate Now button.
Remember to
reset to
Automatic
afterwards!
Random numbers in Excel, ctd.
• The rand() function generates random numbers between 0 and 1:
Are these the kind of numbers
we want?
Realistic data usually involves normally distributed numbers
• Nifty way to do this in Excel: treat generated numbers as p-values
• The normsinv() function turns a p-value into a z-score
Normally distributed random numbers
Try this:
• Type = normsinv(A2) in
cell B2
• Drag formula down to
cell B8
• Now look at how the
numbers in column A
relate to those in
column B.
NB. In practice, we can generate normally distributed random numbers
(i.e. z-scores) in just one step with formula: = normsinv(rand())
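For R users, the same inverse-CDF trick looks like this (a sketch; distributionally the two lines give the same kind of numbers):

qnorm(runif(5))   # uniform numbers pushed through the inverse normal CDF, like normsinv(rand())
rnorm(5)          # R's one-step equivalent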
Now we are ready to simulate a study where we have
2 groups to be compared on a t-test
• Pull down the
formula from
columns A
and B to
extend to
A11:B11
• Type a header
‘group’ in C1
• Type 1 in
C2:C6 and 2
in C7:C11
What is the formula for a t-test in Excel?
Basic rule for life, especially in programming: if you don't know it,
Google it
TTEST formula in xls:
You specify:
Range 1
Range 2
tails (1 or 2)
type
1 = paired
2 = unpaired equal variance
3 = unpaired unequal variance
Try entering the formula for the t-test in C12
=TTEST(B2:B6,B7:B11,2,2)
What is the number
that you get?
This formula gives
you a p-value
Now press
‘calculate now’ 20
times, and keep a
tally of how many
p-values are < .05 in
20 simulations
• What has this shown you?
• P-values ‘dance about’ even when data are entirely random
• On average, one in 20 runs will give p < .05 when null
hypothesis is true – no difference between groups
• Doesn’t mean you get EXACTLY 1 in 20 p-values < .05: need
a long run to converge on that value.
See Geoff Cumming: Dance of the p-values
https://www.youtube.com/watch?v=5OL1RqHrZQ8
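If you'd rather not press the button 20 times, the same tally takes two lines of R (a sketch for later; it uses a Welch t-test rather than Excel's equal-variance type 2, which makes little difference here):

pvals <- replicate(20, t.test(rnorm(5), rnorm(5))$p.value)  # 20 runs where the null is true
sum(pvals < .05)                                            # how many 'significant' results?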
Congratulations! You have done your first simulation
We’ll stick with Excel for one more simulation
• So far, we’ve simulated the null hypothesis - random
data. If we find a ‘significant’ difference, we know it’s a
false positive
• Next, we’ll simulate data with a genuine effect.
• It’s easy to do this: we just add a constant to all the
values for group 2
• Since we’re using z-scores, the constant will correspond
to the effect size (expressed as Cohen’s d).
• Let’s try an effect size of .5
• In cell B7, change the formula to = normsinv(A7)+.5
• Drag the formula down to cell B11 and hit ‘Calculate
now’
I’ve added formulae to
show the mean and SD for
the two groups:
= AVERAGE(B2:B6)
= STDEV(B2:B6)
= AVERAGE(B7:B11)
= STDEV(B7:B11)
Your values will differ.
Why isn’t the difference in
means for the two groups
exactly .5?
ANSWER: mean/SD
describe the population;
this is just a sample from
that population
Now type the formula for the t-test:
=TTEST(B2:B6,B7:B11,2,2)
Is p < .05?
It’s pretty unlikely
you will see a
significant result.
Why?
ANSWER: Sample too
small – can’t pick out
signal from noise
What have we learned so far?
• The first simulation gave some insights into false positive
rates: it shows how you can get a 'significant' result from
random data
• The second simulation illustrates the opposite situation:
showing how often you can fail to get a significant p-value,
even when there is a true effect (false negative)
• This brings us on to the topic of statistical power: the
probability of detecting a real effect with a given sample size
• To build on these insights we need to do lots of simulations,
and for that it's best to move to R
Fire up RStudio
Console: try commands out here. Environment: check variables here.
Cursor: the console is ready for you to type here.
At the cursor type:
scoresA <- rnorm(n = 5, m = 0, sd = 1)
• This creates a vector of z-scores (i.e. random normal
deviates with mean of 0 and SD of 1)
• But where is it?
• To see the numbers you can either look in the
Environment pane (top right) and/or just type the
vector’s name at the cursor
scoresA
[1] -0.15348659 0.01984155 0.18353508
0.23524739 1.18143805
Blue courier: what you
type at the cursor. Black
courier: output at the console.
rnorm is an inbuilt R
function that generates
random normal deviates
We’ll now create another vector for group B. Same command but
we’ll make scores for group B an average .5 points higher:
scoresB <- rnorm(n = 5, m = 0.5, sd = 1)
You can inspect this as before: type its name at the console.
Now we can do a t-test
t.test(scoresA,scoresB)
Welch Two Sample t-test
data: scoresA and scoresB
t = -1.502, df = 5.8215, p-value = 0.1853
alternative hypothesis: true difference in
means is not equal to 0
95 percent confidence interval:
-2.0909662 0.5076982
sample estimates:
mean of x mean of y
0.2933151 1.0849491
• Console shows results for a
Welch 2-sample t-test (i.e.
t-test with correction for
unequal variances)
We’ll now do exactly the same thing, but with N of 50 per group
scoresA <- rnorm(n = 50, m = 0, sd = 1)
scoresB <- rnorm(n = 50, m = 0.5, sd = 1)
t.test(scoresA,scoresB)
Welch Two Sample t-test
data: scoresA and scoresB
t = -2.6022, df = 94.313, p-value = 0.01076
alternative hypothesis: true difference in
means is not equal to 0
95 percent confidence interval:
-0.9723062 -0.1307207
sample estimates:
mean of x mean of y
0.1208312 0.6723447
Benefits of simulating data in R
• Much faster than Excel, and reproducible
• Can generate different distributions, correlated variables, etc.
• Powerful plotting functions
• A good way of starting to learn R
• Can write a script that executes commands to generate data
and then run it automatically many times with different
parameters (e.g. N and effect size) and store results
Downside: Steep initial learning curve
But remember: Google is your friend
Tons of material about R on the internet
Self-teaching scripts on https://osf.io/skz3j/
Download, save and open this one:
Simulation_ex1_multioutput.R
Source pane: the script. Console: the window moves down
when we open a script file.
First thing to do: Set working directory
• Working directory is where R will default to when reading and
writing stuff
• Easiest way to set it: Go to Session|Set working directory
Note that when you do this, the command to set working directory will pop up on the
console. On my computer I see:
setwd("~/deevybee_repo")
Take note of location of Run button
Simulation_ex1_multioutput.R
This repeatedly runs the steps you put into the console, plots the results and saves
the plots in a pdf:
• There are some additional steps to reorganize the numbers: for an explanation of
the details please see Simulation_ex1_intro.R
• You run the simulation repeatedly, with two different values for N
The script is structured as 2 nested loops:
for (i in 1:2){ #line 15
……… #various commands here
for (j in 1:10){ #line 21
……… #various commands here
}
}
• The outer loop runs twice; the inner loop, which is nested inside it, runs 10 times.
So overall there are 20 runs
• The value, i, in the outer loop controls the sample size, which is either myNs[1] or
myNs[2]
• The value, j, in the inner loop just acts as a counter, to give 10 repetitions
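For orientation, a minimal runnable sketch of that structure (assuming myNs holds the two sample sizes used in the plots below; the real script also reorganizes the numbers, plots the results and writes a pdf):

myNs <- c(20, 100)        # assumed values for N per group
for (i in 1:2) {          # outer loop: pick a sample size
  myN <- myNs[i]
  for (j in 1:10) {       # inner loop: 10 repetitions
    scoresA <- rnorm(myN, 0, 1)
    scoresB <- rnorm(myN, 0.5, 1)
    print(t.test(scoresA, scoresB)$p.value)
  }
}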
Let’s run the whole script!
• Select all the code in the source (upper left-hand pane) by
clicking in that pane and then typing Ctrl+A or Command+A
• Now hit the Run button on the menu bar to run the script
• Click on the Files tab in the bottom right-hand pane, and
you’ll see you have created two new pdf files (you may
need to scroll down to see them):
10 runs of simulation with N = 20 per group and effect size (d) = .5
[Ten plots of Score by Group (1 vs 2), one per run. Results: t = −1.8, p = 0.0767; t = −3.4, p = 0.0018; t = −0.48, p = 0.637; t = −1.4, p = 0.165; t = −1.4, p = 0.164; t = 0.044, p = 0.965; t = −1.9, p = 0.0638; t = −0.86, p = 0.394; t = −2.6, p = 0.0139; t = −2.3, p = 0.0256. Only 3 of the 10 runs give p < .05.]
10 runs of simulation with N = 100 per group and effect size (d) = .5
[Ten plots of Score by Group, one per run. Results: t = −2.9, p = 0.00396; t = −4.4, p = 0.0000159; t = −3.5, p = 0.000486; t = −2, p = 0.0417; t = −3.9, p = 0.000137; t = −3.4, p = 0.00084; t = −4.7, p = 0.00000539; t = −2.9, p = 0.00463; t = −3.8, p = 0.000218; t = −3.3, p = 0.00117. All 10 runs give p < .05.]
10 runs of simulation with N = 100 per group and effect size (d) = .3
[Ten plots of Score by Group, one per run. Results: t = −2.9, p = 0.00406; t = −1.5, p = 0.128; t = −0.93, p = 0.354; t = −1.2, p = 0.242; t = −2.6, p = 0.00932; t = −2.9, p = 0.00463; t = −0.63, p = 0.529; t = −2.9, p = 0.00443; t = −2.6, p = 0.011; t = −1.4, p = 0.151. Only 5 of the 10 runs give p < .05.]
Points to note
• Smaller samples are associated with more variable results.
• With small sample sizes, true but weak effects will usually
not give you a 'significant' result (i.e. p < .05).
• In the example here, with an effect size of .3, a sample of 100
per group only gives a significant result on around 60% of
runs (when we do many runs of the simulation).
• This is the same as saying that the power of the study to
detect an effect size of .3 is .60 (i.e. 60%).
• Many statisticians recommend power should be 80% or
more (though will depend on purpose of study).
[Power table: the body of the table shows sample size per group.]
Jacob Cohen worked this all out in 1988
Estimating statistical power for your study
You can compute power without needing to simulate: for
simple designs you can use the G*Power package (or Cohen's
formulae).
But simulation gives more insight into what power means. It
is also more flexible: can use with complex datasets and
analytic methods. Simulate data, run the analysis 10,000
times and then see how frequently your result is ‘significant’
by whatever criterion you plan to use.
This requires you to have a sense of what your data will look
like, and an estimate of the smallest effect size that you'd be
interested in.
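As a sketch, the simulation route to power for the two-group case above might look like this (assuming d = .3 and N = 100 per group; the mean of the indicator is the estimated power):

pvals <- replicate(10000, t.test(rnorm(100, 0, 1), rnorm(100, 0.3, 1))$p.value)
mean(pvals < .05)   # proportion of 'significant' runs = estimated power (about .56 for these settings)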
"Small studies continue to be carried out
with little more than a blind hope of
showing the desired effect. Nevertheless,
papers based on such work are submitted
for publication, especially if the results
turn out to be statistically significant."
– Newcombe (1987)
Weak statistical power has been, and continues to be, a
major cause of problems with replication of findings
Low power plagues much research in
biomedical science and psychology
What can be done?!
• Take steps to improve effect size: minimize noise
Use better measures – check they are reliable
Take more samples of dependent variable – e.g. more
trials
• Think hard about experimental design – simulate different
possibilities
E.g. Sometimes a within-subjects design is more sensitive
• Work collaboratively to increase sample size
Within-subjects vs between-subjects design:
Matched pairs vs. independent t-test
• See simulation_ex1a_withinsubs.R
If some of the noise reflects a consistent attribute of subjects, then testing 20 people
twice is more powerful than testing 2 groups of 20.
[Ten plots of difference scores, one per run, with matched-pairs t-tests: t = −2.8, p = 0.0115; t = −4.1, p = 0.000678; t = −1.4, p = 0.167; t = −5.2, p = 0.0000558; t = −3.3, p = 0.00337; t = −3.4, p = 0.00296; t = −2.4, p = 0.026; t = −2.5, p = 0.0207; t = −2, p = 0.0564; t = −2.7, p = 0.0152. 8 of the 10 runs give p < .05.]
Difference scores pre-post treatment, N = 20: effect size = .5, correlation between time 1 and time 2 = .5
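A minimal sketch of this kind of within-subjects simulation, assuming the parameters in the caption above (effect size .5, correlation .5 between time 1 and time 2, N = 20):

library(MASS)                               # for mvrnorm
Sigma <- matrix(c(1, .5, .5, 1), nrow = 2)  # correlation time1/time2 = .5
scores <- mvrnorm(20, mu = c(0, 0.5), Sigma = Sigma)  # 20 subjects, pre and post
t.test(scores[, 1], scores[, 2], paired = TRUE)       # matched-pairs t-test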
See: DeclareDesignIntro on https://github.com/oscci/simulate_designs
Also R package: simstudy – simulate datasets with different properties,
including multilevel data
Low power plagues much research in
biomedical science and psychology
What can be done?!
• Work collaboratively to increase sample size
https://psysciacc.org/
Nature 561, 287 (2018)
doi: 10.1038/d41586-018-06692-8
Part 2: Simulating null results to illustrate p-hacking
P-hacking and type 1 error (false positives)
Simulation_ex2_correlations.R
Often studies have multiple variables of interest.
This script uses the mvrnorm function from the MASS
package to simulate multivariate normal data
It also demonstrates the dangers of p-hacking by showing
how easy it is to get some values with p < .05 if you have a
large selection of variables
Thought experiment: we’ll simulate 7 uncorrelated variables.
In a single run, how likely is it that we’ll see:
• No significant correlations
• Some significant correlations
Suppose you make a specific prediction in advance that your
two favourite variables (e.g. V1 and V3) will be significantly
correlated: what’s the probability you will be correct?
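Before looking at the example runs below, here is a minimal sketch of the thought experiment (the script itself may differ in its details; the |r| > .31 cutoff is the one quoted on the next slides):

library(MASS)
dat <- mvrnorm(30, mu = rep(0, 7), Sigma = diag(7))  # 7 variables, true r = 0, N = 30
r <- cor(dat)
sum(abs(r[upper.tri(r)]) > .31)   # how many of the 21 correlations cross the p < .05 line?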
Correlation matrix for run 1
Output from simulation of 7 independent variables, where true correlation = 0
N = 30
Red denotes p < .05 (r > .31 or r < −.31).
Sample size not
relevant for this
demonstration:
With larger N,
smaller r will be
significant at .05
Correlation matrix for run 2
Output from simulation of 7 independent variables, where true correlation = 0
N = 30
Red denotes p < .05 (r > .31 or r < −.31).
Why do we get significant values when we have specified true r = 0?
Correlation matrix for run 3
Output from simulation of 7 independent variables, where true correlation = 0
N = 30
Red denotes p < .05 (r > .31 or r < −.31).
On any one run, we are looking at 21 correlations.
So we should use a Bonferroni-corrected p-value: .05/21 = .002,
which corresponds to r = .51
• Use of .05 cutoff makes sense only in relation to an a-priori
hypothesis
Focusing just on ‘significant’ associations in a dataset is classic p-
hacking – also known as ‘data dredging’
It is very commonly done, and many people fail to appreciate how
misleading it is.
It’s fine to look for patterns in complex data as a way of exploring
and deriving a hypothesis, but it must then be tested in another
sample.
Consider: we saw particular patterns in our random noise data –
but they did not replicate in another run.
Key point: p-values can only be interpreted in terms of the context
in which they are computed
Other ways in which 'hidden multiplicity' of testing
can give false positive (p < .05) results
• Multi-way ANOVA with many main effects/interactions
• Cramer, A. O. J., et al (2016). Hidden multiplicity in exploratory multiway ANOVA:
Prevalence and remedies. Psychonomic Bulletin & Review, 23(2), 640-647.
doi:10.3758/s13423-015-0913-5
Illustrated with the field of ERP/EEG:
• Flexibility in analysis in terms of:
• Electrodes
• Time intervals
• Frequency ranges
• Measurement of peaks
• etc, etc
• Often see analyses with 4- or 5-way ANOVA (group x side x
site x condition x interval)
• Standard stats packages correct p-values for N levels
WITHIN a factor, but not for overall N factors and
interactions
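To see where the multiplicity comes from, count the tests: a full factorial ANOVA with k factors tests 2^k − 1 main effects and interactions. A rough back-of-envelope sketch (assuming independent tests at α = .05):

k <- 3                # a 3-way ANOVA
(ntests <- 2^k - 1)   # 7 main effects and interactions
1 - 0.95^ntests       # about .30 chance of at least one false positive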
Other ways in which 'hidden multiplicity' of testing
can give false positive (p < .05) results
• Subgroup analysis
You run a study investigating how a drug, X, affects
anxiety. You plot the results by age, and see this:
No significant effect of X on anxiety overall
[Chart: 'Treatment effect by age' – symptom improvement (−1 to 1) plotted against age (16-60 yr).]
But you notice that there is a treatment effect for
those aged over 36
[Chart: 'Treatment effect by age' – the same plot, highlighting the apparent effect in those aged over 36.]
Close link between p-hacking and HARKing
You are HARKing (Hypothesizing After the Results are Known) if you had no prior predictions, but on seeing the results
you write up the paper as if you had planned to look at the effect of age on the drug effect.
This kind of thing is endemic in psychology.
• It is OK to say that this association was observed in exploratory analysis, and that it
suggests a new hypothesis that needs to be tested in a new sample.
• It is NOT OK to pretend that you predicted the association if you didn’t.
• And it is REALLY REALLY NOT OK to report only the data that support your new hypothesis
(e.g. dropping those aged below 36 from the analysis)
[Chart: 'Treatment effect by age' – symptom improvement by age, as above.]
The problem: analytic flexibility that allows the analysis to be
influenced by the results
• Analytic flexibility affects not just subgroups, but also selection
of measures, type of analysis, removal of outliers, etc.
• 'Garden of forking paths'
• In many cases, it is hard to apply any statistical correction,
because we are unaware of all the potential analyses
"El jardín de senderos que se bifurcan" (Borges: 'The Garden of Forking Paths')
Demonstration of rapid expansion of comparisons with binary divisions
Example: a large population database used to explore the link between ADHD and handedness
(https://figshare.com/articles/The_Garden_of_Forking_Paths/2100379)
• Whole sample: 1 contrast; probability of a 'significant' p-value < .05 = .05
• Focus just on the Young subgroup: 2 contrasts at this level; probability = .10
• Focus just on the Young, on a measure of hand skill: 4 contrasts at this level; probability = .19
• Focus just on Young Females, on a measure of hand skill: 8 contrasts at this level; probability = .34
• Focus just on Young, Urban Females, on a measure of hand skill: 16 contrasts at this level; probability = .56
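These probabilities follow from the familywise error formula for k independent contrasts, 1 − (1 − .05)^k; a quick check in R:

k <- c(1, 2, 4, 8, 16)
round(1 - 0.95^k, 2)   # 0.05 0.10 0.19 0.34 0.56, matching the figures above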
Richard Peto: ISIS-2 study group (1988) Lancet 332, 349-410
De Groot (1956): failure to distinguish between
hypothesis-testing and hypothesis-generating
(exploratory) research leads to misuse of statistical tests
de Groot, A. D. (2014). The meaning of “significance” for
different types of research [translated and annotated by Eric-
Jan Wagenmakers, et al]. Acta Psychologica, 148, 188-194.
doi:http://dx.doi.org/10.1016/j.actpsy.2014.02.001
Further reading
A comprehensive solution: Pre-registration
Some general points to help you learn R
1. Basic rule for life, especially in programming: if you don’t know
it, Google it
In R, Google your error message
2. Best way to learn is by making mistakes
If you see a line of code you don’t understand, play with it to find
out what it does.
Look at Environment tab, or type name of variable on the console
to check its value
Don't be afraid to experiment. E.g., you want repeating
numbers? Type in the console to compare: rep(1,3) and …
R scripts available on: https://osf.io/view/reproducibility2017/
• Simulation_ex1_intro.R
Suitable for R newbies. Demonstrates ‘dance of the p-values’ in a t-test.
Bonus, you learn to make pirate plots
• Simulation_ex2_correlations.R
Generate correlation matrices from multivariate normal distribution.
Bonus, you learn to use ‘grid’ to make nicely formatted tabular outputs.
• Simulation_ex3_multiwayAnova.R
Simulate data for a 3-way mixed ANOVA. Demonstrates need to correct
for N factors and interactions when doing exploratory multiway Anova.
• Simulation_ex4_multipleReg.R
Simulate data for multiple regression.
• Simulation_ex5_falsediscovery.R
Simulate data for mixture of null and true effects, to demonstrate that
the probability of the data given the hypothesis is different from the
probability of the hypothesis given the data.
Two simulations from Daniel Lakens’ Coursera Course – with notes!
• 1.1 WhichPvaluesCanYouExpect.R
• 3.2 OptionalStoppingSim.R
Now even
more: See
OSF!
Contenu connexe

Tendances

Essential training on microsoft office power point 2007
Essential training on microsoft office power point 2007Essential training on microsoft office power point 2007
Essential training on microsoft office power point 2007
ashok_142
 
Object oriented analysis and design
Object oriented analysis and designObject oriented analysis and design
Object oriented analysis and design
naveed428
 
Creating & Editing Charts In Microsoft Excel 2003
Creating & Editing Charts In Microsoft Excel 2003Creating & Editing Charts In Microsoft Excel 2003
Creating & Editing Charts In Microsoft Excel 2003
bud_00
 

Tendances (20)

Rbootcamp Day 1
Rbootcamp Day 1Rbootcamp Day 1
Rbootcamp Day 1
 
Essential training on microsoft office power point 2007
Essential training on microsoft office power point 2007Essential training on microsoft office power point 2007
Essential training on microsoft office power point 2007
 
Data Visualization in Python
Data Visualization in PythonData Visualization in Python
Data Visualization in Python
 
Essential NumPy
Essential NumPyEssential NumPy
Essential NumPy
 
Linear time sorting algorithms
Linear time sorting algorithmsLinear time sorting algorithms
Linear time sorting algorithms
 
Hashing notes data structures (HASHING AND HASH FUNCTIONS)
Hashing notes data structures (HASHING AND HASH FUNCTIONS)Hashing notes data structures (HASHING AND HASH FUNCTIONS)
Hashing notes data structures (HASHING AND HASH FUNCTIONS)
 
IMAGE FILE FORMATS
IMAGE FILE FORMATSIMAGE FILE FORMATS
IMAGE FILE FORMATS
 
Object oriented analysis and design
Object oriented analysis and designObject oriented analysis and design
Object oriented analysis and design
 
Design techniques
Design techniquesDesign techniques
Design techniques
 
Scaling
ScalingScaling
Scaling
 
Adobe Photoshop
Adobe Photoshop Adobe Photoshop
Adobe Photoshop
 
Creating R Packages
Creating R PackagesCreating R Packages
Creating R Packages
 
Research 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeXResearch 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeX
 
Computer graphics presentation
Computer graphics presentationComputer graphics presentation
Computer graphics presentation
 
software Engineering process
software Engineering processsoftware Engineering process
software Engineering process
 
Scaling and shearing
Scaling and shearingScaling and shearing
Scaling and shearing
 
Creating & Editing Charts In Microsoft Excel 2003
Creating & Editing Charts In Microsoft Excel 2003Creating & Editing Charts In Microsoft Excel 2003
Creating & Editing Charts In Microsoft Excel 2003
 
Presentation
PresentationPresentation
Presentation
 
Lab manual asp.net
Lab manual asp.netLab manual asp.net
Lab manual asp.net
 
Introduction to the Python
Introduction to the PythonIntroduction to the Python
Introduction to the Python
 

Similaire à Data simulation basics

Cs221 lecture5-fall11
Cs221 lecture5-fall11Cs221 lecture5-fall11
Cs221 lecture5-fall11
darwinrlo
 

Similaire à Data simulation basics (20)

Simulating data to gain insights into power and p-hacking
Simulating data to gain insights intopower and p-hackingSimulating data to gain insights intopower and p-hacking
Simulating data to gain insights into power and p-hacking
 
Introduction to simulating data to improve your research
Introduction to simulating data to improve your researchIntroduction to simulating data to improve your research
Introduction to simulating data to improve your research
 
Machine learning and_nlp
Machine learning and_nlpMachine learning and_nlp
Machine learning and_nlp
 
Visual Techniques
Visual TechniquesVisual Techniques
Visual Techniques
 
Model Selection and Validation
Model Selection and ValidationModel Selection and Validation
Model Selection and Validation
 
Machine Learning on Azure - AzureConf
Machine Learning on Azure - AzureConfMachine Learning on Azure - AzureConf
Machine Learning on Azure - AzureConf
 
Elementary Data Analysis with MS Excel_Day-4
Elementary Data Analysis with MS Excel_Day-4Elementary Data Analysis with MS Excel_Day-4
Elementary Data Analysis with MS Excel_Day-4
 
Engineering Numerical Analysis-Introduction.pdf
Engineering Numerical Analysis-Introduction.pdfEngineering Numerical Analysis-Introduction.pdf
Engineering Numerical Analysis-Introduction.pdf
 
Design and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.pptDesign and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.ppt
 
Exploratory data analysis v1.0
Exploratory data analysis v1.0Exploratory data analysis v1.0
Exploratory data analysis v1.0
 
ISSTA'16 Summer School: Intro to Statistics
ISSTA'16 Summer School: Intro to StatisticsISSTA'16 Summer School: Intro to Statistics
ISSTA'16 Summer School: Intro to Statistics
 
@elemorfaruk
@elemorfaruk@elemorfaruk
@elemorfaruk
 
Deep learning from scratch
Deep learning from scratch Deep learning from scratch
Deep learning from scratch
 
De vry math221 all ilabs latest 2016 november
De vry math221 all ilabs latest 2016 novemberDe vry math221 all ilabs latest 2016 november
De vry math221 all ilabs latest 2016 november
 
De vry math 221 all ilabs latest 2016 november
De vry math 221 all ilabs latest 2016 novemberDe vry math 221 all ilabs latest 2016 november
De vry math 221 all ilabs latest 2016 november
 
Algorithm Homework Help
Algorithm Homework HelpAlgorithm Homework Help
Algorithm Homework Help
 
Data Structures- Part1 overview and review
Data Structures- Part1 overview and reviewData Structures- Part1 overview and review
Data Structures- Part1 overview and review
 
A01
A01A01
A01
 
Cs221 lecture5-fall11
Cs221 lecture5-fall11Cs221 lecture5-fall11
Cs221 lecture5-fall11
 
Elementary Data Analysis with MS Excel_Day-5
Elementary Data Analysis with MS Excel_Day-5Elementary Data Analysis with MS Excel_Day-5
Elementary Data Analysis with MS Excel_Day-5
 

Plus de Dorothy Bishop

Language-impaired preschoolers: A follow-up into adolescence.
Language-impaired preschoolers: A follow-up into adolescence.Language-impaired preschoolers: A follow-up into adolescence.
Language-impaired preschoolers: A follow-up into adolescence.
Dorothy Bishop
 

Plus de Dorothy Bishop (20)

Exercise/fish oil intervention for dyslexia
Exercise/fish oil intervention for dyslexiaExercise/fish oil intervention for dyslexia
Exercise/fish oil intervention for dyslexia
 
Open Research Practices in the Age of a Papermill Pandemic
Open Research Practices in the Age of a Papermill PandemicOpen Research Practices in the Age of a Papermill Pandemic
Open Research Practices in the Age of a Papermill Pandemic
 
Language-impaired preschoolers: A follow-up into adolescence.
Language-impaired preschoolers: A follow-up into adolescence.Language-impaired preschoolers: A follow-up into adolescence.
Language-impaired preschoolers: A follow-up into adolescence.
 
Journal club summary: Open Science save lives
Journal club summary: Open Science save livesJournal club summary: Open Science save lives
Journal club summary: Open Science save lives
 
Short talk on 2 cognitive biases and reproducibility
Short talk on 2 cognitive biases and reproducibilityShort talk on 2 cognitive biases and reproducibility
Short talk on 2 cognitive biases and reproducibility
 
Otitis media with effusion: an illustration of ascertainment bias
Otitis media with effusion: an illustration of ascertainment biasOtitis media with effusion: an illustration of ascertainment bias
Otitis media with effusion: an illustration of ascertainment bias
 
Insights from psychology on lack of reproducibility
Insights from psychology on lack of reproducibilityInsights from psychology on lack of reproducibility
Insights from psychology on lack of reproducibility
 
What are metrics good for? Reflections on REF and TEF
What are metrics good for? Reflections on REF and TEFWhat are metrics good for? Reflections on REF and TEF
What are metrics good for? Reflections on REF and TEF
 
Biomarkers for psychological phenotypes?
Biomarkers for psychological phenotypes?Biomarkers for psychological phenotypes?
Biomarkers for psychological phenotypes?
 
Talk on reproducibility in EEG research
Talk on reproducibility in EEG researchTalk on reproducibility in EEG research
Talk on reproducibility in EEG research
 
What is Developmental Language Disorder
What is Developmental Language DisorderWhat is Developmental Language Disorder
What is Developmental Language Disorder
 
Developmental language disorder and auditory processing disorder: 
Same or di...
Developmental language disorder and auditory processing disorder: 
Same or di...Developmental language disorder and auditory processing disorder: 
Same or di...
Developmental language disorder and auditory processing disorder: 
Same or di...
 
Fallibility in science: Responsible ways to handle mistakes
Fallibility in science: Responsible ways to handle mistakesFallibility in science: Responsible ways to handle mistakes
Fallibility in science: Responsible ways to handle mistakes
 
Improve your study with pre-registration
Improve your study with pre-registrationImprove your study with pre-registration
Improve your study with pre-registration
 
Southampton: lecture on TEF
Southampton: lecture on TEFSouthampton: lecture on TEF
Southampton: lecture on TEF
 
Reading list: What’s wrong with our universities
Reading list: What’s wrong with our universitiesReading list: What’s wrong with our universities
Reading list: What’s wrong with our universities
 
IJLCD Winter Lecture 2016-7 : References
IJLCD Winter Lecture 2016-7 : ReferencesIJLCD Winter Lecture 2016-7 : References
IJLCD Winter Lecture 2016-7 : References
 
What's wrong with our Universities, and will the Teaching Excellence Framewor...
What's wrong with our Universities, and will the Teaching Excellence Framewor...What's wrong with our Universities, and will the Teaching Excellence Framewor...
What's wrong with our Universities, and will the Teaching Excellence Framewor...
 
Bishop reproducibility references nov2016
Bishop reproducibility references nov2016Bishop reproducibility references nov2016
Bishop reproducibility references nov2016
 
On the importance of WIPS not being wimps
On the importance of WIPS not being wimpsOn the importance of WIPS not being wimps
On the importance of WIPS not being wimps
 

Dernier

Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
Bhagirath Gogikar
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 

Dernier (20)

Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
IDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicineIDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicine
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 

Data simulation basics

  • 1. Simulating data to gain insights into experimental design and statistics Dorothy V. M. Bishop Professor of Developmental Neuropsychology University of Oxford @deevybee
  • 2. Before we get started…. • I will show you some exercises for simulating data. It’s fine if you just want to listen and learn. There are materials online that you can work through later. • If you would like to work along with the exercises, that is also fine, but I won’t be able to answer many questions. The early exercises in this lesson use Microsoft Excel, which most people will have installed • The later exercises use R and R studio. If you are familiar with these and have them installed, feel free to work along. You will need the packages yarrr, mvrnorm and Hmisc.
  • 3. Have a bright idea Collect data Think about how to analyse data Hit problems: Take advice from statistician How most people do experiments
  • 5. Have a bright idea Simulate data Think about how to analyse simulated data If problems: Take advice from statistician Collect real data A better way to do experiments
  • 6. Why invent data? • If you can anticipate what your data will look like, you will also anticipate a lot of issues about study design that you might not have thought of • Analysing a simulated dataset can clarify what is optimal analysis/ how the analysis works • Simulating data with an anticipated effect is very useful for power analysis – deciding what sample size to use • Simulating data with no effect (i.e. random noise) gives unique insights into how easy it is to get a false positive result through p-hacking
  • 7. Ways to simulate data • For newbies: to get the general idea: Excel • Far better but involves steeper learning curve: R • Also (but not covered here) options in SPSS and Matlab: • e.g. https://www.youtube.com/watch?v=XBmvYORP5EU • http://uk.mathworks.com/help/matlab/random-number- generation.html
  • 8. Basic idea • Anything you measure can be seen as a combination of an effect of interest plus random noise • The goal of research is to find out • (a) whether there is an effect of interest • (b) if yes, how big it is • Classic hypothesis-testing with p-values is simply focuses just on (a) – i.e. have we just got noise or a real effect? • We can simulate most scenarios by generating random noise, with or without a consistent added effect
  • 9. Basic idea: generate a set of random numbers in Excel • Open a new workbook • In cell A1 type random number • In cell A2 type = rand() Grab the little square in the bottom right of A2 and pull it down to autofill the cells below to A8
  • 10. Random numbers in Excel, ctd • You have just simulated some data! • Are your numbers the same as mine? • What happens when you type rand() in A9?
  • 11. Random numbers in Excel, ctd. • Your numbers will be different to mine – that’s because they are random. • The numbers will change whenever you open the worksheet, or make any change to it. • Sometimes that’s fine, but for this demo we want to keep the same numbers. To control when random numbers update, select Manual in Formula|Calculation Options. • To update to new numbers use Calculate Now button. Remember to reset to Automatic afterwards!
  • 12. Random numbers in Excel, ctd. • The rand() function generates random numbers between 0 and 1: Are these the kind of numbers we want?
  • 13. Realistic data usually involves normally distributed numbers • Nifty way to do this in Excel: treat generated numbers as p-values • The normsinv() function turns a p-value into a z-score Z-score
  • 14. Normally distributed random numbers Try this: • Type = normsinv(A2) in cell B2 • Drag formula down to cell B8 • Now look at how the numbers in column A relate to those in column B. NB. In practice, we can generate normally distributed random numbers (i.e. z-scores) in just one step with formula: = normsinv(rand())
  • 15. Now we are ready to simulate a study where we have 2 groups to be compared on a t-test • Pull down the formula from columns A and B to extend to A11:B11 • Type a header ‘group’ in C1 • Type 1 in C2:C6 and 2 in C7:C11
  • 16. What is formula for t-test in Excel? Basic rule for life, especially in programming: if you don’t know it, Google it TTEST formula in xls: You specify: Range 1 Range 2 tails (1 or 2) type 1 = paired 2 = unpaired equal variance 3 = unpaired unequal variance
  • 17. Try entering the formula for the t-test in C12 =TTEST(B2:B6, B7:B11,2,2) What is the number that you get? This formula gives you a p-value Now press ‘calculate now’ 20 times, and keep a tally of how many p-values are < .05 in 20 simulations
  • 18. • What has this shown you? • P-values ‘dance about’ even when data are entirely random • On average, one in 20 runs will give p < .05 when null hypothesis is true – no difference between groups • Doesn’t mean you get EXACTLY 1 in 20 p-values < .05: need a long run to converge on that value. See Geoff Cumming: Dance of the p-values https://www.youtube.com/watch?v=5OL1RqHrZQ8 Congratulations! You have done your first simulation
  • 19. We’ll stick with Excel for one more simulation • So far, we’ve simulated the null hypothesis - random data. If we find a ‘significant’ difference, we know it’s a false positive • Next, we’ll simulate data with a genuine effect. • It’s easy to do this: we just add a constant to all the values for group 2 • Since we’re using z-scores, the constant will correspond to the effect size (expressed as Cohen’s d). • Let’s try an effect size of .5 • For cells B7, change the formula to = normsinv(A7)+.5 • Drag the formula down to cell B11 and hit ‘Calculate now’
  • 20. I’ve added formulae to show the mean and SD for the two groups: = AVERAGE(B2:B6) = STDEV(B2:B6) = AVERAGE(B7:B11) = STDEV(B7:B11) Your values will differ. Why isn’t the difference in means for the two groups exactly .5?
  • 21. I’ve added formulae to show the mean and SD for the two groups: = AVERAGE(B2:B6) = STDEV(B2:B6) = AVERAGE(B7:B11) = STDEV(B7:B11) Your values will differ. Why isn’t the difference in means for the two groups exactly .5? ANSWER: mean/SD describe the population; this is just a sample from that population
  • 22. Now type the formula for the t-test =TTEST(B2:B6,B 7:B11,2,2) Is p < .05 ? It’s pretty unlikely you will see a significant result. Why?
  • 23. It’s pretty unlikely you will see a significant result. Why? ANSWER: Sample too small – can’t pick out signal from noise
  • 24. • The first simulation gave some insights into false positive rates: it shows how you can get a ‘significant’ result from random data • The second simulation illustrates the opposite situation: showing how often you can fail to get a significant p-value, even when there is a true effect (false negative) • This brings us on to the topic of statistical power: the probability of detecting a real effect with a given sample size • To build on these insights we need to do lots of simulations, and for that it’s best to move to R What have we learned so far?
• 25. Fire up RStudio
• Console: try commands out here
• Environment: check variables here
• Cursor: the console is ready for you to type here
• 26. At the cursor type: scoresA <- rnorm(n = 5, mean = 0, sd = 1)
• This creates a vector of z-scores (i.e. random normal deviates with mean 0 and SD 1). rnorm is an inbuilt R function that generates random normal deviates.
• But where is it? To see the numbers you can either look in the Environment pane (top right) and/or just type the vector’s name at the cursor:
scoresA
[1] -0.15348659 0.01984155 0.18353508 0.23524739 1.18143805
(Blue courier: what you type at the cursor. Black courier: output at the cursor.)
• 27. We’ll now create another vector for group B. Same command, but we’ll make the scores for group B an average of .5 points higher:
scoresB <- rnorm(n = 5, mean = 0.5, sd = 1)
You can inspect this as before: type its name at the console. Now we can do a t-test:
t.test(scoresA, scoresB)
Welch Two Sample t-test
data: scoresA and scoresB
t = -1.502, df = 5.8215, p-value = 0.1853
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.0909662 0.5076982
sample estimates:
mean of x mean of y
0.2933151 1.0849491
• The console shows results for a Welch two-sample t-test (i.e. a t-test with correction for unequal variances)
• 28. We’ll now do exactly the same thing, but with N of 50 per group:
scoresA <- rnorm(n = 50, mean = 0, sd = 1)
scoresB <- rnorm(n = 50, mean = 0.5, sd = 1)
t.test(scoresA, scoresB)
Welch Two Sample t-test
data: scoresA and scoresB
t = -2.6022, df = 94.313, p-value = 0.01076
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.9723062 -0.1307207
sample estimates:
mean of x mean of y
0.1208312 0.6723447
• 29. Benefits of simulating data in R
• Much faster than Excel, and reproducible
• Can generate different distributions, correlated variables, etc.
• Powerful plotting functions
• A good way of starting to learn R
• Can write a script that generates data, then run it automatically many times with different parameters (e.g. N and effect size) and store the results
Downside: steep initial learning curve. But remember: Google is your friend – there are tons of materials about R on the internet.
• 30. Self-teaching scripts on https://osf.io/skz3j/
Download, save and open this one: Simulation_ex1_multioutput.R
Source pane: the script. Console: the console window moves down when we open a script file.
• 31. First thing to do: set the working directory
• The working directory is where R will default to when reading and writing files
• Easiest way to set it: go to Session|Set Working Directory
Note that when you do this, the command to set the working directory will pop up on the console. On my computer I see: setwd("~/deevybee_repo")
• 32. Take note of the location of the Run button
• 33. Simulation_ex1_multioutput.R
This script repeatedly runs the steps you typed into the console, plots the results and saves the plots to a pdf:
• There are some additional steps to reorganize the numbers; for an explanation of the details please see Simulation_ex1_intro.R
• You run the simulation repeatedly, with two different values for N
The script is structured as two nested loops (a runnable sketch follows below):
for (i in 1:2){ #line 15
  … #various commands here
  for (j in 1:10){ #line 21
    … #various commands here
  }
}
• The outer loop runs twice; the inner loop, which is nested inside it, runs 10 times – so overall there are 20 runs
• The value i in the outer loop controls the sample size, which is either myNs[1] or myNs[2]
• The value j in the inner loop just acts as a counter, giving 10 repetitions
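[A minimal runnable sketch of this nested-loop logic – not the full OSF script, which also draws and saves the plots; the names myNs and myES follow the description above:]
myNs <- c(20, 100)                  # the two sample sizes
myES <- 0.5                         # effect size (Cohen's d)
pvals <- matrix(NA, nrow = 2, ncol = 10)
for (i in 1:2) {                    # outer loop: sample size
  for (j in 1:10) {                 # inner loop: 10 repetitions
    scoresA <- rnorm(myNs[i], mean = 0, sd = 1)
    scoresB <- rnorm(myNs[i], mean = myES, sd = 1)
    pvals[i, j] <- t.test(scoresA, scoresB)$p.value
  }
}
pvals                               # one row of 10 p-values per sample size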
• 34. Let’s run the whole script!
• Select all the code in the source (upper left-hand) pane by clicking in that pane and then typing Ctrl+A or Command+A
• Now hit the Run button on the menu bar to run the script
• Click on the Files tab in the bottom right-hand pane, and you’ll see that you have created two new pdf files (you may need to scroll down to see them)
• 35. [Figure: pirate plots (Group vs Score) for 10 runs of the simulation with N = 20 per group and effect size (d) = .5. P-values across the 10 runs: .077, .002, .637, .165, .164, .965, .064, .394, .014, .026 – only 3 of 10 reach p < .05]
• 36. [Figure: pirate plots (Group vs Score) for 10 runs of the simulation with N = 100 per group and effect size (d) = .5. P-values across the 10 runs: .004, .000016, .0005, .042, .0001, .0008, .0000054, .005, .0002, .001 – all 10 reach p < .05]
• 37. [Figure: pirate plots (Group vs Score) for 10 runs of the simulation with N = 100 per group and effect size (d) = .3. P-values across the 10 runs: .004, .128, .354, .242, .009, .005, .529, .004, .011, .151 – 5 of 10 reach p < .05]
• 38. Points to note
• Smaller samples are associated with more variable results.
• With small sample sizes, true but weak effects will usually not give you a ‘significant’ result (i.e. p < .05).
• In the example here, with an effect size of .3, a sample of 100 per group gives a significant result on only around 60% of runs (when we do many runs of the simulation).
• This is the same as saying that the power of the study to detect an effect size of .3 is around 60%.
• Many statisticians recommend power should be 80% or more (though this will depend on the purpose of the study).
• 39. [Table from Cohen (1988): the body of the table shows the sample size per group needed for a given effect size and power.] Jacob Cohen worked this all out in 1988.
• 40. Estimating statistical power for your study
You can compute power without needing to simulate: for simple designs you can use the G*Power package (or Cohen’s formulae).
But simulation gives more insight into what power means. It is also more flexible: it can be used with complex datasets and analytic methods.
Simulate data, run the analysis 10,000 times and then see how frequently your result is ‘significant’ by whatever criterion you plan to use (sketched below).
This requires you to have a sense of what your data will look like, and an estimate of the smallest effect size that you’d be interested in.
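[A sketch of what that looks like in practice, using the d = .3, N = 100 scenario from the plots above as an illustration:]
nsim <- 10000
pvals <- replicate(nsim,
                   t.test(rnorm(100, mean = 0), rnorm(100, mean = 0.3))$p.value)
mean(pvals < .05)                          # proportion significant = simulated power
power.t.test(n = 100, delta = 0.3, sd = 1) # analytic check: power ~ .56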
• 41. “Small studies continue to be carried out with little more than a blind hope of showing the desired effect. Nevertheless, papers based on such work are submitted for publication, especially if the results turn out to be statistically significant.” – Newcombe (1987)
Weak statistical power has been, and continues to be, a major cause of problems with replication of findings.
• 42. Low power plagues much research in biomedical science and psychology. What can be done?
• Take steps to improve effect size: minimize noise
  • Use better measures – check they are reliable
  • Take more samples of the dependent variable – e.g. more trials
• Think hard about experimental design – simulate different possibilities
  • E.g. sometimes a within-subjects design is more sensitive
• Work collaboratively to increase sample size
• 43. Within-subjects vs between-subjects design: matched-pairs vs. independent t-test
• See simulation_ex1a_withinsubs.R
• If some of the noise reflects a consistent attribute of subjects, then testing 20 people twice is more powerful than testing 2 groups of 20 (see the sketch below).
[Figure: difference scores pre-post treatment, N = 20, effect size = .5, correlation time1/time2 = .5. P-values across the 10 runs: .012, .0007, .167, .00006, .003, .003, .026, .021, .056, .015 – 8 of 10 reach p < .05]
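[A minimal sketch of the comparison – not the simulation_ex1a_withinsubs.R script itself – assuming the parameters quoted above (N = 20, d = .5, time1/time2 correlation = .5):]
library(MASS)                       # for mvrnorm
nsim <- 10000
sig_paired <- sig_indep <- logical(nsim)
for (i in seq_len(nsim)) {
  # 20 subjects measured twice; scores correlated .5; true gain = .5
  scores <- mvrnorm(20, mu = c(0, 0.5),
                    Sigma = matrix(c(1, .5, .5, 1), nrow = 2))
  sig_paired[i] <- t.test(scores[, 1], scores[, 2], paired = TRUE)$p.value < .05
  # two independent groups of 20 with the same effect size
  sig_indep[i] <- t.test(rnorm(20), rnorm(20, mean = 0.5))$p.value < .05
}
mean(sig_paired)                    # power of the within-subjects design
mean(sig_indep)                     # power of the between-subjects design (lower)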
• 44. See DeclareDesignIntro on https://github.com/oscci/simulate_designs
Also the R package simstudy – simulate datasets with different properties, including multilevel data.
• 45. Low power plagues much research in biomedical science and psychology. What can be done?
• Work collaboratively to increase sample size – e.g. the Psychological Science Accelerator: https://psysciacc.org/
(Nature 561, 287 (2018), doi: 10.1038/d41586-018-06692-8)
  • 46. Part 2: Simulating null results to illustrate p-hacking
• 47. P-hacking and Type I error (false positives)
Simulation_ex2_correlations.R
• Often studies have multiple variables of interest. This script uses the mvrnorm function from the MASS package to simulate multivariate normal data.
• It also demonstrates the dangers of p-hacking by showing how easy it is to get some values with p < .05 if you have a large selection of variables. (A minimal sketch of the idea follows below.)
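[The core idea can be sketched in a few lines – illustrative values, not the exact OSF script, which adds nicely formatted output:]
library(MASS)                       # mvrnorm
library(Hmisc)                      # rcorr: r and p matrices in one call
dat <- mvrnorm(n = 30, mu = rep(0, 7), Sigma = diag(7))  # 7 variables, true r = 0
res <- rcorr(dat)
res$r                               # the 7 x 7 sample correlation matrix
sum(res$P < .05, na.rm = TRUE) / 2  # how many of the 21 pairs have p < .05?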
• 48. Thought experiment: we’ll simulate 7 uncorrelated variables. In a single run, how likely is it that we’ll see:
• No significant correlations?
• Some significant correlations?
Suppose you make a specific prediction in advance that your two favourite variables (e.g. V1 and V3) will be significantly correlated: what’s the probability you will be correct? (See the arithmetic sketched below.)
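[Rough arithmetic for the thought experiment, treating the 21 correlations as independent tests – only an approximation, since correlations among 7 variables share variables:]
1 - (1 - 0.05)^21   # ~ .66: chance of at least one ‘significant’ r in a run
                    # whereas a single pre-specified pair stays at .05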
• 49. Correlation matrix for run 1
Output from simulation of 7 independent variables, where the true correlation = 0; N = 30. Red denotes p < .05 (r > .31 or < −.31).
The sample size is not critical for this demonstration: with larger N, smaller values of r will be significant at .05.
• 50. Correlation matrix for run 2
Output from simulation of 7 independent variables, where the true correlation = 0; N = 30. Red denotes p < .05 (r > .31 or < −.31).
Why do we get significant values when we have specified a true r of 0?
• 51. Correlation matrix for run 3
Output from simulation of 7 independent variables, where the true correlation = 0; N = 30. Red denotes p < .05 (r > .31 or < −.31).
On any one run we are looking at 21 correlations, so we should use a Bonferroni-corrected p-value: .05/21 = .002, which corresponds to r = .51 (checked below).
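[One way to check that corrected threshold; with N = 30 there are 28 degrees of freedom, and small differences in rounding or tails will move the critical r slightly:]
alpha <- .05 / 21                   # Bonferroni-corrected alpha ~ .0024
tcrit <- qt(1 - alpha / 2, df = 28) # two-tailed critical t for N = 30
tcrit / sqrt(tcrit^2 + 28)          # critical |r|, a little over .5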
• 52. Key point: p-values can only be interpreted in terms of the context in which they are computed
• Use of the .05 cutoff makes sense only in relation to an a-priori hypothesis.
• Focusing just on ‘significant’ associations in a dataset is classic p-hacking – also known as ‘data dredging’. It is very commonly done, and many people fail to appreciate how misleading it is.
• It’s fine to look for patterns in complex data as a way of exploring and deriving a hypothesis, but the hypothesis must then be tested in another sample.
• Consider: we saw particular patterns in our random-noise data – but they did not replicate in another run.
• 53. Other ways in which ‘hidden multiplicity’ of testing can give false positive (p < .05) results
• Multi-way ANOVA with many main effects/interactions
Cramer, A. O. J., et al. (2016). Hidden multiplicity in exploratory multiway ANOVA: Prevalence and remedies. Psychonomic Bulletin & Review, 23(2), 640-647. doi:10.3758/s13423-015-0913-5
• 54. Illustrated with the field of ERP/EEG
• Flexibility in analysis in terms of: electrodes, time intervals, frequency ranges, measurement of peaks, etc.
• One often sees analyses with a 4- or 5-way ANOVA (group x side x site x condition x interval)
• Standard stats packages correct p-values for the number of levels WITHIN a factor, but not for the overall number of factors and interactions (see the calculation below).
Cramer, A. O. J., et al. (2016). Hidden multiplicity in exploratory multiway ANOVA: Prevalence and remedies. Psychonomic Bulletin & Review, 23, 640-647.
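[The scale of the problem is easy to compute: a k-way ANOVA tests 2^k − 1 main effects and interactions, so if all nulls were true and the tests independent (an approximation), a 5-way ANOVA would give:]
k <- 5
n_effects <- 2^k - 1        # 31 main effects + interactions
1 - 0.95^n_effects          # ~ .80 chance of at least one p < .05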
• 56. Other ways in which ‘hidden multiplicity’ of testing can give false positive (p < .05) results
• Subgroup analysis
• 57. You run a study investigating how a drug, X, affects anxiety. You plot the results by age, and see this:
[Chart: ‘Treatment effect by age’ – symptom improvement (−1 to 1) plotted against age (16–60 yr)]
There is no significant effect of X on anxiety overall.
• 58. But you notice that there is a treatment effect for those aged over 36.
[Chart: ‘Treatment effect by age’, as before]
• 59. Close link between p-hacking and HARKing (Hypothesizing After the Results are Known)
You are HARKing if you have no prior predictions but, on seeing the results, write up the paper as if you had planned to look at the effect of age on the drug effect. This kind of thing is endemic in psychology.
• It is OK to say that this association was observed in exploratory analysis, and that it suggests a new hypothesis that needs to be tested in a new sample.
• It is NOT OK to pretend that you predicted the association if you didn’t.
• And it is REALLY REALLY NOT OK to report only the data that support your new hypothesis (e.g. dropping those aged below 36 from the analysis).
[Chart: ‘Treatment effect by age’, as before]
• 60. The problem: analytic flexibility that allows the analysis to be influenced by the results
• Analytic flexibility affects not just subgroups, but also selection of measures, type of analysis, removal of outliers, etc.
• The ‘garden of forking paths’ (Borges: "El jardín de senderos que se bifurcan")
• In many cases it is hard to apply any statistical correction, because we are unaware of all the potential analyses.
• 61. Demonstration of rapid expansion of comparisons with binary divisions
Large population database used to explore link between ADHD and handedness.
1 contrast: probability of a ‘significant’ p-value < .05 = .05
https://figshare.com/articles/The_Garden_of_Forking_Paths/2100379
• 62. Focus just on the Young subgroup: 2 contrasts at this level
Probability of a ‘significant’ p-value < .05 = .10
• 63. Focus just on Young, on the measure of hand skill: 4 contrasts at this level
Probability of a ‘significant’ p-value < .05 = .19
• 64. Focus just on Young Females, on the measure of hand skill: 8 contrasts at this level
Probability of a ‘significant’ p-value < .05 = .34
• 65. Focus just on Young, Urban Females, on the measure of hand skill: 16 contrasts at this level
Probability of a ‘significant’ p-value < .05 = .56
(These probabilities are verified in the one-liner below.)
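[The probabilities on the last few slides are just 1 − .95^k for k independent contrasts, which you can verify in one line:]
k <- c(1, 2, 4, 8, 16)      # contrasts at each level of subdivision
round(1 - 0.95^k, 2)        # 0.05 0.10 0.19 0.34 0.56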
• 66. Richard Peto: ISIS-2 study group (1988), Lancet 332, 349-410.
A classic cautionary tale about subgroup analysis: when the ISIS-2 trial patients were subdivided by astrological birth sign, aspirin appeared ineffective for those born under Gemini or Libra, despite a clear overall benefit.
• 67. Further reading
Failure to distinguish between hypothesis-testing and hypothesis-generating (exploratory) research leads to misuse of statistical tests – a point made by De Groot as early as 1956.
de Groot, A. D. (2014). The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers, et al.]. Acta Psychologica, 148, 188-194. doi:10.1016/j.actpsy.2014.02.001
  • 68. A comprehensive solution: Pre-registration
• 69. Some general points to help you learn R
1. Basic rule for life, especially in programming: if you don’t know it, Google it. In R, Google your error message.
2. The best way to learn is by making mistakes. If you see a line of code you don’t understand, play with it to find out what it does. Look at the Environment tab, or type the name of a variable at the console to check its value. Don’t be afraid to experiment. E.g., you want repeating numbers? Type in the console to compare: rep(1, 3) and … (an illustrative pairing is sketched below)
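[For example – the second call here is an illustrative guess at the kind of contrast intended, not necessarily the slide’s own example:]
rep(1, 3)          # 1 1 1        – repeat the value 1 three times
rep(c(1, 3), 3)    # 1 3 1 3 1 3  – repeat the vector c(1, 3) three times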
• 70. R scripts available on https://osf.io/view/reproducibility2017/
• Simulation_ex1_intro.R – Suitable for R newbies. Demonstrates the ‘dance of the p-values’ in a t-test. Bonus: you learn to make pirate plots.
• Simulation_ex2_correlations.R – Generate correlation matrices from a multivariate normal distribution. Bonus: you learn to use ‘grid’ to make nicely formatted tabular outputs.
• Simulation_ex3_multiwayAnova.R – Simulate data for a 3-way mixed ANOVA. Demonstrates the need to correct for the number of factors and interactions when doing exploratory multiway ANOVA.
• Simulation_ex4_multipleReg.R – Simulate data for multiple regression.
• Simulation_ex5_falsediscovery.R – Simulate data for a mixture of null and true effects, to demonstrate that the probability of the data given the hypothesis is different from the probability of the hypothesis given the data.
Two simulations from Daniel Lakens’ Coursera course – with notes!
• 1.1 WhichPvaluesCanYouExpect.R
• 3.2 OptionalStoppingSim.R
Now even more: see OSF!