7. Z scores
Standardised normal distribution: µ = 0, σ = 1
Common Z scores: 0, 1, 1.65, 1.96
Need to know the population standard deviation
Z = (x − µ) / σ for one point compared to the population
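A minimal sketch of this formula in Python; the population mean and SD below are hypothetical, not from these slides:

```python
# Hypothetical population parameters (illustrative only)
mu, sigma = 100.0, 15.0   # e.g. an IQ-like scale
x = 124.6                 # a single observation compared to the population

z = (x - mu) / sigma      # Z = (x - mu) / sigma
print(z)                  # 1.64, just under the 1.65 one-tailed 5% cutoff
```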
8. T tests
Comparing means
1 sample t
2 sample t
Paired t
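As a hedged illustration, the three variants in the list above map directly onto scipy.stats calls (the data here are simulated, not from the slides):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, size=20)   # simulated sample 1
b = rng.normal(11.0, 2.0, size=20)   # simulated sample 2 (same size, so pairing works)

t1, p1 = stats.ttest_1samp(a, popmean=10.0)  # 1 sample t: mean of a vs a fixed value
t2, p2 = stats.ttest_ind(a, b)               # 2 sample t: two independent means
t3, p3 = stats.ttest_rel(a, b)               # paired t: same subjects measured twice
```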
14. T tests in SPM: Did the observed signal change occur by chance, or is it statistically significant?
Recall the GLM: Y = Xβ + ε
β1 is an estimate of the signal change over time attributable to the condition of interest
Set up a contrast vector cᵀ = [1 0 … 0] to test β1: cᵀβ = 1·β1 + 0·β2 + … + 0·βn
Null hypothesis: cᵀβ = 0, i.e. no significant effect at each voxel for condition β1
Contrast [1 −1]: Is the difference between two conditions significantly non-zero?
t = cᵀβ / sd[cᵀβ] (one-sided)
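A minimal numpy sketch of this contrast t-statistic for a single voxel, with a simulated design matrix; SPM's actual estimation involves much more (filtering, autocorrelation modelling, etc.):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([rng.integers(0, 2, n).astype(float),  # condition regressor (beta1)
                     np.ones(n)])                          # constant term (beta2)
Y = X @ np.array([2.0, 5.0]) + rng.normal(0, 1, n)         # simulated voxel time series

beta = np.linalg.pinv(X) @ Y                  # least-squares estimate of beta
resid = Y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])     # estimated error variance

c = np.array([1.0, 0.0])                      # contrast cT = [1 0] picks out beta1
sd_cb = np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)
t = (c @ beta) / sd_cb                        # t = cT.beta / sd[cT.beta], one-sided
```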
16. ANOVA
Variances, not means
Total variance = model variance + error variance
Results in an F score, corresponding to a p value
Variance: s² = Σ (xᵢ − x̄)² / (n − 1)
F test = model variance / error variance
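A small illustration of the F test with scipy's one-way ANOVA on simulated groups (group sizes and values are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(10, 2, 15)   # three simulated groups
g2 = rng.normal(12, 2, 15)
g3 = rng.normal(11, 2, 15)

# F = model (between-group) variance / error (within-group) variance
F, p = stats.f_oneway(g1, g2, g3)
print(F, p)
```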
17. Partitioning the variance
[Figure: three pairs of Group 1 / Group 2 plots illustrating the decomposition]
Total = Model (between groups) + Error (within groups)
18. T vs F tests
F tests: any differences between multiple groups, interactions
Have to determine where the differences are post hoc
SPM t-test: one-tailed (con)
SPM F-test: two-tailed (ess)
20. Conclusions
T tests describe how unlikely it is that experimental differences are due to chance
The higher the t score, the smaller the p value, and the less likely the differences are due to chance
Can compare a sample with a population, or two samples, paired or unpaired
ANOVA/F tests are similar but use variances instead of means, and can be applied to more than two groups and other more complex scenarios
21. Acknowledgements
MfD slides 2004-2006
Van Belle, Biostatistics
Human Brain Function
Wikipedia
23. Topics Covered:
Is there a relationship between x and y?
What is the strength of this relationship?
Pearson’s r
Can we describe this relationship and use it to predict
y from x?
Regression
Is the relationship we have described statistically
significant?
F- and t-tests
Relevance to SPM
GLM
24. Relationship between x and y
Correlation describes the strength and
direction of a linear relationship between two
variables
Regression tells you how well a certain
independent variable predicts a dependent
variable
CORRELATION ≠ CAUSATION
In order to infer causality: manipulate independent
variable and observe effect on dependent variable
25. Scattergrams
[Figure: three scatterplots of Y against X showing positive correlation, negative correlation, and no correlation]
26. Variance vs. Covariance
Do two variables change together?
Variance (spread around the mean, ~ average of dx · dx):
sx² = Σ (xᵢ − x̄)² / n
Covariance (how much x and y change together, ~ average of dx · dy):
cov(x, y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / n
27. Covariance
cov(x, y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / n
When x and y increase together: cov(x, y) = positive
When one increases as the other decreases: cov(x, y) = negative
When there is no constant relationship: cov(x, y) = 0
28. Example Covariance
[Figure: scatterplot of the five (x, y) points below]

 x   y   xᵢ − x̄   yᵢ − ȳ   (xᵢ − x̄)(yᵢ − ȳ)
 0   3    −3        0           0
 2   2    −1       −1           1
 3   4     0        1           0
 4   0     1       −3          −3
 6   6     3        3           9
x̄ = 3  ȳ = 3                  Σ = 7

cov(x, y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / n = 7 / 5 = 1.4
What does this number tell us?
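The worked example can be reproduced in a couple of lines; note the slides divide by n (here 5), which is what np.mean does below:

```python
import numpy as np

x = np.array([0, 2, 3, 4, 6], dtype=float)
y = np.array([3, 2, 4, 0, 6], dtype=float)

# cov(x, y) = sum((xi - xbar) * (yi - ybar)) / n
cov = np.mean((x - x.mean()) * (y - y.mean()))
print(cov)  # 1.4, as in the table above
```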
29. Example of how the covariance value relies on variance

High variance data:
Subject    x     y    x error × y error
   1      101   100        2500
   2       81    80         900
   3       61    60         100
   4       51    50           0
   5       41    40         100
   6       21    20         900
   7        1     0        2500
 Mean      51    50
Sum of x error × y error: 7000, covariance: 1166.67

Low variance data:
Subject    x     y    x error × y error
   1       54    53           9
   2       53    52           4
   3       52    51           1
   4       51    50           0
   5       50    49           1
   6       49    48           4
   7       48    47           9
 Mean      51    50
Sum of x error × y error: 28, covariance: 4.67
30. Pearson’s R
−∞ ≤ cov(x, y) ≤ ∞
The raw size of the covariance does not really tell us anything
Solution: standardise this measure
Pearson's R standardises the covariance by dividing by the standard deviations:
rxy = cov(x, y) / (sx sy)
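Standardising the covariance from the earlier worked example gives Pearson's r; a sketch (using population SDs, ddof = 0, to match the /n covariance):

```python
import numpy as np
from scipy import stats

x = np.array([0, 2, 3, 4, 6], dtype=float)
y = np.array([3, 2, 4, 0, 6], dtype=float)

cov = np.mean((x - x.mean()) * (y - y.mean()))  # 1.4, from the earlier example
r = cov / (x.std() * y.std())                   # rxy = cov(x, y) / (sx * sy)

print(r)                        # 0.35
print(stats.pearsonr(x, y)[0])  # scipy agrees
```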
31. Basic assumptions
Normal distributions
Variances are constant and not zero
Independent sampling – no autocorrelations
No errors in the values of the independent
variable
All causation in the model is one-way (not
necessary mathematically, but essential for
prediction)
32. Pearson’s R: degree of linear dependence
cov(x, y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / n
rxy = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n sx sy)
−1 ≤ r ≤ 1
Equivalently, in terms of Z scores: rxy = Σ Zxᵢ Zyᵢ / n
33. Limitations of r
r is actually r̂:
r = true r of the whole population
r̂ = estimate of r based on sample data
r is very sensitive to extreme values:
[Figure: scatterplot showing how a single extreme value can distort r]
34. In the real world…
r is never 1 or −1
Interpretations for correlations in psychological research (Cohen):

Correlation   Negative           Positive
Small         −0.29 to −0.10     0.10 to 0.29
Medium        −0.49 to −0.30     0.30 to 0.49
Large         −1.00 to −0.50     0.50 to 1.00
35. Regression
Correlation tells you if there is an
association between x and y but it
doesn’t describe the relationship or
allow you to predict one variable from
the other.
To do this we need REGRESSION!
36. Best-fit Line
Aim of linear regression is to fit a straight line, ŷ = ax + b (a = slope, b = intercept), to data that gives the best prediction of y for any value of x
This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals ε
[Figure: scatterplot with fitted line; ŷ = predicted value, yᵢ = true value, ε = residual error]
37. Least Squares Regression
To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line)
Model line: ŷ = ax + b (a = slope, b = intercept)
Residual: ε = y − ŷ
Sum of squares of residuals = Σ (y − ŷ)²
We must find the values of a and b that minimise Σ (y − ŷ)²
38. Finding b
First we find the value of b that gives the minimum sum of squares
[Figure: lines with different intercepts b and their residuals ε]
Trying different values of b is equivalent to shifting the line up and down the scatter plot
39. Finding a
Now we find the value of a that gives the minimum sum of squares
[Figure: lines with different slopes a, each with the same intercept b]
Trying out different values of a is equivalent to changing the slope of the line, while b stays constant
40. Minimising sums of squares
Need to minimise Σ (y − ŷ)², where ŷ = ax + b, so we need to minimise the sum of squares:
S = Σ (y − ax − b)²
If we plot the sum of squares for all different values of a and b we get a parabola, because it is a squared term
[Figure: parabola of S against values of a and b; minimum S where the gradient = 0]
So the minimum sum of squares is at the bottom of the curve, where the gradient is zero
41. The maths bit
So we can find the a and b that give the minimum sum of squares by taking the partial derivatives of Σ (y − ax − b)² with respect to a and b separately
Then we set these derivatives to zero and solve them, which gives the values of a and b that minimise the sum of squares
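A sketch of that step written out; this is standard calculus rather than anything specific to these slides, and it leads directly to the solution on the next slide:

```latex
S(a, b) = \sum_i (y_i - a x_i - b)^2

\frac{\partial S}{\partial b} = -2 \sum_i (y_i - a x_i - b) = 0
  \quad\Rightarrow\quad b = \bar{y} - a\bar{x}

\frac{\partial S}{\partial a} = -2 \sum_i x_i (y_i - a x_i - b) = 0
  \quad\Rightarrow\quad
  a = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
    = \frac{\mathrm{cov}(x, y)}{s_x^2} = r \, \frac{s_y}{s_x}
```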
42. The solution
Doing this gives the following equation for a:
a = r sy / sx
(r = correlation coefficient of x and y, sy = standard deviation of y, sx = standard deviation of x)
You can see that:
A low correlation coefficient gives a flatter slope (small value of a)
A large spread of y, i.e. a high standard deviation, results in a steeper slope (high value of a)
A large spread of x, i.e. a high standard deviation, results in a flatter slope (small value of a)
43. The solution cont.
Our model equation is ŷ = ax + b
This line must pass through the mean, so:
ȳ = a x̄ + b, i.e. b = ȳ − a x̄
Substituting our equation for a gives:
b = ȳ − (r sy / sx) x̄
(r = correlation coefficient of x and y, sy = standard deviation of y, sx = standard deviation of x)
The smaller the correlation, the closer the intercept is to the mean of y
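A quick numerical check of these two formulas against numpy's own least-squares fit, reusing the earlier example data:

```python
import numpy as np

x = np.array([0, 2, 3, 4, 6], dtype=float)
y = np.array([3, 2, 4, 0, 6], dtype=float)

r = np.corrcoef(x, y)[0, 1]
a = r * y.std() / x.std()     # a = r * sy / sx (the ddof choice cancels in the ratio)
b = y.mean() - a * x.mean()   # b = ybar - a * xbar: the line passes through the mean

print(a, b)                   # 0.35, 1.95
print(np.polyfit(x, y, 1))    # numpy's least-squares fit gives the same line
```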
44. Back to the model
We can calculate the regression line for any
data, but the important question is:
How well does this line fit the data, or how
good is it at predicting y from x?
45. How good is our model?
Total variance of y: sy² = Σ (y − ȳ)² / (n − 1) = SSy / dfy
Variance of the predicted y values (ŷ): sŷ² = Σ (ŷ − ȳ)² / (n − 1) = SSpred / dfŷ
This is the variance explained by our regression model
Error variance: serror² = Σ (y − ŷ)² / (n − 2) = SSer / dfer
This is the variance of the error between our predicted y values and the actual y values, and thus the variance in y that is NOT explained by the regression model
46. How good is our model cont.
Total variance = predicted variance + error variance
sy² = sŷ² + ser²
Conveniently, via some complicated rearranging:
sŷ² = r² sy², so r² = sŷ² / sy²
So r² is the proportion of the variance in y that is explained by our regression model
47. How good is our model cont.
Insert r² sy² into sy² = sŷ² + ser² and rearrange to get:
ser² = sy² − r² sy² = sy² (1 − r²)
From this we can see that the greater the correlation, the smaller the error variance, so the better our prediction
48. Is the model significant?
i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?
F-statistic (after some complicated rearranging):
F(dfŷ, dfer) = sŷ² / ser² = … = r² (n − 2) / (1 − r²)
And it follows (because F = t²) that:
t(n−2) = r √(n − 2) / √(1 − r²)
So all we need to know are r and n!
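A sketch confirming that r and n are enough, again on the example data; the resulting p value matches scipy's own test of the correlation:

```python
import numpy as np
from scipy import stats

x = np.array([0, 2, 3, 4, 6], dtype=float)
y = np.array([3, 2, 4, 0, 6], dtype=float)
n = len(x)

r = np.corrcoef(x, y)[0, 1]
F = r**2 * (n - 2) / (1 - r**2)            # F statistic for the regression
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
assert np.isclose(t**2, F)                 # F = t^2

p = 2 * stats.t.sf(abs(t), df=n - 2)       # two-tailed p value
print(p)
print(stats.pearsonr(x, y)[1])             # same p value from scipy
```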
49. General Linear Model
Linear regression is actually a form of the General Linear Model, where the parameters are a, the slope of the line, and b, the intercept:
y = ax + b + ε
A General Linear Model is just any model that describes the data in terms of a straight line
50. Multiple regression
Multiple regression is used to determine the effect of a
number of independent variables, x1, x2, x3 etc., on a
single dependent variable, y
The different x variables are combined in a linear way
and each has its own regression coefficient:
y = a1x1+ a2x2 +…..+ anxn + b + ε
The a parameters reflect the independent contribution of
each independent variable, x, to the value of the
dependent variable, y.
i.e. the amount of variance in y that is accounted for by
each x variable after all the other x variables have been
accounted for
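A minimal multiple-regression sketch with two simulated predictors, solved by ordinary least squares in numpy (variable names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + 0.5 + rng.normal(0, 0.1, n)  # y = a1*x1 + a2*x2 + b + error

X = np.column_stack([x1, x2, np.ones(n)])     # design matrix with a constant column
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimates [a1, a2, b]
print(coef)                                   # ~ [2.0, -1.0, 0.5]
```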
51. SPM
Linear regression is a GLM that models the effect of one
independent variable, x, on ONE dependent variable, y
Multiple Regression models the effect of several independent
variables, x1, x2 etc, on ONE dependent variable, y
Both are types of General Linear Model
GLM can also allow you to analyse the effects of several
independent x variables on several dependent variables, y1, y2, y3
etc, in a linear combination
This is what SPM does and will be explained soon…
Editor's notes
We often want to know whether various variables are ‘linked’, i.e., correlated. This can be interesting in itself, but is also important if we want to predict one variable’s value given a value of the other.
Correlation cannot be validly used to infer a causal relationship between the variables, but this should not be taken to mean that correlations cannot indicate causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown. A correlation between age and height in children is fairly causally transparent, but a correlation between mood and health in people is less so. Does improved mood lead to improved health? Or does good health lead to good mood? Or does some other factor underlie both? Or is it pure coincidence? In other words, a correlation can be taken as evidence for a possible causal relationship, but cannot indicate what the causal relationship, if any, might be.
Variance is just a definition. We square the deviations so that each term is positive whether dx is negative or positive, so that when we sum them the positives and negatives do not cancel out. Variance is spread around a mean; covariance is the measure of how much x and y change together. The two are very similar: multiply two variables rather than square one.
Problem with Covariance: The value obtained by covariance is dependent on the size of the data’s standard deviations: if large, the value will be greater than if small… even if the relationship between x and y is exactly the same in the large versus small standard deviation datasets.
Can only compare covariances between different variables to see which is greater.
The distance of r from 0 indicates the strength of the correlation. r = 1 or r = −1 means that we can predict y from x and vice versa with certainty; all data points are on a straight line, i.e., y = ax + b. The correlation is 1 in the case of an increasing linear relationship, −1 in the case of a decreasing linear relationship, and some value in between in all other cases, indicating the degree of linear dependence between the variables. The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables. Note that the correlation coefficient detects only linear dependencies between two variables.
For the example in the graph: x = 1, 2, 3, 4, 5 and y = 1, 2, 3, 4, 0, so x̄ = 3 and ȳ = 2. A single extreme value (the final y being 0 where the trend suggests 5) substantially changes r.
the interpretation of a correlation coefficient depends on the context and purposes. A correlation of 0.9 may be very low if one is verifying a physical law using high-quality instruments, but may be regarded as very high in the social sciences where there may be a greater contribution from complicating factors.
So to understand the relationship between two variables, we want to draw the 'best' line through the cloud, i.e. find the best fit. This is done using the principle of least squares.