CORRELATION AND REGRESSION
11/14/2022 1
NITT / CA
Topics Covered:
• Is there a relationship between x and y?
• What is the strength of this relationship?
– Pearson’s correlation coefficient r.
– Rank correlation coefficient
• Spearman's rank
• Kendall tau rank
• Goodman and Kruskal's gamma
• Can we describe this relationship and use this to predict y from x?
– Regression
• Is the relationship we have described statistically significant?
– t test
The relationship between x and y
• Correlation: Is there a relationship between 2
variables?
• Regression: How well does a certain independent
variable predict the dependent variable?
• CORRELATION ≠ CAUSATION
– In order to infer causality: manipulate the independent
variable and observe the effect on the dependent variable
Analyse
Scattergrams
[Figure: three scatter plots of Y against X, labelled Positive correlation, Negative correlation and No correlation]
Variance vs Covariance
• Note on your sample:
• If you wish to assume that your sample is
representative of the general population (RANDOM
EFFECTS MODEL), use the degrees of freedom (n – 1)
in your calculations of variance or covariance.
• But if you simply want to assess your current
sample (FIXED EFFECTS MODEL), substitute n for
the degrees of freedom.
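As a quick illustration of the two conventions, Python's standard `statistics` module offers both: `variance` divides by n – 1 and `pvariance` divides by n. The five data values are made up for this sketch.

```python
import statistics

# Hypothetical sample (illustrative values only)
data = [2.0, 4.0, 4.0, 5.0, 7.0]
n = len(data)

# Divide by n - 1: treat the data as a sample from a wider population
sample_var = statistics.variance(data)

# Divide by n: describe only the data at hand
population_var = statistics.pvariance(data)

print(sample_var, population_var)  # 3.3 2.64
# The two estimates differ only by the factor (n - 1) / n
print(abs(population_var - sample_var * (n - 1) / n) < 1e-12)  # True
```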
Variance vs Covariance
• Do two variables change together?
Covariance:
$$\operatorname{cov}(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$
• Gives information on the degree to which
two variables vary together.
• Note how similar the covariance is to
variance: the equation simply multiplies x’s
error scores by y’s error scores as opposed
to squaring x’s error scores.
Variance:
$$S_x^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$$
• Gives information on the variability of a
single variable.
Covariance
• When X ↑ and Y ↑: cov(x,y) is positive.
• When X ↑ and Y ↓: cov(x,y) is negative.
• When there is no consistent relationship: cov(x,y) ≈ 0.
$$\operatorname{cov}(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$
Example Covariance
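| x | y | x − x̄ | y − ȳ | (x − x̄)(y − ȳ) |
|---|---|---|---|---|
| 0 | 3 | -3 | 0 | 0 |
| 2 | 2 | -1 | -1 | 1 |
| 3 | 4 | 0 | 1 | 0 |
| 4 | 0 | 1 | -3 | -3 |
| 6 | 6 | 3 | 3 | 9 |

x̄ = 3, ȳ = 3, Σ(x − x̄)(y − ȳ) = 7
$$\operatorname{cov}(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1} = \frac{7}{4} = 1.75$$
What does this number tell us?
[Figure: scatter plot of the five (x, y) points; both axes run from 0 to 7]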
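The same calculation can be sketched in a few lines of Python (the function name `covariance` is just an illustrative choice). It reproduces the 7 / 4 = 1.75 above; the sign tells us the variables tend to rise together, but the size alone is hard to interpret.

```python
def covariance(x, y):
    """Sample covariance: sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least two points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
print(covariance(x, y))  # 1.75, i.e. 7 / 4
```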
Problem with Covariance:
• The value of the covariance depends on the size of
the data’s standard deviations: if they are large, the value will be
greater than if they are small, even if the relationship between x and y
is exactly the same in the large and small standard
deviation datasets.
Example of how covariance value
relies on variance
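Here "x error * y error" means (x − x̄)(y − ȳ).

| Subject | x (high variance) | y (high variance) | x error * y error | x (low variance) | y (low variance) | x error * y error |
|---|---|---|---|---|---|---|
| 1 | 101 | 100 | 2500 | 54 | 53 | 9 |
| 2 | 81 | 80 | 900 | 53 | 52 | 4 |
| 3 | 61 | 60 | 100 | 52 | 51 | 1 |
| 4 | 51 | 50 | 0 | 51 | 50 | 0 |
| 5 | 41 | 40 | 100 | 50 | 49 | 1 |
| 6 | 21 | 20 | 900 | 49 | 48 | 4 |
| 7 | 1 | 0 | 2500 | 48 | 47 | 9 |
| Mean | 51 | 50 | | 51 | 50 | |

High variance data: sum of x error * y error = 7000; covariance = 7000 / 6 = 1166.67
Low variance data: sum of x error * y error = 28; covariance = 28 / 6 = 4.67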
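A short sketch reproducing both covariances from the table. It also divides each covariance by the product of the standard deviations (Pearson's r), which gives 1 for both datasets, since in each case y = x − 1 exactly. Function names are illustrative.

```python
import math


def covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)


def std_dev(v):
    return math.sqrt(covariance(v, v))  # cov(v, v) is the sample variance


def pearson_r(x, y):
    return covariance(x, y) / (std_dev(x) * std_dev(y))


high_x = [101, 81, 61, 51, 41, 21, 1]
high_y = [100, 80, 60, 50, 40, 20, 0]
low_x = [54, 53, 52, 51, 50, 49, 48]
low_y = [53, 52, 51, 50, 49, 48, 47]

print(round(covariance(high_x, high_y), 2))  # 1166.67
print(round(covariance(low_x, low_y), 2))    # 4.67
# Same underlying relationship, so the standardised value agrees
print(round(pearson_r(high_x, high_y), 6), round(pearson_r(low_x, low_y), 6))  # 1.0 1.0
```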
Solution: Pearson’s r
• Covariance on its own does not really tell us much, because its size depends on the spread of the data
Solution: standardise this measure
• Pearson’s R: standardises the covariance value.
• Divides the covariance by the product of the standard deviations of X
and Y:
$$r_{xy} = \frac{\operatorname{cov}(x,y)}{s_x s_y}$$
Pearson’s R continued
$$\operatorname{cov}(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x s_y}$$
$$r_{xy} = \frac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n-1}$$
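The three expressions above are algebraically the same. A minimal check on the covariance example data (x̄ = ȳ = 3, cov = 1.75, s_x = s_y = √5), where all three give r = 1.75 / 5 = 0.35:

```python
import statistics

x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample SDs (n - 1)

# Form 1: r = cov(x, y) / (s_x * s_y)
cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
r1 = cov_xy / (s_x * s_y)

# Form 2: r = sum((x_i - x_bar)(y_i - y_bar)) / ((n - 1) * s_x * s_y)
r2 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / ((n - 1) * s_x * s_y)

# Form 3: r = sum(Z_x * Z_y) / (n - 1), using z-scores
z_x = [(a - x_bar) / s_x for a in x]
z_y = [(b - y_bar) / s_y for b in y]
r3 = sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

print(round(r1, 6), round(r2, 6), round(r3, 6))  # 0.35 0.35 0.35
```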
Limitations of r
• When r = 1 or r = -1:
– We can predict y from x with certainty
– All data points are on a straight line: y = ax + b
• r is actually r̂:
– r = true r of the whole population
– r̂ = estimate of r based on the sample data
• r is very sensitive to extreme values:
[Figure: example scatter plot, x from 0 to 6 and y from 0 to 5]
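To see this sensitivity numerically, here is a sketch with made-up data: five points lying exactly on y = x give r = 1, and adding one extreme point (6, 0) drops r to 2.5 / 17.5 ≈ 0.143.

```python
import math


def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data lying exactly on y = x
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
r_clean = pearson_r(x, y)

# The same data plus a single extreme point
r_outlier = pearson_r(x + [6], y + [0])
print(round(r_clean, 3), round(r_outlier, 3))  # 1.0 0.143
```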
The Pearson's correlation coefficient:
• Varies between -1 and +1
• r = 1 means the data is perfectly linear with a
positive slope (i.e., both variables tend to change in the same direction)
• r = -1 means the data is perfectly linear with a
negative slope (i.e., the variables tend to change in opposite directions)
• r = 0 means there is no linear association
• 0 < |r| < 0.4 means there is a weak association
• 0.4 ≤ |r| < 0.8 means there is a moderate association
• |r| ≥ 0.8 means there is a strong association
[Figure: Pearson’s correlation coefficient of x and y for various data sets]
[Figure: correlation categories]
Pearson Correlation Coefficient Formula r:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
r = Pearson coefficient
x = first variable
y = second variable
n = number of pairs of data values
∑xy = sum of products of the paired data values
∑x = sum of the x scores
∑y = sum of the y scores
∑x² = sum of the squared x scores
∑y² = sum of the squared y scores
Example:
• Calculate the value of Pearson’s correlation
coefficient r from the following ages and weights
of six people.

| S. No | Age (x) | Weight (y) |
|---|---|---|
| 1 | 40 | 78 |
| 2 | 21 | 70 |
| 3 | 25 | 60 |
| 4 | 31 | 55 |
| 5 | 38 | 80 |
| 6 | 47 | 66 |

Calculate the following values: xy, x², and y².
Calculation of the Pearson’s r
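The computational formula can be sketched directly from the sums it uses. Applied to the age/weight data, it gives n∑xy − ∑x∑y = 6(13937) − (202)(409) = 1004 and r ≈ 0.347, a weak positive association by the categories above. The function name `pearson_r_sums` is illustrative.

```python
import math


def pearson_r_sums(x, y):
    """Pearson's r from n, sum x, sum y, sum xy, sum x^2 and sum y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


age = [40, 21, 25, 31, 38, 47]
weight = [78, 70, 60, 55, 80, 66]
r = pearson_r_sums(age, weight)
print(round(r, 4))  # 0.3471
```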
Example:
• There are 2 stocks, A and B. Their share prices
on particular days are as follows:

| Stock A (x) | Stock B (y) |
|---|---|
| 45 | 9 |
| 50 | 8 |
| 53 | 8 |
| 58 | 7 |
| 60 | 5 |

Calculate the following values: xy, x², and y².
$$r = \frac{5(1935) - (266)(37)}{\sqrt{\left[5(14298) - 266^2\right]\left[5(283) - 37^2\right]}} = -0.9088$$
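A sketch that reproduces the intermediate sums used above (266, 37, 1935, 14298, 283) before combining them into r:

```python
import math

stock_a = [45, 50, 53, 58, 60]  # x
stock_b = [9, 8, 8, 7, 5]       # y
n = len(stock_a)

sum_x = sum(stock_a)                                   # 266
sum_y = sum(stock_b)                                   # 37
sum_xy = sum(x * y for x, y in zip(stock_a, stock_b))  # 1935
sum_x2 = sum(x * x for x in stock_a)                   # 14298
sum_y2 = sum(y * y for y in stock_b)                   # 283

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 4))  # -0.9088
```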
Example
Advantages
• It indicates the strength of the
relationship between the two variables.
• It also quantifies the extent to which
those variables are (linearly) correlated.
• Using this method, one can ascertain the
direction of correlation,
– negative or positive.
• It is independent of the unit of measurement
of the variables (e.g., cm and inch).
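The unit-independence property can be checked quickly: converting heights from cm to inches rescales x, which shifts x̄ and s_x by the same factor, so r is unchanged. The height and weight values are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical measurements
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [52.0, 60.0, 66.0, 75.0, 90.0]
heights_in = [h / 2.54 for h in heights_cm]  # same heights in inches

r_cm = pearson_r(heights_cm, weights_kg)
r_in = pearson_r(heights_in, weights_kg)
print(math.isclose(r_cm, r_in))  # True
```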
Disadvantages
• The Pearson correlation coefficient r cannot
tell the difference between the dependent and
independent variables (r is symmetric in the two variables).
• We cannot get information about the slope of the line,
as r only measures the strength and direction of a linear
relationship between the two variables.
• The Pearson correlation coefficient can easily be
misinterpreted, especially in the case of
homogeneous data.
• Compared with the other calculation methods, this
method takes more time to arrive at the results.
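The first two disadvantages can be illustrated with a sketch on made-up data: a steep line (y = 2x) and a shallow one (y = 0.5x) both give r = 1, so r says nothing about the slope. Swapping x and y leaves r unchanged, so r cannot tell which variable is dependent.

```python
import math


def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
steep = [2.0 * v for v in x]    # y = 2x
shallow = [0.5 * v for v in x]  # y = 0.5x

# Both lines are perfectly linear, so r is 1 for each, whatever the slope
print(round(pearson_r(x, steep), 9), round(pearson_r(x, shallow), 9))  # 1.0 1.0
# r(x, y) == r(y, x): swapping the variables does not change r
print(math.isclose(pearson_r(x, steep), pearson_r(steep, x)))  # True
```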
Spearman correlation coefficient: ρ (rho)
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$
Here,
n = number of data points of the two variables
dᵢ = difference in ranks of the “ith” element
The Spearman coefficient, ρ, can take a value between -1 and +1, where:
A ρ value of +1 means a perfect positive association of ranks
A ρ value of 0 means no association of ranks
A ρ value of -1 means a perfect negative association of ranks.
Example: Calculate the Spearman correlation coefficient for the following data:

| Maths | Science |
|---|---|
| 35 | 30 |
| 23 | 33 |
| 47 | 45 |
| 17 | 23 |
| 10 | 8 |
| 43 | 49 |
| 9 | 12 |
| 6 | 4 |
| 28 | 31 |
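Calculation

| Maths | Science | Rank_M | Rank_S | d = diff | d² |
|---|---|---|---|---|---|
| 35 | 30 | 3 | 5 | 2 | 4 |
| 23 | 33 | 5 | 3 | 2 | 4 |
| 47 | 45 | 1 | 2 | 1 | 1 |
| 17 | 23 | 6 | 6 | 0 | 0 |
| 10 | 8 | 7 | 8 | 1 | 1 |
| 43 | 49 | 2 | 1 | 1 | 1 |
| 9 | 12 | 8 | 7 | 1 | 1 |
| 6 | 4 | 9 | 9 | 0 | 0 |
| 28 | 31 | 4 | 4 | 0 | 0 |
| | | | | Σd² | 12 |

Here n = 9 and the sum of squared rank differences Σd² = 12:
$$\rho = 1 - \frac{6 \times 12}{9(81 - 1)} = 1 - \frac{72}{720} = 1 - 0.1 = 0.9$$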
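A minimal sketch of the same steps: rank each variable from largest (rank 1) to smallest, sum the squared rank differences, and apply the formula. The simple `ranks_descending` helper is illustrative and assumes no tied values, as in this example; tied data would need average ranks.

```python
def ranks_descending(values):
    """Rank 1 = largest value. Assumes there are no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    n = len(x)
    rx, ry = ranks_descending(x), ranks_descending(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


maths = [35, 23, 47, 17, 10, 43, 9, 6, 28]
science = [30, 33, 45, 23, 8, 49, 12, 4, 31]
print(ranks_descending(maths))               # [3, 5, 1, 6, 7, 2, 8, 9, 4]
print(round(spearman_rho(maths, science), 4))  # 0.9
```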
Example: Calculate Spearman's correlation coefficient.
n = 10
Example: Calculate Spearman's rank coefficient
Example: Calculate Spearman's rank coefficient
Suppose we have the ranks of 8 students in Statistics and Mathematics. On the basis of these ranks, we
would like to know to what extent the students' knowledge of Statistics and
Mathematics is related.
Rank in Stat: 1 2 3 4 5 6 7 8
Rank in Math: 2 4 1 5 3 8 7 6

| Rank in Stat | Rank in Math | Difference of ranks = d | d² |
|---|---|---|---|
| 1 | 2 | -1 | 1 |
| 2 | 4 | -2 | 4 |
| 3 | 1 | 2 | 4 |
| 4 | 5 | -1 | 1 |
| 5 | 3 | 2 | 4 |
| 6 | 8 | -2 | 4 |
| 7 | 7 | 0 | 0 |
| 8 | 6 | 2 | 4 |
| | | Σd² | 22 |

Here,
n = number of paired observations = 8
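Finishing this example as a sketch: with Σd² = 22 and n = 8, the formula gives ρ = 1 − 132/504 ≈ 0.738. When there are no tied ranks, the same value comes out of Pearson's r computed on the ranks themselves, which is a useful cross-check.

```python
import math


def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


rank_stat = [1, 2, 3, 4, 5, 6, 7, 8]
rank_math = [2, 4, 1, 5, 3, 8, 7, 6]
n = len(rank_stat)

# Spearman's formula (valid when there are no tied ranks)
d2 = sum((s - m) ** 2 for s, m in zip(rank_stat, rank_math))
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))

print(d2, round(rho, 4), round(pearson_r(rank_stat, rank_math), 4))  # 22 0.7381 0.7381
```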
REGRESSION
Regression
• Correlation tells you if there is an association
between x and y but it doesn’t allow you to
predict one variable from the other.
• To do this we need REGRESSION!
Best-fit Line
• The aim of linear regression is to fit a straight line, ŷ = ax + b, to the data that
gives the best prediction of y for any value of x
• This will be the line that
minimises the distance between the
data and the fitted line, i.e.
the residuals
[Figure: fitted line ŷ = ax + b with its slope and intercept marked; for a data point, yᵢ = true value, ŷ = predicted value, ε = residual error]
Least Squares Regression
• To find the best line we must minimise the sum of
the squares of the residuals (the vertical distances
from the data points to our line)
Residual (ε) = y − ŷ
Sum of squares of residuals = Σ(y − ŷ)²
Model line: ŷ = ax + b
• We must find values of a and b that minimise
Σ(y − ŷ)²
a = slope, b = intercept
Finding b
• First we find the value of b that gives the minimum
sum of squares
[Figure: the line drawn at several different intercepts b, with the residuals ε marked]
• Trying different values of b is equivalent to
shifting the line up and down the scatter plot
Finding a
• Now we find the value of a that gives the minimum
sum of squares
[Figure: the line drawn with several different slopes, with the intercept b fixed]
• Trying out different values of a is equivalent to
changing the slope of the line, while b stays
constant
Minimising sums of squares
• Need to minimise Σ(y − ŷ)²
• But ŷ = ax + b
• So we need to minimise:
Σ(y − ax − b)²
• If we plot the sum of squares
for all different values of a and b,
we get a parabola, because it is a
squared term
• So the minimum sum of squares is at
the bottom of the curve, where
the gradient is zero.
[Figure: sums of squares (S) plotted against values of a and b; min S is where the gradient = 0]
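The "try different values of a and b" idea can be sketched as a brute-force grid search. The data are hypothetical, chosen so that the least squares line is ŷ = 0.6x + 2.2; the grid search lands on that line. This is only for illustration: the calculus in the following slides finds the minimum exactly, without a grid.

```python
def sse(a, b, x, y):
    """Sum of squared residuals for the candidate line y_hat = a*x + b."""
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Try slopes 0.0..1.5 and intercepts 1.0..3.5 in steps of 0.1
candidates = [(i / 10, j / 10) for i in range(0, 16) for j in range(10, 36)]
best_a, best_b = min(candidates, key=lambda ab: sse(ab[0], ab[1], x, y))

print(best_a, best_b, round(sse(best_a, best_b, x, y), 3))  # 0.6 2.2 2.4
```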
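Recall that the expression for the correlation coefficient is
$$r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}} = \frac{n\sum xy - \sum x\sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2-\left(\sum y\right)^2\right]}}$$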
Regression Line
• A simple linear relationship between two
variables:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
• An estimated regression line based on sample
data is written as
$$\hat{y} = b_0 + b_1 x$$
• The least squares method gives us the “best”
estimated line: it chooses the values of b0
and b1 that minimise the sum of squared errors
$$SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2$$
$$b_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n}x_i y_i - \frac{\left(\sum_{i=1}^{n}x_i\right)\left(\sum_{i=1}^{n}y_i\right)}{n}}{\sum_{i=1}^{n}x_i^2 - \frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}} \quad\text{or}\quad b_1 = r\frac{S_y}{S_x}$$
$$b_0 = \bar{y} - b_1\bar{x}$$
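A sketch of the two expressions for b₁ (deviation form and sums form) together with b₀ = ȳ − b₁x̄, on small hypothetical data. Both forms give b₁ = 0.6 and b₀ = 2.2.

```python
def fit_deviation_form(x, y):
    """b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2); b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def slope_sums_form(x, y):
    """b1 = (sum xy - sum x * sum y / n) / (sum x^2 - (sum x)^2 / n)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    return (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_deviation_form(x, y)
print(round(b0, 4), round(b1, 4), round(slope_sums_form(x, y), 4))  # 2.2 0.6 0.6
```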
Example
• The weekly advertising expenditure
(x) and weekly sales (y) are presented
in the following table
• Use the fitted regression line to estimate
the mean value of y for x = 50.

| S. no. | x | y | xy | x² |
|---|---|---|---|---|
| 1 | 41 | 1250 | 51250 | 1681 |
| 2 | 54 | 1380 | 74520 | 2916 |
| 3 | 63 | 1425 | 89775 | 3969 |
| 4 | 54 | 1425 | 76950 | 2916 |
| 5 | 48 | 1450 | 69600 | 2304 |
| 6 | 46 | 1300 | 59800 | 2116 |
| 7 | 62 | 1400 | 86800 | 3844 |
| 8 | 61 | 1510 | 92110 | 3721 |
| 9 | 64 | 1575 | 100800 | 4096 |
| 10 | 71 | 1650 | 117150 | 5041 |
| Total | 564 | 14365 | 818755 | 32604 |
• From the previous table we have:
n = 10, Σx = 564, Σx² = 32604, Σy = 14365, Σxy = 818755
• The least squares estimates of the regression coefficients are:
$$b_1 = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(818755) - (564)(14365)}{10(32604) - (564)^2} = \frac{85690}{7944} \approx 10.8$$
$$b_0 = \bar{y} - b_1\bar{x} = 1436.5 - 10.787(56.4) \approx 828$$
• The estimated regression function is:
$$\hat{y} = 828 + 10.8x, \quad\text{i.e.}\quad \text{Sales} = 828 + 10.8\,\text{Expenditure}$$
• If the advertising expenditure is Rs 50, then the estimated Sales is:
$$\text{Sales} = 828 + 10.8(50) = 1368$$
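A sketch of the same fit from the column sums. Keeping full precision gives b₁ ≈ 10.787, b₀ ≈ 828.13 and a prediction of about 1367.5 at x = 50. The slide's 1368 comes from rounding the coefficients to 828 and 10.8 first.

```python
x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]                          # weekly advertising expenditure
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]  # weekly sales
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
print(sum_x, sum_y, sum_xy, sum_x2)  # 564 14365 818755 32604

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 85690 / 7944
b0 = sum_y / n - b1 * sum_x / n                                 # y_bar - b1 * x_bar
y_hat_50 = b0 + b1 * 50
print(round(b1, 3), round(b0, 2), round(y_hat_50, 1))  # 10.787 828.13 1367.5
```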
Residual
• The residual is the difference between the observed value yᵢ and
the corresponding fitted value ŷᵢ:
$$e_i = y_i - \hat{y}_i$$
• Residuals are highly useful for studying whether a
given regression model is appropriate for the data
at hand.
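| y | x | ŷ (y-hat) | Residual (e) |
|---|---|---|---|
| 1250 | 41 | 1270.8 | -20.8 |
| 1380 | 54 | 1411.2 | -31.2 |
| 1425 | 63 | 1508.4 | -83.4 |
| 1425 | 54 | 1411.2 | 13.8 |
| 1450 | 48 | 1346.4 | 103.6 |
| 1300 | 46 | 1324.8 | -24.8 |
| 1400 | 62 | 1497.6 | -97.6 |
| 1510 | 61 | 1486.8 | 23.2 |
| 1575 | 64 | 1519.2 | 55.8 |
| 1650 | 71 | 1594.8 | 55.2 |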
Regression Standard Error
• For simple linear regression, the estimate of σ²
is essentially the average squared residual (the sum of squared residuals divided by n − 2):
$$s_{y.x}^2 = \frac{1}{n-2}\sum_{i=1}^{n} e_i^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
• To estimate σ, use
$$s_{y.x} = \sqrt{s_{y.x}^2}$$
• This estimates the standard deviation σ of the
error term ε in the statistical model for simple
linear regression.
Regression Standard Error
| y | x | ŷ (y-hat) | Residual (e) | e² |
|---|---|---|---|---|
| 1250 | 41 | 1270.8 | -20.8 | 432.64 |
| 1380 | 54 | 1411.2 | -31.2 | 973.44 |
| 1425 | 63 | 1508.4 | -83.4 | 6955.56 |
| 1425 | 54 | 1411.2 | 13.8 | 190.44 |
| 1450 | 48 | 1346.4 | 103.6 | 10732.96 |
| 1300 | 46 | 1324.8 | -24.8 | 615.04 |
| 1400 | 62 | 1497.6 | -97.6 | 9525.76 |
| 1510 | 61 | 1486.8 | 23.2 | 538.24 |
| 1575 | 64 | 1519.2 | 55.8 | 3113.64 |
| 1650 | 71 | 1594.8 | 55.2 | 3047.04 |
| Total | | | | 36124.76 |

ŷ = 828 + 10.8x
s_{y.x} = √(36124.76 / 8) = 67.19818
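A sketch reproducing the table: residuals from the rounded line ŷ = 828 + 10.8x, their sum of squares (36124.76) and s_{y.x} ≈ 67.198. It also shows that with the unrounded least squares coefficients the residuals sum to zero, whereas the rounded line leaves a small total of −6.2.

```python
import math

x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
n = len(x)

# Rounded coefficients as used in the table: y_hat = 828 + 10.8x
residuals = [yi - (828 + 10.8 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s_yx = math.sqrt(sse / (n - 2))  # n - 2: two parameters (b0, b1) were estimated
print(round(sse, 2), round(s_yx, 5))  # 36124.76 67.19818
print(round(sum(residuals), 1))       # -6.2

# Unrounded least squares coefficients: residuals sum to (numerically) zero
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
b0 = y_bar - b1 * x_bar
exact_residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(abs(sum(exact_residuals)) < 1e-6)  # True
```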
The maths bit
• The minimum sum of squares is at the bottom of the curve,
where the gradient = 0
• So we can find the a and b that give the minimum sum of squares
by taking partial derivatives of Σ(y − ax − b)² with
respect to a and b separately
• Then we set these derivatives equal to 0 and solve to get the values of a
and b that give the minimum sum of squares
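Written out, the two derivative conditions look like this. The last step uses Σx̄(yᵢ − ȳ) = 0 and Σx̄(xᵢ − x̄) = 0, so xᵢ can be replaced by xᵢ − x̄, together with Σ(x − x̄)(y − ȳ) = (n − 1) r s_x s_y and Σ(x − x̄)² = (n − 1) s_x².

```latex
\begin{aligned}
S(a,b) &= \sum_{i=1}^{n}(y_i - a x_i - b)^2 \\
\frac{\partial S}{\partial b} &= -2\sum_{i=1}^{n}(y_i - a x_i - b) = 0
  &&\Longrightarrow\quad b = \bar{y} - a\bar{x} \\
\frac{\partial S}{\partial a} &= -2\sum_{i=1}^{n}x_i(y_i - a x_i - b) = 0
  &&\Longrightarrow\quad \sum_{i=1}^{n}x_i(y_i-\bar{y}) = a\sum_{i=1}^{n}x_i(x_i-\bar{x}) \\
a &= \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}
   = \frac{(n-1)\,r\,s_x s_y}{(n-1)\,s_x^2} = \frac{r\,s_y}{s_x}
\end{aligned}
```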
The solution
• Doing this gives the following equation for a:
$$a = \frac{r\,s_y}{s_x}$$
r = correlation coefficient of x and y
sy = standard deviation of y
sx = standard deviation of x
• From this you can see that:
• A low correlation coefficient gives a flatter slope (small value of a)
• A large spread of y, i.e. a high standard deviation, results in a steeper
slope (high value of a)
• A large spread of x, i.e. a high standard deviation, results in a flatter
slope (small value of a)
The solution cont.
• Our model equation is ŷ = ax + b
• This line must pass through the mean, so:
ȳ = a x̄ + b ⇒ b = ȳ − a x̄
• We can put our equation for a into this, giving:
$$b = \bar{y} - \frac{r\,s_y}{s_x}\,\bar{x}$$
r = correlation coefficient of x and y
sy = standard deviation of y
sx = standard deviation of x
• The smaller the correlation, the closer the
intercept is to the mean of y
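As a cross-check on the advertising example, a sketch computing the slope as a = r·s_y/s_x and the intercept as b = ȳ − a·x̄ recovers the same coefficients as the sums-based formula (about 10.787 and 828.13). It also confirms that the line passes through (x̄, ȳ).

```python
import math
import statistics

# Advertising example: expenditure (x) and sales (y)
x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
n = len(x)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample SDs (n - 1)
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

a = r * s_y / s_x      # slope: a = r * s_y / s_x
b = y_bar - a * x_bar  # intercept: the line passes through (x_bar, y_bar)
print(round(a, 3), round(b, 2))  # 10.787 828.13
print(math.isclose(a * x_bar + b, y_bar))  # True
```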
Back to the model
• If the correlation is zero, we will simply predict the mean of y for every
value of x, and our regression line is just a flat horizontal line at height ȳ
• But this isn’t very useful.
• We can calculate the regression line for any data, but the important
question is how well this line fits the data, or how good it is at
predicting y from x
$$\hat{y} = ax + b = \frac{r\,s_y}{s_x}\,x + \bar{y} - \frac{r\,s_y}{s_x}\,\bar{x}$$
Rearranges to:
$$\hat{y} = \frac{r\,s_y}{s_x}(x - \bar{x}) + \bar{y}$$
How good is our model?
• Total variance of y:
$$s_y^2 = \frac{\sum (y - \bar{y})^2}{n-1} = \frac{SS_y}{df_y}$$
• Variance of predicted y values (ŷ):
$$s_{\hat{y}}^2 = \frac{\sum (\hat{y} - \bar{y})^2}{n-1} = \frac{SS_{pred}}{df_{\hat{y}}}$$
This is the variance explained by our regression model
• Error variance:
$$s_{error}^2 = \frac{\sum (y - \hat{y})^2}{n-2} = \frac{SS_{er}}{df_{er}}$$
This is the variance of the error between our predicted y values and
the actual y values, and thus is the variance in y that is NOT explained
by the regression model
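How good is our model cont.
• Total variance = predicted variance + error variance
$$s_y^2 = s_{\hat{y}}^2 + s_{er}^2$$
(This holds exactly for the sums of squares, SS_y = SS_pred + SS_er; for the variances it is approximate, because s_er² divides by n − 2 rather than n − 1.)
• Conveniently, via some complicated rearranging:
$$s_{\hat{y}}^2 = r^2 s_y^2 \quad\Rightarrow\quad r^2 = \frac{s_{\hat{y}}^2}{s_y^2}$$
• So r² is the proportion of the variance in y that is explained by
our regression model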
How good is our model cont.
• Insert r² sy² into sy² = sŷ² + ser² and rearrange to get:
$$s_{er}^2 = s_y^2 - r^2 s_y^2 = s_y^2(1 - r^2)$$
• From this we can see that the greater the correlation,
the smaller the error variance, so the better our
prediction
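The decomposition is easiest to check on sums of squares, where it is exact. This sketch uses the advertising data with unrounded least squares coefficients. It verifies SS_y = SS_pred + SS_er, r² = SS_pred / SS_y and SS_er = SS_y(1 − r²).

```python
import math

x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ss_y = sum((yi - y_bar) ** 2 for yi in y)                # total
ss_pred = sum((yh - y_bar) ** 2 for yh in y_hat)         # explained by the model
ss_er = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # not explained
r = s_xy / math.sqrt(s_xx * ss_y)

print(math.isclose(ss_y, ss_pred + ss_er))       # True: SS_y = SS_pred + SS_er
print(math.isclose(ss_pred / ss_y, r ** 2))      # True: r^2 = proportion explained
print(math.isclose(ss_er, ss_y * (1 - r ** 2)))  # True: SS_er = SS_y (1 - r^2)
```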
Is the model significant?
• i.e. do we get a significantly better prediction of y
from our regression equation than by just predicting
the mean?
• F-statistic (via some complicated rearranging, with df_ŷ = 1 for the single predictor and df_er = n − 2):
$$F_{(df_{\hat{y}},\,df_{er})} = \frac{SS_{pred}/df_{\hat{y}}}{SS_{er}/df_{er}} = \dots = \frac{r^2(n-2)}{1-r^2}$$
• And it follows that:
$$t_{(n-2)} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
(because F = t²)
• So all we need to know are r and n
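A sketch of both statistics computed from r and n alone, using the stock example (r = −0.9088, n = 5) as input. It confirms F = t². Turning t into a p-value needs the t distribution with n − 2 degrees of freedom (for example from statistical tables), which this stdlib-only sketch does not compute. The function name is illustrative.

```python
import math


def regression_f_and_t(r, n):
    """F (1, n - 2 df) and t (n - 2 df) for H0: no linear relationship."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    f = r ** 2 * (n - 2) / (1 - r ** 2)
    return f, t


# Stock example from earlier: r = -0.9088 with n = 5 pairs
f, t = regression_f_and_t(-0.9088, 5)
print(round(f, 2), round(t, 2))
print(math.isclose(f, t ** 2))  # True: F = t^2
```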
General Linear Model
• Linear regression is actually a form of the
General Linear Model where the parameters
are a, the slope of the line, and b, the intercept.
y = ax + b +ε
• A General Linear Model is any model that describes the data
as a linear combination of parameters plus an error term
Multiple regression
• Multiple regression is used to determine the effect of a number
of independent variables, x₁, x₂, x₃, etc., on a single dependent
variable, y
• The different x variables are combined in a linear way and
each has its own regression coefficient:
y = a₁x₁ + a₂x₂ + … + aₙxₙ + b + ε
• The a parameters reflect the independent contribution of each
independent variable, x, to the value of the dependent variable,
y.
• i.e. the amount of variance in y that is accounted for by each x
variable after all the other x variables have been accounted for
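A minimal multiple regression sketch that solves the least squares normal equations (XᵀX)β = Xᵀy with a small Gaussian elimination, so it runs without third-party libraries. The noise-free data are hypothetical, generated from y = 2x₁ + 3x₂ + 1, so the fit should recover a₁ = 2, a₂ = 3, b = 1. In practice a numerically safer solver such as `numpy.linalg.lstsq` is usually preferred to forming XᵀX explicitly.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def multiple_regression(xs, y):
    """Fit y = a1*x1 + ... + ak*xk + b by least squares; returns [a1, ..., ak, b]."""
    n = len(y)
    X = [[col[i] for col in xs] + [1.0] for i in range(n)]  # last column: intercept
    k = len(X[0])
    XtX = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    Xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    return solve_linear_system(XtX, Xty)


# Hypothetical noise-free data generated from y = 2*x1 + 3*x2 + 1
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 * a + 3 * b + 1 for a, b in zip(x1, x2)]
coef = multiple_regression([x1, x2], y)
print([round(c, 6) for c in coef])  # [2.0, 3.0, 1.0]
```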
Thank you

Contenu connexe

Similaire à Unit 4_3 Correlation Regression.pptx

Correlation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social ScienceCorrelation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social Sciencessuser71ac73
 
Correlation new 2017 black
Correlation new 2017 blackCorrelation new 2017 black
Correlation new 2017 blackfizjadoon
 
Nonparametric approach to multiple regression
Nonparametric approach to multiple regressionNonparametric approach to multiple regression
Nonparametric approach to multiple regressionAlexander Decker
 
Regression and Co-Relation
Regression and Co-RelationRegression and Co-Relation
Regression and Co-Relationnuwan udugampala
 
Regression and correlation in statistics
Regression and correlation in statisticsRegression and correlation in statistics
Regression and correlation in statisticsiphone4s4
 
Correlation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxCorrelation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxkrunal soni
 
correlation.final.ppt (1).pptx
correlation.final.ppt (1).pptxcorrelation.final.ppt (1).pptx
correlation.final.ppt (1).pptxChieWoo1
 
Module 2_ Regression Models..pptx
Module 2_ Regression Models..pptxModule 2_ Regression Models..pptx
Module 2_ Regression Models..pptxnikshaikh786
 
correlation-analysis.pptx
correlation-analysis.pptxcorrelation-analysis.pptx
correlation-analysis.pptxSoujanyaLk1
 
Statisticsforbiologists colstons
Statisticsforbiologists colstonsStatisticsforbiologists colstons
Statisticsforbiologists colstonsandymartin
 
correlation and regression
correlation and regressioncorrelation and regression
correlation and regressionUnsa Shakir
 

Similaire à Unit 4_3 Correlation Regression.pptx (20)

Correlation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social ScienceCorrelation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social Science
 
Corr And Regress
Corr And RegressCorr And Regress
Corr And Regress
 
Correlation continued
Correlation continuedCorrelation continued
Correlation continued
 
Correlation
CorrelationCorrelation
Correlation
 
Correlation new 2017 black
Correlation new 2017 blackCorrelation new 2017 black
Correlation new 2017 black
 
Nonparametric approach to multiple regression
Nonparametric approach to multiple regressionNonparametric approach to multiple regression
Nonparametric approach to multiple regression
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Regression Analysis.pdf
Regression Analysis.pdfRegression Analysis.pdf
Regression Analysis.pdf
 
Regression
Regression  Regression
Regression
 
Regression and Co-Relation
Regression and Co-RelationRegression and Co-Relation
Regression and Co-Relation
 
Determinants
DeterminantsDeterminants
Determinants
 
Regression and correlation in statistics
Regression and correlation in statisticsRegression and correlation in statistics
Regression and correlation in statistics
 
regression
regressionregression
regression
 
Correlation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxCorrelation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptx
 
correlation.final.ppt (1).pptx
correlation.final.ppt (1).pptxcorrelation.final.ppt (1).pptx
correlation.final.ppt (1).pptx
 
Module 2_ Regression Models..pptx
Module 2_ Regression Models..pptxModule 2_ Regression Models..pptx
Module 2_ Regression Models..pptx
 
correlation-analysis.pptx
correlation-analysis.pptxcorrelation-analysis.pptx
correlation-analysis.pptx
 
Statisticsforbiologists colstons
Statisticsforbiologists colstonsStatisticsforbiologists colstons
Statisticsforbiologists colstons
 
stats_ch12.pdf
stats_ch12.pdfstats_ch12.pdf
stats_ch12.pdf
 
correlation and regression
correlation and regressioncorrelation and regression
correlation and regression
 

Dernier

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 

Dernier (20)

Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 

Unit 4_3 Correlation Regression.pptx

  • 2. Topics Covered: • Is there a relationship between x and y? • What is the strength of this relationship – Pearson’s correlation coefficient r. – Rank correlation coefficient • Spearman's rank • Kendall tau rank • Goodman and Kruskal's gamma • Can we describe this relationship and use this to predict y from x? – Regression • Is the relationship we have described statistically significant? – t test 11/14/2022 2 NITT / CA
  • 3. The relationship between x and y • Correlation: Is there a relationship between 2 variables? • Regression: How well a certain independent variable predict dependent variable? • CORRELATION  CAUSATION – In order to infer causality: manipulate independent variable and observe effect on dependent variable 11/14/2022 3 NITT / CA
  • 5. Scattergrams Y X Y X Y X Y Y Y Positive correlation Negative correlation No correlation 11/14/2022 5 NITT / CA
  • 6. Variance vs Covariance • Note on your sample: • If you’re wishing to assume that your sample is representative of the general population (RANDOM EFFECTS MODEL), use the degrees of freedom (n – 1) in your calculations of variance or covariance. • But if you’re simply wanting to assess your current sample (FIXED EFFECTS MODEL), substitute n for the degrees of freedom. 11/14/2022 6 NITT / CA
  • 7. Variance vs Covariance • Do two variables change together? 1 ) )( ( ) , cov( 1       n y y x x y x i n i i Covariance: • Gives information on the degree to which two variables vary together. • Note how similar the covariance is to variance: the equation simply multiplies x’s error scores by y’s error scores as opposed to squaring x’s error scores. 1 ) ( 2 1 2      n x x S n i i x Variance: • Gives information on variability of a single variable. 11/14/2022 7 NITT / CA
  • 8. Covariance  When X and Y : cov (x,y) = pos.  When X and Y : cov (x,y) = neg.  When no constant relationship: cov (x,y) = 0 1 ) )( ( ) , cov( 1       n y y x x y x i n i i 11/14/2022 8 NITT / CA
  • 9. Example Covariance x y x xi  y yi  ( x i x  )( y i y  ) 0 3 -3 0 0 2 2 -1 -1 1 3 4 0 1 0 4 0 1 -3 -3 6 6 3 3 9 3  x 3  y  7 75 . 1 4 7 1 )) )( ( ) , cov( 1         n y y x x y x i n i i What does this number tell us? 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 11/14/2022 9 NITT / CA
  • 10. Problem with Covariance: • The value obtained by covariance is dependent on the size of the data’s standard deviations: if large, the value will be greater than if small… even if the relationship between x and y is exactly the same in the large versus small standard deviation datasets. 11/14/2022 10 NITT / CA
  • 11. Example of how covariance value relies on variance High variance data Low variance data Subject x y x error * y error x y X error * y error 1 101 100 2500 54 53 9 2 81 80 900 53 52 4 3 61 60 100 52 51 1 4 51 50 0 51 50 0 5 41 40 100 50 49 1 6 21 20 900 49 48 4 7 1 0 2500 48 47 9 Mean 51 50 51 50 Sum of x error * y error : 7000 Sum of x error * y error : 28 Covariance: 1166.67 Covariance: 4.67 11/14/2022 11 NITT / CA
  • 12. Solution: Pearson's r
    • On its own, the covariance value is hard to interpret, because it depends on the units and spread of the data.
    • Solution: standardise this measure.
    • Pearson's r standardises the covariance by dividing it by the product of the standard deviations of X and Y:
      $r_{xy} = \frac{\operatorname{cov}(x,y)}{s_x s_y}$
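The sketch below shows why standardising helps. It is written in Python, and the helper names `cov` and `pearson_r` are chosen for illustration. It recomputes the covariances of the high- and low-variance datasets from slide 11 and then Pearson's r. The covariances differ by a factor of 250 (1166.67 vs 4.67). Both datasets, however, lie exactly on the line y = x – 1, so r comes out as 1 for both, up to floating-point rounding.

```python
import math

def cov(x, y):
    """Sample covariance (divisor n - 1)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    # Standardise the covariance by the product of the standard deviations
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))

high_x = [101, 81, 61, 51, 41, 21, 1]
high_y = [100, 80, 60, 50, 40, 20, 0]
low_x = [54, 53, 52, 51, 50, 49, 48]
low_y = [53, 52, 51, 50, 49, 48, 47]

for name, x, y in [("high", high_x, high_y), ("low", low_x, low_y)]:
    print(name, round(cov(x, y), 2), round(pearson_r(x, y), 6))
```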
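  • 13. Pearson's R continued
    $\operatorname{cov}(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$
    $r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x s_y}$
    $r_{xy} = \frac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n-1}$, where $Z_{x_i} = (x_i-\bar{x})/s_x$ and $Z_{y_i} = (y_i-\bar{y})/s_y$ are z-scores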
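The three expressions for r above are algebraically identical. As a quick numerical check, this Python sketch applies each of them to the data from the covariance example on slide 9. All three give r = 0.35, that is 1.75 / (√5 · √5).

```python
import math

x = [0, 2, 3, 4, 6]   # data from the covariance example (slide 9)
y = [3, 2, 4, 0, 6]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_x = math.sqrt(sum((v - x_bar) ** 2 for v in x) / (n - 1))
s_y = math.sqrt(sum((v - y_bar) ** 2 for v in y) / (n - 1))
sp = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))  # sum of products

r_cov = (sp / (n - 1)) / (s_x * s_y)          # cov(x, y) / (s_x s_y)
r_dev = sp / ((n - 1) * s_x * s_y)            # deviation-score form
z_x = [(v - x_bar) / s_x for v in x]          # z-scores of x
z_y = [(v - y_bar) / s_y for v in y]          # z-scores of y
r_z = sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)  # z-score form
print(r_cov, r_dev, r_z)
```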
  • 14. Limitations of r
    • When r = 1 or r = -1:
      – We can predict y from x with certainty.
      – All data points lie on a straight line: y = ax + b.
    • What we calculate is actually $\hat{r}$:
      – r = the true correlation of the whole population
      – $\hat{r}$ = an estimate of r based on the sample data
    • r is very sensitive to extreme values:
    [Figure: scatter plot illustrating the effect of an extreme value on r]
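Here is a sketch on made-up data (not from the slides) that illustrates the sensitivity to extreme values. Five points that lie exactly on a line give r = 1. Adding a single point far from that line drops r to 1/7 ≈ 0.14.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    sxx = sum((a - xb) ** 2 for a in x)
    syy = sum((b - yb) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]                  # hypothetical, perfectly linear
print(pearson_r(x, y))               # r = 1
print(pearson_r(x + [6], y + [0]))   # one extreme point: r = 1/7, about 0.143
```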
  • 15. The Pearson's correlation coefficient:
    • Varies between -1 and +1
    • r = 1 means the data is perfectly linear with a positive slope (i.e., both variables change in the same direction)
    • r = -1 means the data is perfectly linear with a negative slope (i.e., the variables change in opposite directions)
    • r = 0 means there is no linear association
    • Common rules of thumb for the strength of the association, based on |r|:
      – 0 < |r| < 0.4: weak association
      – 0.4 < |r| < 0.8: moderate association
      – |r| > 0.8: strong association
  • 16. [Figure: scatter plots of various data sets, each labelled with the Pearson's correlation coefficient of x and y]
  • 18. Pearson Correlation Coefficient Formula
    $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
    where:
    • r = Pearson coefficient
    • x = first variable
    • y = second variable
    • n = number of pairs of data values
    • ∑xy = sum of the products of the paired data values
    • ∑x = sum of the x scores
    • ∑y = sum of the y scores
    • ∑x² = sum of the squared x scores
    • ∑y² = sum of the squared y scores
  • 19. Example:
    • Calculate the value of Pearson's correlation coefficient r from the following ages and weights of six people.
    S. No | Age (x) | Weight (y)
    1 | 40 | 78
    2 | 21 | 70
    3 | 25 | 60
    4 | 31 | 55
    5 | 38 | 80
    6 | 47 | 66
  • 20. Calculate the following values (xy, x², and y²)
  • 21. Calculation of Pearson's r
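The following Python sketch computes the column totals and r for the age/weight data, using the formula from slide 18. The helper name `pearson_r_sums` is illustrative. For the six people the totals are Σx = 202, Σy = 409, Σxy = 13937, Σx² = 7280 and Σy² = 28365. These give r ≈ 0.347, which is a weak positive association by the rule of thumb on slide 15.

```python
import math

def pearson_r_sums(x, y):
    """Pearson's r computed from the column totals (slide 18 formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return num / den

age = [40, 21, 25, 31, 38, 47]
weight = [78, 70, 60, 55, 80, 66]
print(round(pearson_r_sums(age, weight), 4))  # 0.3471
```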
  • 22. Example:
    • There are 2 stocks, A and B. Their share prices on particular days are as follows:
    Stock A (x) | Stock B (y)
    45 | 9
    50 | 8
    53 | 8
    58 | 7
    60 | 5
  • 23. Calculate the following values (xy, x², and y²)
    $r = \frac{5 \times 1935 - 266 \times 37}{\sqrt{(5 \times 14298 - 266^2)(5 \times 283 - 37^2)}} = -0.9088$
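Here is a Python check of the stock example. It recomputes the column totals that appear in the calculation above (266, 37, 1935, 14298 and 283) and the resulting r. The strongly negative value means that, on these days, higher prices of stock A went with lower prices of stock B.

```python
import math

stock_a = [45, 50, 53, 58, 60]   # x
stock_b = [9, 8, 8, 7, 5]        # y
n = len(stock_a)

sum_x, sum_y = sum(stock_a), sum(stock_b)
sum_xy = sum(a * b for a, b in zip(stock_a, stock_b))
sum_x2 = sum(a * a for a in stock_a)
sum_y2 = sum(b * b for b in stock_b)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 266 37 1935 14298 283

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(round(r, 4))  # -0.9088
```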
  • 26. Advantages
    • It shows the strength of the relationship between the two variables.
    • It quantifies the degree to which those variables are (linearly) correlated.
    • It shows the direction of the correlation: negative or positive.
    • It is independent of the units of measurement of the variables (e.g. cm or inch).
  • 27. Disadvantages
    • Pearson's r cannot distinguish between the dependent and independent variables (r is symmetric: $r_{xy} = r_{yx}$).
    • It gives no information about the slope of the line. It only indicates the strength and direction of a linear relationship between the two variables.
    • Pearson's r can easily be misinterpreted, especially in the case of homogeneous data.
    • Compared with other methods, it takes more time to calculate by hand.
  • 28. Spearman correlation coefficient: ρ (rho)
    $\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$ (this form assumes there are no tied ranks)
    Here:
    • n = number of data points of the two variables
    • $d_i$ = difference in ranks of the i-th element
    • The Spearman coefficient ρ can take any value from -1 to +1, where:
      – ρ = +1 means a perfect positive association of ranks
      – ρ = 0 means no association of ranks
      – ρ = -1 means a perfect negative association of ranks
  • 29. Example: Calculate the Spearman correlation coefficient for the following data
    Maths | Science
    35 | 30
    23 | 33
    47 | 45
    17 | 23
    10 | 8
    43 | 49
    9 | 12
    6 | 4
    28 | 31
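  • 30. Calculation (rank 1 = highest score; d = |Rank_M – Rank_S|)
    Maths | Science | Rank_M | Rank_S | d | d²
    35 | 30 | 3 | 5 | 2 | 4
    23 | 33 | 5 | 3 | 2 | 4
    47 | 45 | 1 | 2 | 1 | 1
    17 | 23 | 6 | 6 | 0 | 0
    10 | 8 | 7 | 8 | 1 | 1
    43 | 49 | 2 | 1 | 1 | 1
    9 | 12 | 8 | 7 | 1 | 1
    6 | 4 | 9 | 9 | 0 | 0
    28 | 31 | 4 | 4 | 0 | 0
    Σd² = 12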
  • 31. Here n = 9 and the sum of the squared rank differences is Σd² = 12, so
    $\rho = 1 - \frac{6 \times 12}{9(81-1)} = 1 - \frac{72}{720} = 1 - 0.1 = 0.9$
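The same calculation can be written in Python. This sketch assumes there are no tied scores, as in this example, and it ranks the highest score as 1, like the table on slide 30. The function names are illustrative. If there were ties, you would instead assign average ranks and compute Pearson's r on those ranks.

```python
def ranks_descending(values):
    """Rank 1 = largest value; assumes no ties, as in this example."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); valid without ties."""
    n = len(x)
    d2 = sum((rx - ry) ** 2
             for rx, ry in zip(ranks_descending(x), ranks_descending(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

maths = [35, 23, 47, 17, 10, 43, 9, 6, 28]
science = [30, 33, 45, 23, 8, 49, 12, 4, 31]
print(ranks_descending(maths), ranks_descending(science))
print(spearman_rho(maths, science))
```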
  • 33. Example: Calculate Spearman's rank coefficient
  • 34. Example: Calculate Spearman's rank coefficient
    • Suppose we have the ranks of 8 students in Statistics and Mathematics. On the basis of these ranks, we would like to know to what extent the students' knowledge of Statistics and Mathematics is related.
    Rank in Stat | Rank in Math | Difference of ranks d | d²
    1 | 2 | -1 | 1
    2 | 4 | -2 | 4
    3 | 1 | 2 | 4
    4 | 5 | -1 | 1
    5 | 3 | 2 | 4
    6 | 8 | -2 | 4
    7 | 7 | 0 | 0
    8 | 6 | 2 | 4
    Σd² = 22
    Here, n = number of paired observations = 8
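This Python sketch completes the example. With Σd² = 22 and n = 8, ρ = 1 – 6 × 22 / (8 × 63) = 1 – 132/504 ≈ 0.738. That is a fairly strong positive association between the two sets of ranks.

```python
stat_rank = [1, 2, 3, 4, 5, 6, 7, 8]
math_rank = [2, 4, 1, 5, 3, 8, 7, 6]

n = len(stat_rank)
d = [s - m for s, m in zip(stat_rank, math_rank)]   # rank differences
sum_d2 = sum(di ** 2 for di in d)                   # 22, as in the table
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))           # 1 - 132/504
print(sum_d2, round(rho, 4))                        # 22 0.7381
```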
  • 36. Regression
    • Correlation tells you whether there is an association between x and y, but it doesn't allow you to predict one variable from the other.
    • To do this we need REGRESSION!
  • 37. Best-fit Line: ŷ, the predicted value
    • The aim of linear regression is to fit a straight line, ŷ = ax + b, to the data, such that the line gives the best prediction of y for any value of x.
    • This will be the line that minimises the vertical distances between the data and the fitted line, i.e. the residuals.
    [Figure: scatter plot with the fitted line ŷ = ax + b, labelled with the slope a, the intercept b, a true value y_i and its residual error ε]
  • 38. Least Squares Regression
    • To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line).
    • Model line: ŷ = ax + b, where a = slope and b = intercept
    • Residual: ε = y – ŷ
    • Sum of squares of residuals = Σ(y – ŷ)²
    • We must find the values of a and b that minimise Σ(y – ŷ)².
  • 39. Finding b
    • First we find the value of b that gives the minimum sum of squares.
    • Trying different values of b is equivalent to shifting the line up and down the scatter plot.
    [Figure: scatter plot with the same line drawn at several intercepts b, with the residuals ε marked]
  • 40. Finding a
    • Now we find the value of a that gives the minimum sum of squares.
    • Trying out different values of a is equivalent to changing the slope of the line while b stays constant.
    [Figure: scatter plot with lines of different slopes through the same intercept b]
  • 41. Minimising sums of squares
    • We need to minimise Σ(y – ŷ)².
    • But ŷ = ax + b, so we need to minimise Σ(y – ax – b)².
    • If we plot the sum of squares against the value of a (or of b), we get a parabola, because it is a squared term. Over a and b together, it forms a bowl-shaped surface.
    • So the minimum sum of squares is at the bottom of the curve, where the gradient is zero.
    [Figure: sum of squares S plotted against values of a and b; the minimum S is where the gradient = 0]
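The "bottom of the curve" argument can be checked numerically with this Python sketch. The helper name `sse` is illustrative. The sketch uses the slide-9 data and takes a and b from the closed-form least-squares solution given on slides 50 and 51. It then confirms that nudging either parameter away from those values increases the sum of squares S.

```python
def sse(a, b, x, y):
    """Sum of squared residuals for the line y-hat = a*x + b."""
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))

x = [0, 2, 3, 4, 6]   # data from the covariance example (slide 9)
y = [3, 2, 4, 0, 6]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Closed-form least-squares slope and intercept
a = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
b = y_bar - a * x_bar

best = sse(a, b, x, y)
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5)]:
    # Moving away from the minimum in any direction increases S
    assert sse(a + da, b + db, x, y) > best
print(a, b, best)
```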
  • 42. Recall that the expression for the correlation coefficient is
    $r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
  • 43. Regression Line
    • A simple linear relationship between two variables: $y = \beta_0 + \beta_1 x + \varepsilon$
    • An estimated regression line based on sample data: $\hat{y} = b_0 + b_1 x$ (in the earlier notation, $b_1$ is the slope a and $b_0$ is the intercept b)
    • The least squares method gives us the "best" estimated line. It chooses the values of $b_0$ and $b_1$ that minimise the sum of squared errors:
      $SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2$
    • $b_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$, or equivalently $b_1 = r\frac{S_y}{S_x}$
    • $b_0 = \bar{y} - b_1\bar{x}$
  • 44. Example
    • The weekly advertising expenditure (x) and weekly sales (y) are presented in the following table.
    • Use the fitted regression line to estimate the mean value of y when x = 50.
    S. no. | x | y | xy | x²
    1 | 41 | 1250 | 51250 | 1681
    2 | 54 | 1380 | 74520 | 2916
    3 | 63 | 1425 | 89775 | 3969
    4 | 54 | 1425 | 76950 | 2916
    5 | 48 | 1450 | 69600 | 2304
    6 | 46 | 1300 | 59800 | 2116
    7 | 62 | 1400 | 86800 | 3844
    8 | 61 | 1510 | 92110 | 3721
    9 | 64 | 1575 | 100800 | 4096
    10 | 71 | 1650 | 117150 | 5041
    Total | 564 | 14365 | 818755 | 32604
  • 45. • From the previous table we have: n = 10, Σx = 564, Σy = 14365, Σx² = 32604, Σxy = 818755
    • The least squares estimates of the regression coefficients are:
      $b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10(818755) - (564)(14365)}{10(32604) - (564)^2} = \frac{85690}{7944} \approx 10.8$
      $b_0 = \bar{y} - b_1\bar{x} = 1436.5 - 10.787(56.4) \approx 828$
    • The estimated regression function is: $\hat{y} = 828 + 10.8x$, i.e. Sales = 828 + 10.8 × Expenditure
    • If the advertising expenditure is Rs 50, then the estimated sales are: Sales = 828 + 10.8(50) = 1368
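The sketch below repeats the calculation in Python without rounding. It gives b1 = 85690/7944 ≈ 10.787 and b0 ≈ 828.13. The estimated mean sales at x = 50 are therefore about 1367.5, in line with the slide's rounded result of 828 + 10.8(50) = 1368.

```python
x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]                     # weekly advertising expenditure
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]  # weekly sales
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
b0 = sum_y / n - b1 * sum_x / n                                 # intercept: y-bar - b1 * x-bar
print(sum_x, sum_y, sum_xy, sum_x2)   # 564 14365 818755 32604
print(round(b1, 3), round(b0, 2))     # 10.787 828.13
print(round(b0 + b1 * 50, 1))         # estimated mean sales at x = 50
```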
  • 46. Residual
    • The residual is the difference between the observed value $y_i$ and the corresponding fitted value $\hat{y}_i$: $e_i = y_i - \hat{y}_i$
    • Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand.
    y | x | ŷ | Residual (e)
    1250 | 41 | 1270.8 | -20.8
    1380 | 54 | 1411.2 | -31.2
    1425 | 63 | 1508.4 | -83.4
    1425 | 54 | 1411.2 | 13.8
    1450 | 48 | 1346.4 | 103.6
    1300 | 46 | 1324.8 | -24.8
    1400 | 62 | 1497.6 | -97.6
    1510 | 61 | 1486.8 | 23.2
    1575 | 64 | 1519.2 | 55.8
    1650 | 71 | 1594.8 | 55.2
  • 47. Regression Standard Error
    • For simple linear regression, the estimate of σ² is the average squared residual, computed with n – 2 degrees of freedom:
      $s_{y\cdot x}^2 = \frac{1}{n-2}\sum e_i^2 = \frac{1}{n-2}\sum (y_i - \hat{y}_i)^2$
    • To estimate σ, use $s_{y\cdot x} = \sqrt{s_{y\cdot x}^2}$
    • This estimates the standard deviation σ of the error term ε in the statistical model for simple linear regression.
  • 48. Regression Standard Error
    y | x | ŷ | Residual (e) | e²
    1250 | 41 | 1270.8 | -20.8 | 432.64
    1380 | 54 | 1411.2 | -31.2 | 973.44
    1425 | 63 | 1508.4 | -83.4 | 6955.56
    1425 | 54 | 1411.2 | 13.8 | 190.44
    1450 | 48 | 1346.4 | 103.6 | 10732.96
    1300 | 46 | 1324.8 | -24.8 | 615.04
    1400 | 62 | 1497.6 | -97.6 | 9525.76
    1510 | 61 | 1486.8 | 23.2 | 538.24
    1575 | 64 | 1519.2 | 55.8 | 3113.64
    1650 | 71 | 1594.8 | 55.2 | 3047.04
    ŷ = 828 + 10.8x; total Σe² = 36124.76
    $s_{y\cdot x} = \sqrt{36124.76 / 8} = 67.19818$
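This Python sketch reproduces the residual table and the standard error. It uses the rounded line ŷ = 828 + 10.8x from the slides. The unrounded coefficients would give a marginally smaller sum of squares, since they are the exact least-squares values.

```python
import math

x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
b0, b1 = 828, 10.8        # rounded coefficients used on the slides

y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # e_i = y_i - y-hat_i
sse = sum(e ** 2 for e in residuals)
s_yx = math.sqrt(sse / (len(x) - 2))                # divide by n - 2, not n
print(round(sse, 2), round(s_yx, 4))                # 36124.76 67.1982
```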
  • 49. The maths bit
    • The minimum sum of squares is at the bottom of the curve, where the gradient = 0.
    • So we can find the values of a and b that give the minimum sum of squares by taking the partial derivatives of Σ(y – ax – b)² with respect to a and b separately.
    • Then we set these derivatives equal to 0 and solve them for the values of a and b that give the minimum sum of squares.
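Written out, this standard derivation goes as follows, with S denoting the sum of squares. It is a sketch of the algebra that leads to the normal equations and to the solution for a and b, not a transcript of the original slide's working.

```latex
\begin{aligned}
S(a,b) &= \sum_{i=1}^{n} (y_i - a x_i - b)^2 \\
\frac{\partial S}{\partial b} &= -2\sum_{i=1}^{n} (y_i - a x_i - b) = 0
  \;\Rightarrow\; \sum y_i = a\sum x_i + nb
  \;\Rightarrow\; b = \bar{y} - a\bar{x} \\
\frac{\partial S}{\partial a} &= -2\sum_{i=1}^{n} x_i (y_i - a x_i - b) = 0
  \;\Rightarrow\; \sum x_i y_i = a\sum x_i^2 + b\sum x_i \\
\text{substituting } b:\quad \sum x_i y_i - n\bar{x}\bar{y} &= a\left(\sum x_i^2 - n\bar{x}^2\right) \\
a &= \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}
   = \frac{\operatorname{cov}(x,y)}{s_x^2} = \frac{r\, s_y}{s_x}
\end{aligned}
```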
  • 50. The solution
    • Doing this gives the following equations for a and b. For the slope:
      $a = \frac{r\, s_y}{s_x}$
      where r = correlation coefficient of x and y, $s_y$ = standard deviation of y, $s_x$ = standard deviation of x
    • From this you can see that (all else being equal):
      – A low correlation coefficient gives a flatter slope (small value of a)
      – A large spread of y, i.e. a high standard deviation, results in a steeper slope (high value of a)
      – A large spread of x, i.e. a high standard deviation, results in a flatter slope (small value of a)
  • 51. The solution cont.
    • Our model equation is ŷ = ax + b.
    • This line must pass through the means, so: $\bar{y} = a\bar{x} + b \Rightarrow b = \bar{y} - a\bar{x}$
    • We can put our equation for a into this, giving: $b = \bar{y} - \frac{r\, s_y}{s_x}\bar{x}$
      where r = correlation coefficient of x and y, $s_y$ = standard deviation of y, $s_x$ = standard deviation of x
    • The smaller the correlation, the closer the intercept is to the mean of y.
  • 52. Back to the model
    • $\hat{y} = ax + b = \frac{r\, s_y}{s_x}x + \bar{y} - \frac{r\, s_y}{s_x}\bar{x}$, which rearranges to $\hat{y} = \frac{r\, s_y}{s_x}(x - \bar{x}) + \bar{y}$
    • If the correlation is zero, we will simply predict the mean of y for every value of x. Our regression line is then just a flat (horizontal) line at height $\bar{y}$, crossing the y-axis at $\bar{y}$.
    • But this isn't very useful.
    • We can calculate the regression line for any data, but the important question is how well this line fits the data, i.e. how good it is at predicting y from x.
  • 53. How good is our model?
    • Total variance of y: $s_y^2 = \frac{\sum (y - \bar{y})^2}{n-1} = \frac{SS_y}{df_y}$
    • Variance of predicted y values (ŷ): $s_{\hat{y}}^2 = \frac{\sum (\hat{y} - \bar{y})^2}{n-1} = \frac{SS_{pred}}{df_{\hat{y}}}$
      – This is the variance explained by our regression model.
    • Error variance: $s_{error}^2 = \frac{\sum (y - \hat{y})^2}{n-2} = \frac{SS_{er}}{df_{er}}$
      – This is the variance of the error between our predicted y values and the actual y values, and thus is the variance in y that is NOT explained by the regression model.
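  • 54. How good is our model cont.
    • Total variance = predicted variance + error variance: $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$. Strictly, because the error variance divides by n – 2 rather than n – 1, the identity is exact for the sums of squares: $SS_y = SS_{pred} + SS_{er}$.
    • Conveniently, with some algebra, $s_{\hat{y}}^2 = r^2 s_y^2$, so $r^2 = s_{\hat{y}}^2 / s_y^2$
    • So r² is the proportion of the variance in y that is explained by our regression model.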
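The "complicated rearranging" can be checked numerically with this Python sketch. It fits the least-squares line to the advertising data from slide 44. It then confirms that SS_y = SS_pred + SS_er and that r² equals SS_pred / SS_y, up to floating-point error.

```python
import math

x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
slope = sxy / sxx
y_hat = [y_bar + slope * (a - x_bar) for a in x]   # least-squares fitted values

ss_y = sum((b - y_bar) ** 2 for b in y)              # total sum of squares
ss_pred = sum((h - y_bar) ** 2 for h in y_hat)       # explained by the model
ss_er = sum((b - h) ** 2 for b, h in zip(y, y_hat))  # not explained (residual)
r = sxy / math.sqrt(sxx * ss_y)

print(ss_y, ss_pred + ss_er)     # equal up to rounding error
print(r ** 2, ss_pred / ss_y)    # r^2 = proportion of variance explained
```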
  • 55. How good is our model cont.
    • Insert $r^2 s_y^2$ into $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$ and rearrange to get: $s_{er}^2 = s_y^2 - r^2 s_y^2 = s_y^2(1 - r^2)$
    • From this we can see that the greater the correlation, the smaller the error variance, and so the better our prediction.
  • 56. Is the model significant?
    • i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?
    • F-statistic, with 1 and n – 2 degrees of freedom: $F_{(1,\,n-2)} = \frac{SS_{pred}/1}{SS_{er}/(n-2)} = \dots = \frac{r^2 (n-2)}{1 - r^2}$ (after some rearranging)
    • And it follows that (because F = t²): $t_{(n-2)} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
    • So all we need to know are r and n.
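Here is a small Python helper for these two statistics; its name is illustrative. It is applied to the stock example from slide 23 (r = -0.9088, n = 5). The result is t ≈ -3.77 with n – 2 = 3 degrees of freedom. Since |t| exceeds the two-tailed 5% critical value of about 3.18, the correlation would be judged significant at that level. The code also confirms that F = t².

```python
import math

def correlation_t_and_f(r, n):
    """t statistic (df = n - 2) and F statistic (df = 1, n - 2) for testing r."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    f = r ** 2 * (n - 2) / (1 - r ** 2)
    return t, f

t, f = correlation_t_and_f(-0.9088, 5)   # stock example: r = -0.9088, n = 5
print(round(t, 2), round(f, 2), math.isclose(f, t ** 2))
```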
  • 57. General Linear Model
    • Linear regression is actually a form of the General Linear Model, where the parameters are a, the slope of the line, and b, the intercept: y = ax + b + ε
    • A General Linear Model is any model that describes the data as a linear combination of the predictors (a straight line, or a plane or hyperplane when there are several predictors) plus an error term.
  • 58. Multiple regression
    • Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y.
    • The different x variables are combined linearly, and each has its own regression coefficient: $y = a_1x_1 + a_2x_2 + \dots + a_nx_n + b + \varepsilon$
    • The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y,
    • i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for.
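Here is a dependency-free Python sketch of multiple regression by least squares. It builds the normal equations (XᵀX)c = Xᵀy, with a column of ones for the intercept b, and solves them by Gaussian elimination. The data are hypothetical and noise-free, generated from y = 2x₁ – 3x₂ + 5, so the fit should recover a₁ = 2, a₂ = -3 and b = 5. The helper names are illustrative. In practice, a numerical library routine such as NumPy's `numpy.linalg.lstsq` would normally be used instead.

```python
def solve(A, rhs):
    """Solve A @ coef = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):                 # back-substitution
        coef[i] = (M[i][n] - sum(M[i][j] * coef[j] for j in range(i + 1, n))) / M[i][i]
    return coef

def fit_multiple(xs, y):
    """Least-squares [a_1, ..., a_k, b] for y = a_1 x_1 + ... + a_k x_k + b."""
    X = [list(row) + [1.0] for row in zip(*xs)]    # design matrix plus intercept column
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)                         # normal equations

# Hypothetical noise-free data generated from y = 2*x1 - 3*x2 + 5
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 * u - 3 * v + 5 for u, v in zip(x1, x2)]
coef = fit_multiple([x1, x2], y)
print(coef)   # approximately [2.0, -3.0, 5.0]
```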