This document contains slides from a presentation on simple linear regression and correlation. It introduces simple linear regression modeling, including estimating the regression line using the method of least squares. It discusses the assumptions of the simple linear regression model and defines key terms like the regression coefficients (intercept and slope), error variance, standard errors of the estimates, and how to perform hypothesis tests and construct confidence intervals for the regression parameters. Examples are provided to demonstrate calculating quantities like sums of squares, estimating the regression line, and evaluating the fit of the regression model.
1. Simple Linear Regression and Correlation
Shakeel Nouman
M.Phil Statistics
Simple Linear Regression and Correlation By Shakeel Nouman M.Phil Statistics Govt. College University Lahore, Statistical Officer
2. 10 Simple Linear Regression and Correlation
• Using Statistics
• The Simple Linear Regression Model
• Estimation: The Method of Least Squares
• Error Variance and the Standard Errors of Regression Estimators
• Correlation
• Hypothesis Tests about the Regression Relationship
• How Good is the Regression?
• Analysis of Variance Table and an F Test of the Regression Model
• Residual Analysis and Checking for Model Inadequacies
• Use of the Regression Model for Prediction
• Summary and Review of Terms
3. 10-1 Using Statistics
This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis.
[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y); x-axis: Advertising, 0 to 50; y-axis: Sales, 0 to 140]
We notice that:
• Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
• The scatter of points tends to be distributed around a positively sloped straight line.
• The pairs of values of advertising expenditures and sales are not located exactly on a straight line.
• The scatter plot reveals a more or less strong tendency rather than a precise linear relationship.
• The line represents the nature of the relationship on average.
4. Examples of Other Scatterplots
[Figure: six example scatterplots, each plotting Y against X]
5. Model Building
The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship.
A statistical model separates the systematic component of a relationship from the random component.
[Diagram: Data → Statistical model → Systematic component + Random errors]
In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE).
In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.
6. 10-2 The Simple Linear Regression Model
The population simple linear regression model:
$Y = \beta_0 + \beta_1 X + \varepsilon$
Here $\beta_0 + \beta_1 X$ is the nonrandom or systematic component and $\varepsilon$ is the random component,
where
• Y is the dependent variable, the variable we wish to explain or predict
• X is the independent variable, also called the predictor variable
• $\varepsilon$ is the error term, the only random component in the model, and thus the only source of randomness in Y
• $\beta_0$ is the intercept of the systematic component of the regression relationship
• $\beta_1$ is the slope of the systematic component
The conditional mean of Y:
$E[Y \mid X] = \beta_0 + \beta_1 X$
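The population model can be illustrated with a small simulation. This is only a sketch: the function names `conditional_mean` and `simulate_y`, the seed, and the parameter values in the example call are made up for illustration and do not come from the slides.

```python
import random


def conditional_mean(x, beta0, beta1):
    """Systematic component: E[Y | X = x] = beta0 + beta1 * x."""
    return beta0 + beta1 * x


def simulate_y(xs, beta0, beta1, sigma, seed=0):
    """Draw one Y per fixed x as conditional mean plus a normal error.

    The errors are independent N(0, sigma^2) draws, so they are the only
    source of randomness in the returned values.
    """
    rng = random.Random(seed)
    return [conditional_mean(x, beta0, beta1) + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values, chosen only for the illustration.
xs = [10, 20, 30, 40, 50]
ys = simulate_y(xs, beta0=5.0, beta1=2.0, sigma=3.0)
```

With `sigma=0.0` the simulated values fall exactly on the line, which is one way to see that the error term carries all of the randomness.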
7. Picturing the Simple Linear Regression Model
[Figure: regression plot of Y against X showing the line $E[Y] = \beta_0 + \beta_1 X$ with intercept $\beta_0$ and slope $\beta_1$, an observed value $Y_i$ at $X_i$, and the error $\varepsilon_i$ as the vertical distance between $Y_i$ and the line]
The simple linear regression model gives an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:
$E[Y_i] = \beta_0 + \beta_1 X_i$
Actual observed values of Y differ from the expected value by an unexplained or random error:
$Y_i = E[Y_i] + \varepsilon_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
8. Assumptions of the Simple Linear Regression Model
• The relationship between X and Y is a straight-line relationship.
• The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term $\varepsilon_i$.
• The errors $\varepsilon_i$ are normally distributed with mean 0 and variance $\sigma^2$. The errors are uncorrelated (not related) in successive observations. That is: $\varepsilon \sim N(0, \sigma^2)$
[Figure: the line $E[Y] = \beta_0 + \beta_1 X$ with identical normal distributions of errors, all centered on the regression line]
9. 10-3 Estimation: The Method of Least Squares
Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.
The estimated regression equation:
$Y = b_0 + b_1 X + e$
where $b_0$ estimates the intercept of the population regression line, $\beta_0$; $b_1$ estimates the slope of the population regression line, $\beta_1$; and $e$ stands for the observed errors, the residuals from fitting the estimated regression line $b_0 + b_1 X$ to a set of n points.
The estimated regression line:
$\hat{Y} = b_0 + b_1 X$
where $\hat{Y}$ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.
10. Fitting a Regression Line
[Figure: four panels of Y against X: the data; three errors from a fitted line; three errors from the least squares regression line; errors from the least squares regression line are minimized]
11. Errors in Regression
[Figure: plot of Y against X showing the observed data point $Y_i$ at $X_i$, the fitted regression line $\hat{Y} = b_0 + b_1 X$, and $\hat{Y}_i$, the predicted value of Y for $X_i$]
Error: $e_i = Y_i - \hat{Y}_i$
12. Least Squares Regression
The sum of squared errors in regression is:
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
The least squares regression line is that which minimizes the SSE with respect to the estimates $b_0$ and $b_1$.
The normal equations:
$\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i$
$\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2$
[Figure: SSE plotted against $b_0$ and $b_1$, marking the least squares $b_0$ and $b_1$; at this point SSE is minimized with respect to $b_0$ and $b_1$]
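The two normal equations form a 2x2 linear system in $b_0$ and $b_1$, so they can be solved directly. The sketch below uses Cramer's rule; the function name `solve_normal_equations` and the toy data are assumptions made for illustration, not part of the slides.

```python
def solve_normal_equations(xs, ys):
    """Solve the two normal equations for (b0, b1) with Cramer's rule.

    System being solved:
        n * b0      + sum(x) * b1   = sum(y)
        sum(x) * b0 + sum(x^2) * b1 = sum(x * y)
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1


# Toy data lying exactly on y = 1 + 2x (hypothetical, for checking only).
b0, b1 = solve_normal_equations([1, 2, 3, 4], [3, 5, 7, 9])
```

For data that lie exactly on a line, the solution recovers that line's intercept and slope, which makes a convenient sanity check.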
13. Sums of Squares, Cross Products, and Least Squares Estimators
Sums of Squares and Cross Products:
$SS_X = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$
$SS_Y = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$
$SS_{XY} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$
Least squares regression estimators:
$b_1 = \frac{SS_{XY}}{SS_X}$
$b_0 = \bar{y} - b_1 \bar{x}$
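The computational formulas for the sums of squares translate almost line for line into code. This is a minimal sketch; the function names `sums_of_squares` and `least_squares` and the five data points are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def sums_of_squares(xs, ys):
    """Return (SS_X, SS_Y, SS_XY) using the computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_x, ss_y, ss_xy


def least_squares(xs, ys):
    """Return (b0, b1) with b1 = SS_XY / SS_X and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    ss_x, _, ss_xy = sums_of_squares(xs, ys)
    b1 = ss_xy / ss_x
    b0 = sum(ys) / n - b1 * sum(xs) / n
    return b0, b1


# Hypothetical data: SS_X = 10, SS_Y = 6, SS_XY = 6.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ss_x, ss_y, ss_xy = sums_of_squares(xs, ys)
b0, b1 = least_squares(xs, ys)
```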
15. Template (partial output) that can be used to carry out a Simple Regression
16. Template (continued) that can be used to carry out a Simple Regression
17. Template (continued) that can be used to carry out a Simple Regression
Residual Analysis. The plot shows the absence of a relationship between the residuals and the X-values (miles).
18. Template (continued) that can be used to carry out a Simple Regression
Note: The normal probability plot is approximately linear. This would indicate that the normality assumption for the errors has not been violated.
19. Total Variance and Error Variance
[Figure: two plots of Y against X. Left: what you see when looking at the total variation of Y. Right: what you see when looking along the regression line at the error variance of Y.]
20. 10-4 Error Variance and the Standard Errors of Regression Estimators
Degrees of Freedom in Regression:
df = (n - 2) (n total observations less one degree of freedom for each parameter estimated ($b_0$ and $b_1$))
Square and sum all regression errors to find SSE:
$SSE = \sum (Y - \hat{Y})^2 = SS_Y - \frac{(SS_{XY})^2}{SS_X} = SS_Y - b_1 SS_{XY}$
An unbiased estimator of $\sigma^2$, denoted by $s^2$:
$MSE = \frac{SSE}{n - 2}$
Example 10-1:
$SSE = SS_Y - b_1 SS_{XY} = 66855898 - (1.255333776)(51402852.4) = 2328161.2$
$MSE = \frac{SSE}{n - 2} = \frac{2328161.2}{23} = 101224.4$
$s = \sqrt{MSE} = \sqrt{101224.4} = 318.158$
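The arithmetic of Example 10-1 can be checked with a few lines of code. This is a sketch only: the function name `error_variance` is made up here, and the inputs are the summary values quoted on the slide.

```python
import math


def error_variance(ss_y, b1, ss_xy, n):
    """Return (SSE, MSE, s) from SSE = SS_Y - b1 * SS_XY and MSE = SSE / (n - 2)."""
    sse = ss_y - b1 * ss_xy
    mse = sse / (n - 2)
    return sse, mse, math.sqrt(mse)


# Summary values taken from Example 10-1 (n = 25 observations).
sse, mse, s = error_variance(ss_y=66855898, b1=1.255333776, ss_xy=51402852.4, n=25)
```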
21. Standard Errors of Estimates in Regression
The standard error of $b_0$ (intercept):
$s(b_0) = s \sqrt{\frac{\sum x^2}{n SS_X}}$, where $s = \sqrt{MSE}$
The standard error of $b_1$ (slope):
$s(b_1) = \frac{s}{\sqrt{SS_X}}$
Example 10-1:
$s(b_0) = s \sqrt{\frac{\sum x^2}{n SS_X}} = 318.158 \sqrt{\frac{293426944}{(25)(40947557.84)}} = 170.338$
$s(b_1) = \frac{s}{\sqrt{SS_X}} = \frac{318.158}{\sqrt{40947557.84}} = 0.04972$
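The two standard-error formulas can be evaluated directly from the summary values of Example 10-1. The function names `se_intercept` and `se_slope` below are assumptions introduced for this sketch.

```python
import math


def se_intercept(s, sum_x2, n, ss_x):
    """Standard error of b0: s * sqrt(sum(x^2) / (n * SS_X))."""
    return s * math.sqrt(sum_x2 / (n * ss_x))


def se_slope(s, ss_x):
    """Standard error of b1: s / sqrt(SS_X)."""
    return s / math.sqrt(ss_x)


# Summary values taken from Example 10-1.
s_b0 = se_intercept(s=318.158, sum_x2=293426944, n=25, ss_x=40947557.84)
s_b1 = se_slope(s=318.158, ss_x=40947557.84)
```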
22. Confidence Intervals for the Regression Parameters
A $(1 - \alpha)100\%$ confidence interval for $\beta_0$:
$b_0 \pm t_{\alpha/2,(n-2)} \, s(b_0)$
A $(1 - \alpha)100\%$ confidence interval for $\beta_1$:
$b_1 \pm t_{\alpha/2,(n-2)} \, s(b_1)$
Example 10-1, 95% confidence intervals:
$b_0 \pm t_{0.025,(25-2)} \, s(b_0) = 274.85 \pm (2.069)(170.338) = 274.85 \pm 352.43 = [-77.58, 627.28]$
$b_1 \pm t_{0.025,(25-2)} \, s(b_1) = 1.25533 \pm (2.069)(0.04972) = 1.25533 \pm 0.10287 = [1.15246, 1.35820]$
[Figure: the slope shown as the height of the line over a length of 1, marking the least-squares point estimate $b_1 = 1.25533$ and the value 0, which is not a possible value of the regression slope at 95%]
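Both intervals have the same form, estimate plus or minus a critical t value times a standard error, so one helper covers them. In this sketch the function name `confidence_interval` is hypothetical, and the critical value 2.069 is typed in from the slide rather than computed from a t distribution.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


T_CRIT_95 = 2.069  # t(0.025, 23) as quoted in Example 10-1

b0_interval = confidence_interval(274.85, 170.338, T_CRIT_95)
b1_interval = confidence_interval(1.25533, 0.04972, T_CRIT_95)

# The slope interval excludes 0, matching the note on the slide.
slope_interval_excludes_zero = not (b1_interval[0] <= 0.0 <= b1_interval[1])
```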
23. Template (partial output) that can be used to obtain Confidence Intervals for $\beta_0$ and $\beta_1$
24. 10-5 Correlation
The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.
The population correlation, denoted by $\rho$, can take on any value from -1 to 1.
• $\rho = -1$ indicates a perfect negative linear relationship
• $-1 < \rho < 0$ indicates a negative linear relationship
• $\rho = 0$ indicates no linear relationship
• $0 < \rho < 1$ indicates a positive linear relationship
• $\rho = 1$ indicates a perfect positive linear relationship
The absolute value of $\rho$ indicates the strength or exactness of the relationship.
25. Illustrations of Correlation
[Figure: six scatterplots of Y against X, each labelled with its value of $\rho$; the labels have magnitudes 1, 0.8 and 0]
26. Covariance and Correlation
The covariance of two random variables X and Y:
$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
where $\mu_X$ and $\mu_Y$ are the population means of X and Y respectively.
The population correlation coefficient:
$\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$
The sample correlation coefficient*:
$r = \frac{SS_{XY}}{\sqrt{SS_X SS_Y}}$
Example 10-1:
$r = \frac{SS_{XY}}{\sqrt{SS_X SS_Y}} = \frac{51402852.4}{\sqrt{(40947557.84)(66855898)}} = \frac{51402852.4}{52321943.29} = 0.9824$
*Note: If $\rho < 0$, $\beta_1 < 0$. If $\rho = 0$, $\beta_1 = 0$. If $\rho > 0$, $\beta_1 > 0$.
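The sample correlation in Example 10-1 follows from the three sums of squares alone. A minimal sketch, with the function name `sample_correlation` introduced only for illustration:

```python
import math


def sample_correlation(ss_xy, ss_x, ss_y):
    """Sample correlation r = SS_XY / sqrt(SS_X * SS_Y)."""
    return ss_xy / math.sqrt(ss_x * ss_y)


# Summary values taken from Example 10-1.
r = sample_correlation(ss_xy=51402852.4, ss_x=40947557.84, ss_y=66855898)
```

Because $b_1 = SS_{XY} / SS_X$ and $SS_X$ is positive, r always has the same sign as the fitted slope $b_1$.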
27. Hypothesis Tests for the Correlation Coefficient
$H_0: \rho = 0$ (No linear relationship)
$H_1: \rho \neq 0$ (Some linear relationship)
Test statistic:
$t_{(n-2)} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$
Example 10-1:
$t_{(n-2)} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{0.9824}{\sqrt{\frac{1 - 0.9651}{25 - 2}}} = \frac{0.9824}{0.0389} = 25.25$
$t_{0.005} = 2.807 < 25.25$
$H_0$ rejected at 1% level
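The test statistic for the correlation coefficient is easy to compute from r and n. In the sketch below, the function name `correlation_t_statistic` is hypothetical and the critical value 2.807 is copied from the slide rather than computed. Working in full floating-point precision gives a value close to, but not exactly, the 25.25 obtained on the slide from rounded intermediate figures.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    return r / math.sqrt((1.0 - r * r) / (n - 2))


T_CRIT_1_PERCENT = 2.807  # t(0.005, 23) as quoted in Example 10-1

t_stat = correlation_t_statistic(r=0.9824, n=25)
reject_h0 = abs(t_stat) > T_CRIT_1_PERCENT  # two-tailed test at the 1% level
```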
28. 10-6 Hypothesis Tests about the Regression Relationship
[Figure: three plots of Y against X: Constant Y, Unsystematic Variation, Nonlinear Relationship]
A hypothesis test for the existence of a linear relationship between X and Y:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic for the existence of a linear relationship between X and Y:
$t_{(n-2)} = \frac{b_1}{s(b_1)}$
where $b_1$ is the least-squares estimate of the regression slope and $s(b_1)$ is the standard error of $b_1$.
When the null hypothesis is true, the statistic has a t distribution with n - 2 degrees of freedom.
29. Hypothesis Tests for the Regression Slope
Example 10-1:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
$t_{(n-2)} = \frac{b_1}{s(b_1)} = \frac{1.25533}{0.04972} = 25.25$
$t_{(0.005, 23)} = 2.807 < 25.25$
$H_0$ is rejected at the 1% level and we may conclude that there is a relationship between charges and miles traveled.
Example 10-4:
$H_0: \beta_1 = 1$
$H_1: \beta_1 \neq 1$
$t_{(n-2)} = \frac{b_1 - 1}{s(b_1)} = \frac{1.24 - 1}{0.21} = 1.14$
$t_{(0.05, 58)} = 1.671 > 1.14$
$H_0$ is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.
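Both examples use the same statistic, the estimated slope minus its hypothesized value, divided by its standard error. The sketch below makes the hypothesized value an explicit argument; the function name `slope_t_statistic` is an assumption, and the critical values are copied from the slide.

```python
def slope_t_statistic(b1, se_b1, hypothesized_slope=0.0):
    """t statistic with n - 2 degrees of freedom for H0: beta1 = hypothesized_slope."""
    return (b1 - hypothesized_slope) / se_b1


# Example 10-1: H0: beta1 = 0, two-tailed test at the 1% level.
t_10_1 = slope_t_statistic(1.25533, 0.04972)
reject_10_1 = abs(t_10_1) > 2.807  # t(0.005, 23)

# Example 10-4: H0: beta1 = 1, two-tailed test at the 10% level.
t_10_4 = slope_t_statistic(1.24, 0.21, hypothesized_slope=1.0)
reject_10_4 = abs(t_10_4) > 1.671  # t(0.05, 58)
```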
30. 10-7 How Good is the Regression?
The coefficient of determination, $r^2$, is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.
$(y - \bar{y}) = (y - \hat{y}) + (\hat{y} - \bar{y})$
Total Deviation = Unexplained Deviation (Error) + Explained Deviation (Regression)
$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$
SST = SSE + SSR
$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
Percentage of total variation explained by the regression.
[Figure: plot of Y against X marking, for one point, the total deviation from $\bar{Y}$ split into the unexplained deviation and the explained deviation]
31. The Coefficient of Determination
[Figure: three plots of Y against X comparing SST, SSE and SSR for $r^2 = 0$, $r^2 = 0.50$ and $r^2 = 0.90$]
Example 10-1:
$r^2 = \frac{SSR}{SST} = \frac{64527736.8}{66855898} = 0.96518$
[Figure: plot of Dollars (2000 to 7000) against Miles (1000 to 5500)]
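The two equivalent expressions for $r^2$ can be checked against each other using the sums of squares from Example 10-1. This is a sketch; the function name `r_squared` is made up for illustration.

```python
def r_squared(ssr, sst):
    """Coefficient of determination as SSR / SST."""
    return ssr / sst


# Sums of squares taken from Example 10-1.
SST = 66855898.0
SSR = 64527736.8
SSE = SST - SSR  # SST = SSE + SSR

r2_from_ssr = r_squared(SSR, SST)
r2_from_sse = 1.0 - SSE / SST  # the equivalent form 1 - SSE / SST
```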
32. 10-8 Analysis of Variance and an F Test of the Regression Model
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
Regression            SSR              (1)                  MSR           MSR/MSE
Error                 SSE              (n-2)                MSE
Total                 SST              (n-1)                MST
Example 10-1
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio   p Value
Regression            64527736.8       1                    64527736.8    637.47    0.000
Error                 2328161.2        23                   101224.4
Total                 66855898.0       24
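The entries of the ANOVA table follow from SSR, SSE and n. The sketch below recomputes the mean squares and the F ratio for Example 10-1; the function name `anova_f` is hypothetical, and the p value is not computed here.

```python
def anova_f(ssr, sse, n):
    """Return (MSR, MSE, F) for simple linear regression.

    Regression has 1 degree of freedom and error has n - 2.
    """
    msr = ssr / 1
    mse = sse / (n - 2)
    return msr, mse, msr / mse


# Sums of squares taken from Example 10-1 (n = 25).
msr, mse, f_ratio = anova_f(ssr=64527736.8, sse=2328161.2, n=25)
```

In simple linear regression the F ratio equals the square of the slope t statistic, which is consistent with 25.25 squared being approximately 637.5.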
33. Template (partial output) that displays Analysis of Variance and an F Test of the Regression Model
34. 10-9 Residual Analysis and Checking for Model Inadequacies
[Figure: four residual plots, each with residuals on the vertical axis around 0]
• Residuals against x or y. Homoscedasticity: Residuals appear completely random. No indication of model inadequacy.
• Residuals against x or y. Heteroscedasticity: Variance of residuals changes when x changes.
• Residuals against time. Residuals exhibit a linear trend with time.
• Residuals against x or y. Curved pattern in residuals resulting from underlying nonlinear relationship.
35. Normal Probability Plot of the Residuals
Flatter than Normal
36. Normal Probability Plot of the Residuals
More Peaked than Normal
37. Normal Probability Plot of the Residuals
More Positively Skewed than Normal
38. Normal Probability Plot of the Residuals
More Negatively Skewed than Normal
39. 10-10 Use of the Regression Model for Prediction
• Point Prediction
A single-valued estimate of Y for a given value of X obtained by inserting the value of X in the estimated regression equation.
• Prediction Interval
For a value of Y given a value of X
» Variation in regression line estimate
» Variation of points around regression line
For an average value of Y given a value of X
» Variation in regression line estimate
40. Errors in Predicting E[Y|X]
[Figure: two plots of Y against X. 1) Uncertainty about the slope of the regression line: the regression line with upper and lower limits on the slope. 2) Uncertainty about the intercept of the regression line: the regression line with upper and lower limits on the intercept.]
41. Prediction Interval for E[Y|X]
[Figure: plot of Y against X showing the regression line and the prediction band for E[Y|X]]
• The prediction band for E[Y|X] is narrowest at the mean value of X.
• The prediction band widens as the distance from the mean of X increases.
• Predictions become very unreliable when we extrapolate beyond the range of the sample itself.
42. Additional Error in Predicting Individual Value of Y
[Figure: two plots of Y against X. Left: 3) variation around the regression line. Right: the regression line with the prediction band for E[Y|X] and the prediction band for Y.]
43. Prediction Interval for a Value of Y
A $(1 - \alpha)100\%$ prediction interval for Y:
$\hat{y} \pm t_{\alpha/2} \, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$
Example 10-1 (X = 4,000):
$\{274.85 + (1.2553)(4{,}000)\} \pm (2.069)(318.16) \sqrt{1 + \frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}} = 5296.05 \pm 676.62 = [4619.43, 5972.67]$
44. Prediction Interval for the Average Value of Y
A $(1 - \alpha)100\%$ prediction interval for $E[Y \mid X]$:
$\hat{y} \pm t_{\alpha/2} \, s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$
Example 10-1 (X = 4,000):
$\{274.85 + (1.2553)(4{,}000)\} \pm (2.069)(318.16) \sqrt{\frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}} = 5296.05 \pm 156.48 = [5139.57, 5452.53]$
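The interval for an individual Y and the interval for E[Y|X] differ only by the extra 1 under the square root. The sketch below computes both for Example 10-1 at X = 4,000; the function name `interval_half_width` and its `individual` flag are assumptions made for illustration, and the critical value 2.069 is copied from the slide.

```python
import math


def interval_half_width(x, x_bar, ss_x, n, s, t_crit, individual):
    """Half-width of the interval around y-hat at a given x.

    individual=True  -> interval for a single value of Y (adds 1 under the root)
    individual=False -> interval for the average value E[Y | X]
    """
    inside = 1.0 / n + (x - x_bar) ** 2 / ss_x
    if individual:
        inside += 1.0
    return t_crit * s * math.sqrt(inside)


# Values taken from Example 10-1.
b0, b1 = 274.85, 1.2553
x_new = 4000.0
y_hat = b0 + b1 * x_new

common = dict(x=x_new, x_bar=3177.92, ss_x=40947557.84, n=25, s=318.16, t_crit=2.069)
hw_individual = interval_half_width(individual=True, **common)
hw_mean = interval_half_width(individual=False, **common)

interval_for_y = (y_hat - hw_individual, y_hat + hw_individual)
interval_for_mean = (y_hat - hw_mean, y_hat + hw_mean)
```

The quadratic term in x is what makes both bands narrowest at the mean of X and wider farther from it.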
45. Template Output with Prediction Intervals
46. 10-11 The Solver Method for Regression
The Solver macro available in Excel can also be used to conduct a simple linear regression. See the text for instructions.
47. About the Author
Name: Shakeel Nouman
Religion: Christian
Domicile: Punjab (Lahore)
Contact #: 0332-4462527, 0321-9898767
E.Mail: sn_gcu@yahoo.com, sn_gcu@hotmail.com
M.Phil (Statistics): GC University (Degree awarded by GC University)
M.Sc (Statistics): GC University (Degree awarded by GC University)
Statistical Officer (BS-17) (Economics & Marketing Division): Livestock Production Research Institute Bahadurnagar (Okara), Livestock & Dairy Development Department, Govt. of Punjab