Regression Analysis
MULTIPLE REGRESSION
[ CROSS-SECTIONAL DATA ]
ASSOC PROF ERGIN AKALPLER
Learning Objectives
 Explain the linear multiple regression model [for cross-
sectional data]
 Interpret linear multiple regression computer output
 Explain multicollinearity
 Describe the types of multiple regression models
Regression Modeling Steps
 Define problem or question
 Specify model
 Collect data
 Do descriptive data analysis
 Estimate unknown parameters
 Evaluate model
Use model for prediction
Y-hat = 0 + 1x1 + 2x2 + ... + PxP + 
Simple vs. Multiple
  represents the
unit change in Y
per unit change in
X .
 Does not take into
account any other
variable besides
single independent
variable.
 i represents the unit
change in Y per unit
change in Xi. Multiple
variable
 Takes into account
the effect of other
i s.
 “Net regression
coefficient.”
Assumptions
Linearity - the Y variable is linearly related
to the value of the X variable.
Independence of Error - the error
(residual) is independent for each value of X.
Homoscedasticity - the variation around
the line of regression is constant for all
values of X.
Normality - the values of Y are normally
distributed at each value of X.
Regression is performed with the hypotheses
Ho: γ = 0, Yt is non-stationary (the coefficient is not significant)
H1: γ ≠ 0, Yt is significant
(all variables must be significant at the chosen level)
And for the residuals,
Ho: γ = 0, the residuals are not serially correlated, not
heteroscedastic, and normally distributed
H1: γ ≠ 0, the residuals are serially correlated,
heteroscedastic, and not normally distributed.
Best regression model
null, Ho = residuals are not serially correlated
alt, H1 = residuals are serially correlated
 R2 value must be high
(it should be 60% or more for a good model)
 No serial correlation
(the LM test probability value must be higher than 0.05)
 No heteroscedasticity
(the residual test p value must be higher than 0.05)
 Residuals are normally distributed
(the histogram p value must be higher than 0.05)
A sketch of these checks follows below.
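These checks can be run mechanically. Below is a minimal sketch in Python, assuming statsmodels is installed; the variable names and synthetic data are placeholders, not the lecture's data.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey, het_breuschpagan
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.8 * x + rng.normal(size=100)   # hypothetical data-generating process

X = sm.add_constant(x)                     # add the intercept column
results = sm.OLS(y, X).fit()

print("R-squared:", results.rsquared)      # rule of thumb: want > 0.60

# Serial correlation: Breusch-Godfrey LM test (want p > 0.05)
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(results, nlags=2)
print("LM test p-value:", lm_pval)

# Heteroscedasticity: Breusch-Pagan test on the residuals (want p > 0.05)
bp_stat, bp_pval, _, _ = het_breuschpagan(results.resid, results.model.exog)
print("Breusch-Pagan p-value:", bp_pval)

# Normality of residuals: Jarque-Bera test (want p > 0.05)
jb_stat, jb_pval, skew, kurt = jarque_bera(results.resid)
print("Jarque-Bera p-value:", jb_pval)
```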
Sample Model with Hypotheses for OLS
Goal
Develop a statistical model
that can predict the values of
a dependent (response)
variable based upon the
values of the independent
(explanatory) variables.
Simple Regression
A statistical model that utilizes
one quantitative independent
variable “X” to predict the
quantitative dependent
variable “Y.”
Ols interpretation
Variables          Coefficient   St error    T stats      Prob.
C                  -3188845      1822720     -1.749487    0.0866
Income             0.819235      0.003190    256.7871     0.0000
R2                 0.999         Mean dependent var       3522.160
Adjusted R2        0.9999        SD dependent var         3077.678
SE of Reg          82.86681      Akaike info criterion    11.73539
Sum square resid   337614.8      Schwarz criterion        11.87820
Log likelihood     -292.3779     Hannan-Quinn criterion   11.76500
F stats            65939.59      Durbin-Watson stats      0.568044
Prob (F stat)      0.0000
For ols results
 Coefficient signs explain the direction of the relation between the explanatory and
dependent variables
 The standard error of a coefficient indicates the accuracy of the estimated
ordinary least squares (OLS) coefficient with respect to its population
parameter. Each standard error is the square root of the variance of the
corresponding coefficient.
 The t-test is a statistical hypothesis testing technique that is used to test the
linearity of the relationship between the response variable and the different
predictor variables. In other words, it is used to determine whether or not
there is a linear correlation between the response and predictor variables.
The t-test helps to determine if this linear relationship is statistically
significant.
 It is estimated by dividing the coefficient by the standard error:
 t stats = coefficient / st error
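For example, using the Income row of the output above: t = 0.819235 / 0.003190 ≈ 256.8, which matches the reported t statistic.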
Ols results
 The probability value must be between zero and 0.05 for the
model to be significant.
 The p-value for each term tests the null hypothesis that
the coefficient is equal to zero (no effect).
 A low p-value (< 0.05) indicates that you can reject the
null hypothesis. In other words, a predictor that has a
low p-value is likely to be a meaningful addition to your
model because changes in the predictor's value are
related to changes in the response variable.
 For the residual normality test, Ho: residuals are normally distributed
Ols results
 R2 value explains what percentage of the Y
dependent variable is explained by the
explanatory variables X - the effects of the
independent variables on the dependent variable
 Adjusted R2 can increase or decrease as
independent variables are added. Too many explanatory
variables may even cause a negative sign.
Ols results
 F statistic: this statistic tells how jointly
significant the explanatory variables are in
affecting the dependent variable.
The higher the F value, the better the model.
 Its probability value: the lower the value, the
better the model. It tells the statistical
significance of the model as a whole.
Ols results
 Mean dependent variable is the average value
of the dependent variable
 AIC, SIC and HQC are used to choose the best
model; the lower the value, the better the
model. Here AIC has the lowest value and gives
us the best model to adopt (see the sketch below).
 Durbin-Watson stats tell the serial correlation:
if the DW is less than two, it is evidence of
positive serial correlation and the model is
suffering from serial correlation.
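As a sketch of how these statistics are read off a fitted model, assuming statsmodels; the data are again synthetic placeholders, and the Hannan-Quinn line is computed by hand in the EViews-style per-observation form since plain OLS results do not expose it directly:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.8 * x + rng.normal(size=100)            # hypothetical data
results = sm.OLS(y, sm.add_constant(x)).fit()

print("Mean dependent var:", results.model.endog.mean())
print("AIC:", results.aic, "SIC/BIC:", results.bic)  # lower is better
n, k = results.nobs, results.df_model + 1            # k counts the constant
print("HQC:", -2 * results.llf / n + 2 * k * np.log(np.log(n)) / n)
print("Durbin-Watson:", durbin_watson(results.resid))  # < 2: positive serial correlation
```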
Multiple Regression
A statistical model that utilizes two
or more quantitative and qualitative
explanatory variables (x1,..., xp) to
predict a quantitative dependent
variable Y.
Caution: have at least two or more quantitative
explanatory variables (rule of thumb)
Multiple regression
 Multiple regression is a statistical technique that
can be used to analyze the relationship between
a single dependent variable and several
independent variables. The objective of multiple
regression analysis is to use the independent
variables whose values are known to predict the
value of the single dependent variable.
Multiple regression assumptions
 Assumption #1: Your dependent variable should
be measured on a continuous scale (i.e., it is either
an interval or ratio variable). Examples of variables
that meet this criterion include revision time
(measured in hours), intelligence (measured using IQ
score), exam performance (measured from 0 to 100),
weight (measured in kg), and so forth
Multiple regression assumptions
 Assumption #2: You have two or more
independent variables, which can be
either continuous (i.e., an interval or ratio variable)
or categorical (i.e., an ordinal or nominal variable).
For examples of continuous and ordinal variables,
see the bullet above. Examples of nominal
variables include gender (e.g., 2 groups: male and
female) and ethnicity.
Multiple regression
assumptions
 Assumption #3: You should
have independence of
observations (i.e., independence of
residuals), which you can easily check using
the Durbin-Watson statistic.
Multiple regression
assumptions
 Assumption #4: There needs to be a linear
relationship between (a) the dependent
variable and each of your independent
variables, and (b) the dependent variable and
the independent variables collectively.
There are a number of ways to check
for these linear relationships, such as scatterplots.
Multiple regression
assumptions
 Assumption #5: Your data needs to
show homoscedasticity, which is where the
variances along the line of best fit remain similar
as you move along the line. We explain more
about what this means and how to assess the
homoscedasticity of your data in our enhanced
multiple regression guide.
Multiple regression assumptions
 Assumption #6: Your data must not
show multicollinearity (no relation
between the independent variables), which occurs when you
have two or more independent variables that
are highly correlated with each other. This
leads to problems with understanding which
independent variable contributes to the
variance explained in the dependent variable,
as well as technical issues in calculating a
multiple regression model.
Multiple regression
assumptions
 Assumption #7: There should be no significant
outliers, high leverage points or highly
influential points.
Multiple regression
assumptions
 Finally, you need to check that the residuals
(errors) are approximately normally
distributed (we explain these terms in our
enhanced multiple regression guide).
Estimate the histogram of the residuals
 (its normality-test p value must be higher than 0.05)
Multiple Regression Model
[Diagram: a model with more than two variables - Y explained by X1 and X2 plus error e]
Hypotheses
 H0: β1 = β2 = β3 = ... = βP = 0
 H1: At least one regression
coefficient is not equal to zero
Hypotheses (alternate format)
H0: βi = 0
H1: βi ≠ 0
Types of Models
 Positive linear relationship
 Negative linear relationship
 No relationship between X and Y
 Positive curvilinear relationship
 U-shaped curvilinear
 Negative curvilinear relationship
Multiple Regression Models
[Tree: multiple regression models divide into linear and non-linear
forms. Linear: linear, dummy variable, interaction, polynomial.
Non-linear: square root, log, reciprocal, exponential.]
Linear Model
Relationship between one dependent & two or more independent
variables is a linear function:
Y = β0 + β1X1 + β2X2 + ... + βPXP + ε
where Y is the dependent (response) variable, the X's are the
independent (explanatory) variables, β0 is the population
Y-intercept, β1 ... βP are the population slopes, and ε is the
random error.
Method of Least Squares
 The straight line that best fits the data.
 Determine the straight line for which the differences
between the actual values (Y) and the values that
would be predicted from the fitted line of regression
(Y-hat) are as small as possible.
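In symbols: the fitted coefficients b0, b1, ..., bP are the values that minimize the error sum of squares SSE = Σ(Yi − Y-hat i)².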
Measures of Variation
Explained variation (sum of
squares due to regression)
Unexplained variation (error
sum of squares)
Total sum of squares
Coefficient of Multiple Determination
When null hypothesis is rejected, a
relationship between Y and the X
variables exists.
Strength measured by R2
[ several types ]
Coefficient of Multiple
Determination
R2y.123---P
The proportion of Y that is
explained by the set of
explanatory variables selected
Standard Error of the Estimate
sy.x
the measure of variability around
the line of regression
Interval Bands [from simple regression]
[Plot: fitted line Y-hat i = b0 + b1X with interval bands around it,
evaluated at Xgiven]
Multiple Regression Equation
Y-hat = β0 + β1x1 + β2x2 + ... + βPxP + ε
where:
β0 = y-intercept {a constant value}
β1 = slope of Y with variable x1, holding the variables x2, x3, ...,
xP effects constant
βP = slope of Y with variable xP, holding all
other variables' effects constant
Mini-Case
Predict the consumption of home
heating oil during January for
homes located around Screne
Lakes. Two explanatory variables
are selected - - average daily
atmospheric temperature (°F) and
the amount of attic insulation (inches).
Oil (Gal)   Temp   Insulation
275.30      40     3
363.80      27     3
164.30      40     10
40.80       73     6
94.30       64     6
230.90      34     6
366.70      9      6
300.60      8      10
237.80      23     10
121.40      63     3
31.40       65     10
203.50      41     6
441.10      21     3
323.00      38     3
52.50       58     10
Mini-Case
Develop a model for estimating
heating oil used for a single
family home in the month of
January based on average
temperature and amount of
insulation in inches.
Mini-Case
 Oil is the dependent variable; temperature and insulation are the independent variables
 What preliminary conclusions can home owners draw from the
data?
 What could a home owner expect heating oil consumption (in
gallons) to be if the outside temperature is 15 oF when the attic
insulation is 10 inches thick?
 Model: Oil = temp + attic insulation + error term (estimated in the sketch below)
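As a sketch, this model can be estimated from the data table above with Python (assuming pandas and statsmodels); the coefficients should come out close to the output on the next slide.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Mini-case data from the slide (oil in gallons, temp in °F, insulation in inches)
df = pd.DataFrame({
    "oil":  [275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
             237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5],
    "temp": [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58],
    "insulation": [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10],
})

model = smf.ols("oil ~ temp + insulation", data=df).fit()
print(model.params)        # intercept ≈ 562.15, temp ≈ -5.44, insulation ≈ -20.01

# Prediction for 15 °F and 10 inches of insulation
new = pd.DataFrame({"temp": [15], "insulation": [10]})
print(model.predict(new))  # ≈ 280.45 gallons
```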
Multiple Regression Equation
[mini-case]
Dependent variable: Gallons Consumed
-------------------------------------------------------------------------------------
Parameter           Estimate     Standard Error   T Statistic   P-Value
--------------------------------------------------------------------------------------
CONSTANT 562.151 21.0931 26.6509 0.0000
Insulation -20.0123 2.34251 -8.54313 0.0000
Temperature -5.43658 0.336216 -16.1699 0.0000
--------------------------------------------------------------------------------------
R-squared = 96.561 percent
R-squared (adjusted for d.f.) = 95.9879 percent
Standard Error of Est. = 26.0138
Multiple Regression Equation
[mini-case]
Y-hat = 562.15 - 5.44x1 - 20.01x2
where: x1 = temperature [degrees F]
x2 = attic insulation [inches]
Multiple Regression Equation
[mini-case]
Y-hat = 562.15 - 5.44x1 - 20.01x2
thus:
 For a home with zero inches of attic
insulation and an outside temperature of
0 oF, 562.15 gallons of heating oil would
be consumed.
[ caution .. data boundaries .. extrapolation ]
Extrapolation is the process of creating new data
points outside a discrete set of known data points
[Plot: interpolation within the relevant range of X;
extrapolation beyond either end of the range]
Multiple Regression Equation
[mini-case]
Y-hat = 562.15 - 5.44x1 - 20.01x2
 For a home with zero attic insulation and an outside temperature of zero,
562.15 gallons of heating oil would be consumed.
 [ caution .. data boundaries .. extrapolation ]
 For each incremental increase in degree F of
temperature, for a given amount of attic
insulation, heating oil consumption drops 5.44
gallons.
Multiple Regression Equation
[mini-case]
Y-hat = 562.15 - 5.44x1 - 20.01x2
 For a home with zero attic insulation and an outside temperature of zero,
562 gallons of heating oil would be consumed. [ caution … ]
 For each incremental increase in degree F of temperature, for a given
amount of attic insulation, heating oil consumption drops 5.44 gallons.
For each incremental increase in inches of
attic insulation, at a given temperature,
heating oil consumption drops 20.01
gallons.
Multiple Regression Prediction
[mini-case]
Y-hat = 562.15 - 5.44x1 - 20.01x2
with x1 = 15oF and x2 = 10 inches
Y-hat = 562.15 - 5.44(15) - 20.01(10)
= 280.45 gallons consumed
Coefficient of Multiple Determination
[mini-case]
R2y.12 = .9656
96.56 percent of the variation in
heating oil can be explained by
the variation in temperature and
insulation.
This is a very strong effect of temperature and
attic insulation on oil consumption.
Coefficient of Multiple Determination
 Proportion of variation in Y ‘explained’ by all X variables taken
together
 R2Y.12 = Explained variation / Total variation = SSR / SST
 sum of squares due to regression (SSR) = ∑(Ŷ − Ȳ)².
 SST is the total sum of squares. R-square can take on any value
between 0 and 1, with a value closer to 1 indicating that a
greater proportion of variance is accounted for by the model.
 Never decreases when new X variable is added to model
 Only Y values determine SST
 Disadvantage when comparing models
Coefficient of Multiple Determination
Adjusted
 Proportion of variation in Y ‘explained’ by all X variables
taken together
 Reflects
 Sample size
 Number of independent variables
 Smaller [more conservative] than R2Y.12
 Used to compare models
Coefficient of Multiple Determination
(adjusted)
R2(adj) y.123---P
The proportion of Y that is explained by the
set of independent [explanatory] variables
selected, adjusted for the number of
independent variables and the sample size.
Coefficient of Multiple Determination
(adjusted) [Mini-Case]
R2adj = 0.9599
95.99 percent of the variation in
heating oil consumption can be
explained by the model - adjusted
for number of independent
variables and the sample size
Coefficient of Partial Determination
 Proportion of variation in Y ‘explained’ by variable XP
holding all others constant
 Must estimate separate models
 Denoted R2Y1.2 in the two X variables case
 Coefficient of partial determination of X1 with Y
holding X2 constant
 Useful in selecting X variables
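In the two X variables case it can be computed from coefficients of determination already defined: R2Y1.2 = (R2Y.12 − r2Y2) / (1 − r2Y2), where r2Y2 is the simple coefficient of determination of Y on X2 alone.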
Coefficient of Partial
Determination [p. 878]
R2y1.234---P
The coefficient of partial determination of
variable Y with x1, holding constant
the effects of variables x2, x3, x4, ... xP.
Testing Overall Significance
 Shows if there is a linear relationship between all X
variables together & Y
 Uses p-value
 Hypotheses
 H0: β1 = β2 = ... = βP = 0
No linear relationship
 H1: At least one coefficient is not 0
At least one X variable affects Y
Testing Model Portions
 Examines the contribution of a set of X
variables to the relationship with Y
 Null hypothesis:
 Variables in set do not improve
significantly the model when all other
variables are included
 Must estimate separate models
 Used in selecting X variables
Diagnostic Checking with the
following
H0 retain or reject
{reject if p-value ≤ 0.05}
R2adj
Correlation matrix
Partial correlation matrix
Multicollinearity
 It is the occurrence of high
intercorrelations among two or
more independent variables in
a multiple regression model.
Multicollinearity is a problem
 Multicollinearity is a problem because it
produces regression model results that are
less reliable.
 This is due to wider confidence intervals
(larger standard errors) that
 can lower the statistical significance of
regression coefficients.
Multicollinearity
 High correlation between X variables
 Coefficients measure combined effect
 Leads to unstable coefficients depending on X
variables in model
 Always exists; matter of degree
 Example: Using both total number of rooms and
number of bedrooms as explanatory variables in same
model (independent variables)
Detecting Multicollinearity
 Examine correlation matrix
 Correlations between pairs of X variables are more
than with Y variable
 A few solutions (remedies)
 Obtain new sample data
 Eliminate one correlated X variable
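Beyond eyeballing the correlation matrix, a common numeric check not shown in the slides is the variance inflation factor (VIF). A minimal sketch assuming statsmodels and hypothetical data, illustrating the rooms/bedrooms example above:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical explanatory variables that are highly correlated
X = pd.DataFrame({
    "total_rooms": [6, 7, 8, 5, 9, 6, 7, 8, 10, 5],
    "bedrooms":    [3, 3, 4, 2, 5, 3, 4, 4, 5, 2],
})
X = sm.add_constant(X)

# Rule of thumb: VIF above roughly 10 signals troublesome multicollinearity
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```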
Evaluating Multiple Regression Model Steps
 Examine variation measures
 Do residual analysis
 Test parameter significance
Overall model
Portions of model
Individual coefficients
 Test for multicollinearity
Multiple Regression Models
[Tree: multiple regression models divide into linear and non-linear
forms. Linear: linear, dummy variable, interaction, polynomial.
Non-linear: square root, log, reciprocal, exponential.]
Dummy-Variable Regression Model
 Involves categorical X variable with
two levels
e.g., female-male,
employed-not employed, etc.
Dummy-Variable Regression Model
 Involves categorical X variable with
two levels
 e.g., female-male,
 employed-not employed, etc.
 Variable levels coded 0 & 1
Dummy-Variable Regression Model
 Involves categorical X variable with
two levels
e.g., female-male, employed-not employed,
etc.
 Variable levels coded 0 & 1
 Assumes only intercept is different
Slopes are constant across categories
Dummy-Variable Model Relationships
[Plot: two parallel lines with the same slope b1;
intercept b0 for males and b0 + b2 for females]
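A minimal sketch of the two-intercept model in the plot above, assuming statsmodels; the 0/1 `female` column and the data are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50
x1 = rng.uniform(0, 10, n)                         # quantitative predictor
female = rng.integers(0, 2, n)                     # dummy: 1 = female, 0 = male
y = 5 + 2 * x1 + 3 * female + rng.normal(size=n)   # hypothetical data

X = sm.add_constant(np.column_stack([x1, female]))
fit = sm.OLS(y, X).fit()
b0, b1, b2 = fit.params
print("male intercept:  ", b0)        # b0
print("female intercept:", b0 + b2)   # b0 + b2
print("common slope:    ", b1)        # same slope for both groups
```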
Dummy Variables
 Permits use of
qualitative data
(e.g.: seasonal, class
standing, location,
gender).
 0, 1 coding
(nominal data)
 As part of Diagnostic
Checking;
incorporate outliers
(i.e.: large residuals)
and influence
measures.
Multiple Regression Models
[Tree: multiple regression models divide into linear and non-linear
forms. Linear: linear, dummy variable, interaction, polynomial.
Non-linear: square root, log, reciprocal, exponential.]
Interaction Regression Model
 Hypothesizes interaction between pairs of X
variables
Response to one X variable varies at different
levels of another X variable
 Contains two-way cross product terms
Y = 0 + 1x1 + 2x2 + 3x1x2 + 
 Can be combined with other models
e.g. dummy variable models
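A sketch of fitting a cross-product term, assuming statsmodels and made-up data; in the patsy formula, `x1 * x2` expands to x1 + x2 + x1:x2, and the `x1:x2` coefficient plays the role of β3:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"x1": rng.uniform(0, 1.5, 80),
                   "x2": rng.integers(0, 2, 80)})
# Hypothetical data generated with a true interaction effect of 4
df["y"] = 1 + 2*df.x1 + 3*df.x2 + 4*df.x1*df.x2 + rng.normal(scale=0.1, size=80)

fit = smf.ols("y ~ x1 * x2", data=df).fit()
print(fit.params)   # should recover roughly 1, 2, 3, and 4
```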
Effect of Interaction
 Given: Yi = β0 + β1X1i + β2X2i + β3X1iX2i + εi
 Without interaction term, effect of X1 on Y is measured by β1
 With interaction term, effect of X1 on Y is measured by β1 +
β3X2i
 Effect increases as X2i increases
Interaction Example
Y = 1 + 2X1 + 3X2 + 4X1X2
[Plot: Y against X1 over 0 to 1.5, for X2 = 0 and X2 = 1]
When X2 = 0: Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
When X2 = 1: Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
Effect (slope) of X1 on Y does depend on the X2 value
Multiple Regression Models
[Tree: multiple regression models divide into linear and non-linear
forms. Linear: linear, dummy variable, interaction, polynomial.
Non-linear: square root, log, reciprocal, exponential.]
The Difference between Linear and Nonlinear
Regression Models
 The difference between linear and nonlinear
regression models isn’t as straightforward as it
sounds.
 You’d think that linear equations produce straight
lines and nonlinear equations model curvature.
Unfortunately, that’s not correct.
Linear Regression Equations
 A linear regression model follows a very particular
form. In statistics, a regression model is linear
when all terms in the model are one of the
following:
 The constant
 A parameter multiplied by an independent
variable (IV)
 Then, you build the equation by only adding the
terms together. These rules limit the form to just
one type:
Linear regression
 Then, you build the equation by only adding the terms
together. These rules limit the form to just one type:
 Dependent variable = constant + parameter * IV + … +
parameter * IV
The regression example below models the relationship between body mass
index (BMI) and body fat percent.
In a different blog post, I use this model to show how to make predictions
with regression analysis.
It is a linear model that uses a quadratic (squared) term to model the
curved relationship.
Nonlinear Regression Equations
 I showed how linear regression models have
one basic configuration.
 Now, we’ll focus on the “non” in nonlinear! If a
regression equation doesn’t follow the rules for
a linear model, then it must be a nonlinear
model.
 It’s that simple! A nonlinear model is literally
not linear.
Non linear regression
 Consequently, nonlinear regression can fit
an enormous variety of curves.
 However, because there are so many
candidates, you may need to conduct some
research to determine which functional
form provides the best fit for your data.
Non linear regression
 Below, I present a handful of
examples that illustrate the
diversity of nonlinear regression
models. Keep in mind that each
function can fit a variety of
shapes, and there are many
nonlinear functions. Also, notice
how nonlinear regression
equations are not comprised of
only addition and multiplication!
In the table, the thetas are the parameters, and
 the Xs are the independent variables.
Non-Linear Models
 Non-linear models that can be expressed in
linear form
Can be estimated by least squares in
linear form
 Require data transformation
Curvilinear Model Relationships
[Plots: four curvilinear relationships between Y and X1]
Logarithmic Transformation
Y = β0 + β1 ln x1 + β2 ln x2 + ε
[Plot: Y vs X1, increasing when β1 > 0, decreasing when β1 < 0]
Square-Root Transformation
Yi = β0 + β1√X1i + β2√X2i + εi
[Plot: Y vs X1, rising when β1 > 0, falling when β1 < 0]
Reciprocal Transformation
Yi = β0 + β1(1/X1i) + β2(1/X2i) + εi
[Plot: Y vs X1 approaching an asymptote;
β1 > 0 from above, β1 < 0 from below]
Exponential Transformation
Yi = e^(β0 + β1X1i + β2X2i) εi
[Plot: Y vs X1, growing when β1 > 0, decaying when β1 < 0]
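These transformed models are still linear in the parameters, so each can be estimated by OLS after transforming the data. A sketch assuming numpy/statsmodels and positive-valued hypothetical data; only the exponential case is run in full:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x1 = rng.uniform(1, 10, 200)
# Hypothetical exponential process with multiplicative error
y = np.exp(0.5 + 0.3 * x1) * rng.lognormal(sigma=0.05, size=200)

# Exponential model Y = e^(b0 + b1*X1) * e  ->  take logs and fit a line
log_fit = sm.OLS(np.log(y), sm.add_constant(x1)).fit()
print(log_fit.params)   # roughly (0.5, 0.3)

# The other transformations follow the same pattern:
# logarithmic:  sm.OLS(y, sm.add_constant(np.log(x1)))
# square root:  sm.OLS(y, sm.add_constant(np.sqrt(x1)))
# reciprocal:   sm.OLS(y, sm.add_constant(1.0 / x1))
```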
End of Regression
Analysis /
multiregression
THANK YOU