Quantitative Research Technique
Multiple Regression Analysis
Selection of Predictor Variables
Confidence and Prediction Interval
Dinesh Pudasaini (CRN 071MSI604)
1
Goal
• Develop a statistical model that can predict the values of a
dependent (response) variable based upon the values of
the Independent (explanatory) variables.
• In many situations, more than one independent variable
may be useful in predicting the value of a dependent
variable. We then use multiple regression.
2
Introduction
Simple Regression
A statistical model that utilizes one quantitative independent
variable “X” to predict the quantitative dependent variable
“Y.”
Multiple Regression:
A statistical model that utilizes two or more quantitative and
qualitative explanatory variables (x1,..., xk) to predict a
quantitative dependent variable Y.
3
Simple vs. Multiple
• Simple Regression
• β represents the unit
change in Y per unit
change in X.
• Does not take into account
any other variable besides
the single independent
variable.
• Multiple Regression
• βi represents the unit
change in Y per unit change
in Xi, with the other X variables held fixed.
• Takes into account the
effect of the other βi's.
4
Multiple Regression Models
[Diagram: multiple regression models divide into linear models (linear, dummy variable, interaction) and non-linear models (polynomial, square root, log, reciprocal, exponential).]
5
Linear Model
• Relationship between one dependent & two or more
independent variables is a linear function
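$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_P X_P + \varepsilon$$
where Y is the dependent (response) variable, X_1, ..., X_P are the independent (explanatory) variables, β_0 is the population Y-intercept, β_1, ..., β_P are the population slopes, and ε is the random error.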
6
Linear Model
• The error terms ε are mutually independent and identically
distributed, with mean 0 and constant variance.
• This holds because the observations y1, y2, . . . , yn are a random
sample: they are mutually independent, and hence the error
terms are also mutually independent.
• The distribution of the error term is independent of the joint
distribution of x1, x2, . . . , xk.
7
Method of Least Squares
• We use the least-squares method to fit a linear function to the
data.
• b0, b1, b2, b3, . . . , bk are the sample estimates of the
coefficients β0, β1, β2, β3, . . . , βk.
• The least-squares method chooses the b's that make the sum of
squares of the residuals as small as possible.
• The least-squares estimates are the values that minimize the
quantity
$$\sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik}\right)^2$$
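As a minimal sketch of the least-squares idea, the coefficients can be obtained by solving the normal equations (X'X) b = X'y. The function names `solve_linear_system` and `least_squares` and the small data set are illustrative assumptions, not part of the slides; statistical software typically uses more numerically robust methods than this.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations act on both sides.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Pivot on the largest entry in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(x_rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    # Prepend a column of ones so that b0 acts as the intercept.
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Illustrative data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
b = least_squares(x_rows, y)
print([round(v, 6) for v in b])
```

Because the illustrative data contain no noise, the estimates should recover the generating coefficients up to rounding error.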
8
Standard Error of Estimate and
Coefficient of Multiple Determination
• The observed variability of the responses about
this fitted model is measured by the variance
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k - 1}$$
and the regression standard error of estimate is
$$s = \sqrt{s^2}$$
Coefficient of Multiple Determination
When the null hypothesis is rejected, a relationship between Y and
the X variables exists. Its strength is measured by R².
9
Coefficient of Multiple Determination.
• Sum of squares due to error
SSE = $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• Sum of squares due to regression
SSR = $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
• Total sum of squares
SST = $\sum_{i=1}^{n} (y_i - \bar{y})^2$
• Obviously, SST = SSR + SSE.
• The ratio SSR/SST represents the proportion of the total variation in
y explained by the regression model.
• This ratio, denoted by R², is called the coefficient of multiple
determination.
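The decomposition can be checked numerically. The sketch below assumes the usual definitions (SSE as the sum of squared residuals, SSR as the squared deviations of the fitted values from the mean of y, SST as the squared deviations of y from its mean) and uses a one-predictor fit on made-up data purely for illustration; the helper names are hypothetical.

```python
def simple_fit(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sums_of_squares(y, y_hat):
    """Return (SSE, SSR, SST) for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return sse, ssr, sst


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

b0, b1 = simple_fit(x, y)
y_hat = [b0 + b1 * xi for xi in x]
sse, ssr, sst = sums_of_squares(y, y_hat)
r_squared = ssr / sst
print(f"SSE={sse:.4f} SSR={ssr:.4f} SST={sst:.4f} R^2={r_squared:.4f}")
```

The identity SST = SSR + SSE relies on the fitted values coming from a least-squares fit that includes an intercept; for arbitrary fitted values it need not hold.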
10
Adjusted Coefficient of Multiple
Determination.
• R² is sensitive to the magnitudes of n and k in small samples.
If k is large relative to n, the model tends to fit the data very
well. In the extreme case, if n = k+1, the model would fit the
data exactly.
• A better goodness-of-fit measure is the adjusted R²:
$$\text{Adjusted } R^2 = 1 - \frac{n-1}{n-k-1}\,(1 - R^2) = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$$
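A short sketch of the adjustment, with n the number of observations and k the number of predictors. The function name `adjusted_r_squared` and the example values are assumptions chosen for illustration.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (n - 1) / (n - k - 1) * (1 - R^2)."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return 1 - (n - 1) / (n - k - 1) * (1 - r_squared)


# Same R^2 and sample size, different numbers of predictors.
few_predictors = adjusted_r_squared(0.90, n=20, k=3)
many_predictors = adjusted_r_squared(0.90, n=20, k=10)
print(few_predictors, many_predictors)
```

With the same R² and sample size, the model with more predictors receives the lower adjusted value, which is the penalty the slide describes.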
11
Hypothesis Tests in Multiple Linear
Regression
• Three types of hypothesis tests can be carried out for multiple
linear regression models:
• First Test for significance of regression: This test checks the
significance of the whole regression model.
• Second Test: This test checks the significance of individual regression
coefficients.
• Third Test: This test can be used to simultaneously check the
significance of a number of regression coefficients.
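For reference, the three tests are usually written as follows in the standard normal-error linear model. The notation is an assumption rather than something taken from the slides: k predictors, n observations, and, for the third test, r coefficients dropped from the full model to form a reduced model.

```latex
% 1. Overall significance of the regression
H_0:\ \beta_1 = \beta_2 = \cdots = \beta_k = 0
\quad\text{vs.}\quad
H_1:\ \text{at least one } \beta_j \neq 0,
\qquad
F = \frac{SSR/k}{SSE/(n-k-1)} \sim F_{k,\;n-k-1} \ \text{under } H_0

% 2. Significance of an individual coefficient
H_0:\ \beta_i = 0,
\qquad
t = \frac{b_i}{\operatorname{se}(b_i)} \sim t_{n-k-1} \ \text{under } H_0

% 3. Simultaneous test of r coefficients (partial F-test)
F = \frac{\left(SSE_{\text{reduced}} - SSE_{\text{full}}\right)/r}{SSE_{\text{full}}/(n-k-1)}
\sim F_{r,\;n-k-1} \ \text{under } H_0
```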
12
F-test for the overall fit of the model
13
Test for Significance of Regression
14
Significance tests for βi
15
ANOVA for Regression
• Analysis of Variance (ANOVA) consists of
calculations that provide information about levels of
variability within a regression model and form a basis
for tests of significance.
16
Example
A TV industry analyst wants to build a statistical model for
predicting the number of subscribers that a cable station can
expect.
Y = Number of cable subscribers (SUSCRIB).
X1 = Advertising rate which the station charges local advertisers for one minute
of prime time space (ADRATE).
X2 = Kilowatt power of the station’s non-cable signal (KILOWATT).
X3 = Number of families living in the station’s area of dominant influence
(ADI), a geographical division of radio and TV audiences (APIPOP).
X4 = Number of competing stations in the ADI (COMPETE).
17
Example (contd….)
18
Multiple Regression Equation
• Based on the partial t-test, the variables signal and compete
are the least significant variables in our model.
• Let’s drop the least significant variables one at a time.
19
Multiple Regression Equation
Y = 562.15 - 5.44x1 - 20.01x2
where: x1 = temperature [degrees F]
x2 = attic insulation [inches]
20
Multiple Regression Equation
• The variable Compete is the next variable to get rid of.
21
Multiple Regression Prediction
• All the variables in the model are statistically significant,
therefore our final model is:
• Final Model
22
Multicollinearity
• High correlation between X variables (Independent variables).
• Coefficients measure combined effect.
• Leads to unstable coefficients depending on X variables in model
• Always exists; matter of degree
• Example: Using both total number of rooms and number of
bedrooms as explanatory variables in same model
• In many non-experimental situations in business,
economics, and the social and biological sciences, the
independent variables tend to be correlated among
themselves.
23
Detecting Multicollinearity
• Examine correlation matrix
– Check whether the correlations between pairs of X
variables are stronger than their correlations with the Y variable
• A few remedies
– Obtain new sample data
– Eliminate one correlated X variable
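A minimal sketch of the correlation check, using the rooms and bedrooms example from the previous slide. The data values are made up for illustration, and the `pearson` helper is a hypothetical name. The variance inflation factor 1 / (1 - r²) is added as one common summary; in this simple form it applies only when there are exactly two predictors.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical housing data, made up for illustration only.
rooms = [5, 6, 7, 8, 9, 10, 6, 8]
bedrooms = [2, 3, 3, 4, 4, 5, 2, 4]
price = [150, 180, 210, 260, 270, 330, 170, 250]

r_xx = pearson(rooms, bedrooms)    # correlation between the two predictors
r_x1y = pearson(rooms, price)      # each predictor against the response
r_x2y = pearson(bedrooms, price)

# Variance inflation factor for the two-predictor case only.
vif = 1 / (1 - r_xx ** 2)
print(f"r(rooms, bedrooms)={r_xx:.3f}  VIF={vif:.1f}")
```

A predictor-to-predictor correlation this close to 1 suggests that the two variables carry largely the same information, so dropping one of them is a reasonable remedy.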
24
Finding the Best Multiple Regression
Equation
• Use common sense and practical considerations to
include or exclude variables.
• Consider the P-value.
• Consider equations with high values of adjusted R²
and try to include only a few variables.
• For a given number of predictor (x) variables,
select the equation with the largest value of adjusted
R².
Selection of Predictor Variable
Stepwise regression
26
Statement of problem
• A common problem is that there is a large set of candidate
predictor variables.
• The goal is to choose a small subset from the larger set so that the
resulting regression model is simple, yet has good predictive
ability.
Example: Cement data
• Response y: heat evolved in calories during hardening of cement on a per
gram basis
• Predictor x1: % of tricalcium aluminate
• Predictor x2: % of tricalcium silicate
• Predictor x3: % of tetracalcium alumino ferrite
• Predictor x4: % of dicalcium silicate
27
Two basic methods
of selecting predictors
• Stepwise regression: Enter and remove predictors, in a
stepwise manner, until there is no justifiable reason to enter or
remove more.
• Best subsets regression: Select the subset of predictors that do
the best at meeting some well-defined objective criterion.
28
Stepwise regression: the idea
• Start with no predictors in the “stepwise model.”
• At each step, enter or remove a predictor based on partial
F-tests (that is, the t-tests).
• Stop when no more predictors can be justifiably entered or
removed from the stepwise model.
1. Specify an Alpha-to-Enter (αE = 0.15) significance level.
2. Specify an Alpha-to-Remove (αR = 0.15) significance level.
29
Stepwise regression:
Step #1
1. Fit each of the one-predictor models, that is, regress y on x1,
regress y on x2, … regress y on xp-1.
2. The first predictor put in the stepwise model is the predictor that
has the smallest t-test P-value (below αE = 0.15).
3. If no predictor has a t-test P-value below αE = 0.15, stop.
Step #2
1. Suppose x1 was the “best” one predictor.
2. Fit each of the two-predictor models with x1 in the model, that is,
regress y on (x1, x2), regress y on (x1, x3), …, and y on (x1, xp-1).
3. The second predictor put in stepwise model is the predictor that
has the smallest t-test P-value (below αE = 0.15).
4. If no predictor has a t-test P-value below αE = 0.15, stop.
30
Stepwise regression:
Step #2 (continued)
1. Suppose x2 was the “best” second predictor.
2. Step back and check P-value for β1 = 0. If the P-value for
β1 = 0 has become not significant (above αR = 0.15),
remove x1 from the stepwise model.
Step#3
1. Suppose both x1 and x2 made it into the two-predictor
stepwise model.
2. Fit each of the three-predictor models with x1 and x2 in the
model, that is, regress y on (x1, x2, x3), regress y on (x1, x2,
x4), …, and regress y on (x1, x2, xp-1).
31
Stepwise regression:
Step #3 (continued)
1. The third predictor put in stepwise model is the predictor
that has the smallest t-test P-value (below αE = 0.15).
2. If no predictor has a t-test P-value below αE = 0.15, stop.
3. Step back and check P-values for β1 = 0 and β2 = 0. If either
P-value has become not significant (above αR = 0.15),
remove the predictor from the stepwise model.
Stopping the procedure
The procedure is stopped when adding an additional predictor
does not yield a t-test P-value below αE = 0.15.
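The procedure can be sketched in plain Python as below. This is an illustration under stated assumptions, not the implementation used by any statistics package: the function names are hypothetical, the data are simulated, and the p-values use a normal approximation to the t distribution, which is reasonable only when n is large relative to the number of predictors.

```python
import random
from statistics import NormalDist


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for row in range(n):
            if row != col:
                f = m[row][col]
                m[row] = [v - f * p for v, p in zip(m[row], m[col])]
    return [row[n:] for row in m]


def slope_p_values(columns, y):
    """Two-sided p-value for each slope when y is regressed on columns."""
    n, p = len(y), len(columns) + 1
    rows = [[1.0] + [c[i] for c in columns] for i in range(n)]
    xtx_inv = invert([[sum(r[a] * r[b] for r in rows) for b in range(p)]
                      for a in range(p)])
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    sse = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    s2 = sse / (n - p)  # residual variance
    p_vals = []
    for j in range(1, p):
        t = beta[j] / (s2 * xtx_inv[j][j]) ** 0.5
        # ASSUMPTION: normal approximation to the t distribution.
        p_vals.append(2 * (1 - NormalDist().cdf(abs(t))))
    return p_vals


def stepwise(predictors, y, alpha_enter=0.15, alpha_remove=0.15):
    """predictors maps a name to its list of values; returns chosen names."""
    selected = []
    # The iteration cap is a safety guard against enter/remove cycles.
    for _ in range(2 * len(predictors) + 1):
        best_name, best_p = None, alpha_enter
        for name in predictors:
            if name in selected:
                continue
            cols = [predictors[s] for s in selected] + [predictors[name]]
            p_new = slope_p_values(cols, y)[-1]  # p-value of the candidate
            if p_new < best_p:
                best_name, best_p = name, p_new
        if best_name is None:
            break  # no candidate has a p-value below alpha_enter
        selected.append(best_name)
        # Step back: drop earlier predictors that are no longer significant.
        p_all = slope_p_values([predictors[s] for s in selected], y)
        selected = [s for s, p in zip(selected, p_all)
                    if s == best_name or p <= alpha_remove]
    return selected


# Simulated data: y depends on x1 and x2; x3 is unrelated noise.
rng = random.Random(0)
n = 60
data = {name: [rng.uniform(0, 10) for _ in range(n)]
        for name in ("x1", "x2", "x3")}
y = [3 + 2 * a - 1.5 * b + rng.gauss(0, 1)
     for a, b in zip(data["x1"], data["x2"])]
selected = stepwise(data, y)
print(selected)
```

In this simulation x1 and x2 should be selected because the response depends strongly on both. Whether the unrelated x3 also enters depends on chance, since with αE = 0.15 a noise variable can pass the entry test fairly often.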
32
Prediction and Confidence
Intervals
33
Confidence intervals are intervals constructed about the
predicted value of y, at a given level of x, that quantify the
uncertainty in the estimated mean response of all the
individuals in the population at that x.
Prediction intervals are intervals constructed about the
predicted value of y that quantify the uncertainty in a
single individual's predicted value. At the same x, a prediction
interval is wider than the confidence interval because it also
includes the individual's random error.
34
35
Confidence Interval:
Prediction Interval:
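For reference, the standard forms of the two intervals under the normal-error linear model are given below. The notation is an assumption rather than something taken from the slides: x_0 is the vector of predictor values (with a leading 1 for the intercept) at which the interval is wanted, X is the design matrix, s is the regression standard error of estimate, and k is the number of predictors.

```latex
% Confidence interval for the mean response at x_0
\hat{y}_0 \;\pm\; t_{\alpha/2,\;n-k-1}\; s\,\sqrt{x_0^{\top}\left(X^{\top}X\right)^{-1}x_0}

% Prediction interval for a single new observation at x_0
\hat{y}_0 \;\pm\; t_{\alpha/2,\;n-k-1}\; s\,\sqrt{1 + x_0^{\top}\left(X^{\top}X\right)^{-1}x_0}
```

The extra 1 under the square root accounts for the random error of the individual observation.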
Example
• Suppose we want to estimate the average weight of an adult
male in a city. We draw a random sample of 1,000 men from a
population of 1,000,000 men and weigh them. We find that the
average man in our sample weighs 180 pounds, and the
standard deviation of the sample is 30 pounds. What is the
95% confidence interval?
Solution:
• Identify a sample statistic. Since we are trying to estimate the
mean weight in the population, we choose the mean weight in
our sample (180) as the sample statistic.
• Select a confidence level. We are working with a 95%
confidence level.
36
Example Contd….
• Find the margin of error.
Find standard error.
The standard error (SE) of the mean is:
SE = s / sqrt( n ) = 30 / sqrt(1000) = 30/31.62 = 0.95
Find critical value.
• The critical value is a factor used to compute the margin of
error. To express the critical value as a t score (t*):
– Compute alpha (α): α = 1 - (confidence level / 100) = 0.05
– Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.05/2 =
0.975
– Find the degrees of freedom (df): df = n - 1 = 1000 - 1 =
999
37
Example Contd..
– The critical value is the t score having 999 degrees of
freedom and a cumulative probability equal to 0.975. From
the t distribution table, we find that the critical value is
1.96.
• Note: Because the sample size is large, we might also have
expressed the critical value as a z score.
• Compute margin of error (ME): ME = critical value * standard
error = 1.96 * 0.95 = 1.86
• The range of the confidence interval = sample statistic ±
margin of error.
• With the uncertainty denoted by the confidence level, the
95% confidence interval is 180 ± 1.86, that is, from 178.14 to 181.86 pounds.
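The arithmetic of this example can be reproduced with the Python standard library. The sketch uses a z critical value from `statistics.NormalDist` in place of the t table, an assumption that is reasonable here because with 999 degrees of freedom the two agree to about two decimal places.

```python
import math
from statistics import NormalDist

n = 1000              # sample size
sample_mean = 180.0   # pounds
sample_sd = 30.0      # pounds
confidence = 0.95

# Standard error of the mean: s / sqrt(n).
standard_error = sample_sd / math.sqrt(n)

# Critical value at cumulative probability 1 - alpha/2 = 0.975.
alpha = 1 - confidence
critical_value = NormalDist().inv_cdf(1 - alpha / 2)

margin_of_error = critical_value * standard_error
interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
print(f"{interval[0]:.2f} to {interval[1]:.2f}")  # 178.14 to 181.86
```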
38
Questions
• Explain the linear multiple regression model.
• How can predictor variables be selected using stepwise
regression analysis?
• Suppose we want to estimate the average weight of an adult
male in a city. We draw a random sample of 1,000 men from a
population of 1,000,000 men and weigh them. We find that the
average man in our sample weighs 180 pounds, and the
standard deviation of the sample is 30 pounds. What is the
95% confidence interval?
39
Thank You
40

604_multiplee.ppt

  • 1. Quantitative Research Technique Multiple Regression Analysis Selection of Predictor Variables Confidence and Prediction Interval Dinesh Pudasaini (CRN 071MSI604) 1
  • 2. Goal • Develop a statistical model that can predict the values of a dependent (response) variable based upon the values of the Independent (explanatory) variables. • In many situations, more than one independent variable may be useful in predicting the value of a dependent variable. We then use multiple regression. 2
  • 3. Introduction Simple Regression A statistical model that utilizes one quantitative independent variable “X” to predict the quantitative dependent variable “Y.” Multiple Regression: A statistical model that utilizes two or more quantitative and qualitative explanatory variables (x1,..., xk) to predict a quantitative dependent variable Y. 3
  • 4. Simple vs. Multiple • Simple Regression •  represents the unit change in Y per unit change in X . • Does not take into account any other variable besides single independent variable. • Multiple regression • i represents the unit change in Y per unit change in Xi. • Takes into account the effect of other i s. 4
  • 6. Linear Model • Relationship between one dependent & two or more independent variables is a linear function Dependent (response) variable Independent (explanatory) variables Population slopes Population Y-intercept Random error            P P X X X Y  2 2 1 1 0 6
  • 7. Linear Model • The error terms Ɛ are mutually independent and identically distributed, with mean = 0 and constant variances • This is so, because the observations y1, y2, . . . ,yn are a random sample, they are mutually independent and hence the error terms are also mutually independent • The distribution of the error term is independent of the joint distribution of x i, x 2, . . . , x k 7
  • 8. Method of Least Squares • we use the least-squares method to fit a linear function to the data. • bo,b1, b2, b3 . . . , bk are the sample estimates of the coefficients ß0,ß1, ß2, ß3 . . . , ßk • The least-squares method chooses the b’s that make the sum of squares of the residuals as small as possible. • The least-squares estimates are the values that minimize the quantity. 8
  • 9. Standard Error of Estimate and Coefficient of Multiple Determination • The observed variability of the responses about this fitted model is measured by the variance and the regression standard error of estimate is Coefficient of Multiple Determination When null hypothesis is rejected, a relationship between Y and the X variables exists. Strength measured by R2 9
  • 10. Coefficient of Multiple Determination. • Sum of squares due to error SSE = • Sum of squares due to regression SSR = • Total sum of squares SST = • Obviously, • The ratio SSR/SST represents the proportion of the total variation in y explained by the regression model. • This ratio, denoted by R2, is called the coefficient of multiple determination. 10
  • 11. Adjusted Coefficient of Multiple Determination. • R2 is sensitive to the magnitudes of n and k in small samples. If k is large relative to n, the model tends to fit the data very well. In the extreme case, if n = k+1, the model would exactly fit the data. • A better goodness of fit measure is the adjusted R2 Adjusted R2= 1 – (n-1/n-k-1) (1-R2) » 1- SSE/(n-k-1)/SST/(n-1) 11
  • 12. Hypothesis Tests in Multiple Linear Regression • Three types of hypothesis tests can be carried out for multiple linear regression models: • First Test for significance of regression: This test checks the significance of the whole regression model. • Second Test: This test checks the significance of individual regression coefficients. • Third Test: This test can be used to simultaneously check the significance of a number of regression coefficients. 12
  • 13. F-test for the overall fit of the model 13
  • 14. Test for Significance of Regression 14
  • 16. ANOVA for Regression • Analysis of Variance (ANOVA) consists of calculations that provide information about levels of variability within a regression model and form a basis for tests of significance. 16
  • 17. Example A TV industry analyst wants to build a statistical model for predicting the number of subscribers that a cable station can expect. Y = Number of cable subscribers (SUSCRIB). X1 = Advertising rate which the station charges local advertisers for one minute of prim time space (ADRATE). X2 = Kilowatt power of the station’s non-cable signal (KILOWATT). X3 = Number of families living in the station’s area of dominant influence (ADI), a geographical division of radio and TV audiences (APIPOP). X4 = Number of competing stations in the ADI (COMPETE). 17
  • 19. Multiple Regression Equation • Based on the partial t-test, the variables signal and compete are the least significant variables in our model. • Let’s drop the least significant variables one at a time. 19
  • 20. Multiple Regression Equation Y = 562.15 - 5.44x1 - 20.01x2 where: x1 = temperature [degrees F] x2 = attic insulation [inches] 20
  • 21. Multiple Regression Equation • The variable Compete is the next variable to get rid of. 21
  • 22. Multiple Regression Prediction • All the variables in the model are statistically significant, therefore our final model is: • Final Model 22
  • 23. Multicollinearity • High correlation between X variables (Independent variables). • Coefficients measure combined effect. • Leads to unstable coefficients depending on X variables in model • Always exists; matter of degree • Example: Using both total number of rooms and number of bedrooms as explanatory variables in same model • In many non-experimental situations in business, economics, and the social and biological sciences, the independent variables tend to be correlated among themselves. 23
  • 24. Detecting Multicollinearity • Examine correlation matrix – Determines if the Correlations between pairs of X variables are more than with Y variable • Few remedies – Obtain new sample data – Eliminate one correlated X variable 24
  • 25. Finding the Best Multiple Regression Equation • Use common sense and practical considerations to include or exclude variables. • Consider the P-value. • Consider equations with high values of adjusted R² and try to include only a few variables. • For a given number of predictor (x) variables, select the equation with the largest value of adjusted R².
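To make the adjusted R² criterion concrete, here is a small sketch using the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The function name and the example numbers are assumptions for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than coefficients")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same raw R-squared, different model sizes (hypothetical numbers).
small_model = adjusted_r2(0.90, n=30, k=3)
large_model = adjusted_r2(0.90, n=30, k=6)
print(small_model, large_model)
```

With the same raw R², the model with fewer predictors gets the higher adjusted value, which is why the slide pairs "high adjusted R²" with "only a few variables".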
  • 26. Selection of Predictor Variable Stepwise regression 26
  • 27. Statement of problem • A common problem is that there is a large set of candidate predictor variables. • The goal is to choose a small subset from the larger set so that the resulting regression model is simple, yet has good predictive ability. Example: Cement data • Response y: heat evolved in calories during hardening of cement on a per gram basis • Predictor x1: % of tricalcium aluminate • Predictor x2: % of tricalcium silicate • Predictor x3: % of tetracalcium alumino ferrite • Predictor x4: % of dicalcium silicate 27
  • 28. Two basic methods of selecting predictors • Stepwise regression: Enter and remove predictors, in a stepwise manner, until there is no justifiable reason to enter or remove more. • Best subsets regression: Select the subset of predictors that do the best at meeting some well-defined objective criterion. 28
  • 29. Stepwise regression: the idea • Start with no predictors in the “stepwise model.” • At each step, enter or remove a predictor based on partial F-tests (equivalently, the t-tests). • Stop when no more predictors can be justifiably entered or removed from the stepwise model. 1. Specify an Alpha-to-Enter (αE = 0.15) significance level. 2. Specify an Alpha-to-Remove (αR = 0.15) significance level. 29
  • 30. Stepwise regression: Step #1 1. Fit each of the one-predictor models, that is, regress y on x1, regress y on x2, …, and regress y on xp-1. 2. The first predictor put in the stepwise model is the predictor that has the smallest t-test P-value (below αE = 0.15). 3. If no predictor has a P-value < 0.15, stop. Step #2 1. Suppose x1 was the “best” single predictor. 2. Fit each of the two-predictor models with x1 in the model, that is, regress y on (x1, x2), regress y on (x1, x3), …, and regress y on (x1, xp-1). 3. The second predictor put in the stepwise model is the predictor that has the smallest t-test P-value (below αE = 0.15). 4. If no predictor has a P-value < 0.15, stop. 30
  • 31. Stepwise regression: Step #2 (continued) 1. Suppose x2 was the “best” second predictor. 2. Step back and check P-value for β1 = 0. If the P-value for β1 = 0 has become not significant (above αR = 0.15), remove x1 from the stepwise model. Step#3 1. Suppose both x1 and x2 made it into the two-predictor stepwise model. 2. Fit each of the three-predictor models with x1 and x2 in the model, that is, regress y on (x1, x2, x3), regress y on (x1, x2, x4), …, and regress y on (x1, x2, xp-1). 31
  • 32. Stepwise regression: Step #3 (continued) 1. The third predictor put in the stepwise model is the predictor that has the smallest t-test P-value (below αE = 0.15). 2. If no predictor has a P-value < 0.15, stop. 3. Step back and check the P-values for β1 = 0 and β2 = 0. If either P-value has become not significant (above αR = 0.15), remove that predictor from the stepwise model. Stopping the procedure The procedure is stopped when adding an additional predictor does not yield a t-test P-value below αE = 0.15. 32
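The enter-and-remove loop described on the stepwise slides can be sketched as follows. This is a minimal illustration assuming NumPy and SciPy are available; the function names `slope_pvalues` and `stepwise`, the iteration cap, and the synthetic data are all assumptions made here and are not part of the slides. The thresholds default to the slides' αE = αR = 0.15.

```python
import numpy as np
from scipy import stats


def slope_pvalues(X, y):
    """Fit OLS with an intercept; return two-sided t-test P-values for the slopes."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])            # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    df = n - A.shape[1]                             # residual degrees of freedom
    sigma2 = resid @ resid / df                     # error variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return 2 * stats.t.sf(np.abs(beta / se), df)[1:]  # drop the intercept entry


def stepwise(X, y, alpha_enter=0.15, alpha_remove=0.15):
    """Return the column indices chosen by stepwise selection."""
    selected = []
    for _ in range(2 * X.shape[1]):                 # assumed cap to avoid cycling
        # Entry step: try each remaining predictor alongside the current ones.
        best_j, best_p = None, alpha_enter
        for j in range(X.shape[1]):
            if j in selected:
                continue
            p = slope_pvalues(X[:, selected + [j]], y)[-1]
            if p < best_p:
                best_j, best_p = j, p
        if best_j is None:                          # nothing below alpha_enter: stop
            break
        selected.append(best_j)
        # Step back: drop earlier predictors that are no longer significant.
        pvals = slope_pvalues(X[:, selected], y)
        selected = [j for j, p in zip(selected, pvals)
                    if j == best_j or p <= alpha_remove]
    return selected


# Synthetic data (an assumption): y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)
selected = stepwise(X, y)
print(sorted(selected))
```

Because the entry threshold is a fairly generous 0.15, a pure-noise column can occasionally enter the model; the step-back check is what allows it, or an earlier predictor, to be dropped again later.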
  • 34. Confidence intervals are intervals constructed about the predicted value of y, at a given level of x, which are used to measure the accuracy of the mean response of all the individuals in the population. Prediction intervals are intervals constructed about the predicted value of y that are used to measure the accuracy of a single individual’s predicted value. 34
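The difference between the two intervals is easiest to see in formulas. The sketch below uses standard multiple-regression notation that is assumed here rather than taken from the slides: X is the design matrix (with a leading column of ones), x_h is the predictor vector at which we predict, p is the number of estimated coefficients, and s² = SSE/(n - p).

```latex
\begin{aligned}
\text{Confidence interval for the mean response:}\quad
& \hat{y}_h \pm t_{1-\alpha/2,\,n-p}\; s\,\sqrt{x_h^{\top}(X^{\top}X)^{-1}x_h} \\
\text{Prediction interval for a single new response:}\quad
& \hat{y}_h \pm t_{1-\alpha/2,\,n-p}\; s\,\sqrt{1 + x_h^{\top}(X^{\top}X)^{-1}x_h}
\end{aligned}
```

The extra 1 under the square root accounts for the random error of the individual observation, so a prediction interval is always wider than the confidence interval at the same x_h.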
  • 36. Example • Suppose we want to estimate the average weight of an adult male in a city. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval? Solution: • Identify a sample statistic. Since we are trying to estimate the mean weight in the population, we choose the mean weight in our sample (180) as the sample statistic. • Select a confidence level. We are working with a 95% confidence level. 36
  • 37. Example Contd… • Find the margin of error. – Find the standard error. The standard error (SE) of the mean is: SE = s / sqrt(n) = 30 / sqrt(1000) = 30 / 31.62 = 0.95 – Find the critical value. The critical value is a factor used to compute the margin of error. To express the critical value as a t score (t*): – Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 95/100 = 0.05 – Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.05/2 = 0.975 – Find the degrees of freedom (df): df = n - 1 = 1000 - 1 = 999 37
  • 38. Example Contd… – The critical value is the t score having 999 degrees of freedom and a cumulative probability equal to 0.975. From the t distribution table, we find that the critical value is 1.96. • Note: Because the sample size is large, we might also have expressed the critical value as a z score, which gives the same value of 1.96. • Compute the margin of error (ME): ME = critical value * standard error = 1.96 * 0.95 = 1.86 • The confidence interval is the sample statistic ± the margin of error. • With the uncertainty denoted by the confidence level, the 95% confidence interval is 180 ± 1.86, that is, from 178.14 to 181.86 pounds. 38
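The worked example can be checked with a short script. This sketch uses only the Python standard library and takes the critical value from the standard normal distribution, which is an assumption justified here by the large sample (n = 1000); the function name `mean_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Large-sample confidence interval for a population mean.

    Uses a z critical value, which closely approximates the t value
    when n is large (an assumption of this sketch).
    """
    se = sd / math.sqrt(n)                          # standard error of the mean
    alpha = 1 - confidence
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = critical * se                          # margin of error
    return mean - margin, mean + margin, margin


low, high, margin = mean_confidence_interval(mean=180, sd=30, n=1000)
print(f"ME = {margin:.2f}, CI = ({low:.2f}, {high:.2f})")
```

For small samples the z critical value would be too narrow, and a t critical value with n - 1 degrees of freedom should be used instead, as in the slide's own derivation.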
  • 39. Questions • Explain the linear multiple regression model. • How can predictor variables be selected using stepwise regression analysis? • Suppose we want to estimate the average weight of an adult male in a city. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval? 39