Chapter14

Chapter 14 Multiple Regression Models

[object Object],[object Object],Multiple Regression Models (mean y value for fixed x 1 , x 2 ,…, x k values) =  +  1 x 1 +  2 x 2 + … +  k x k

Multiple Regression Models
The deterministic portion $\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ is called the population regression function.
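As a minimal sketch of how the population regression function is evaluated, the Python function below computes the mean y value for fixed predictor values. The name `population_mean_y` and the coefficient values are hypothetical, chosen only for illustration.

```python
def population_mean_y(alpha, betas, xs):
    """Mean y for fixed predictor values: alpha + beta_1*x_1 + ... + beta_k*x_k."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one coefficient per predictor")
    return alpha + sum(b * x for b, x in zip(betas, xs))


# Hypothetical population coefficients for k = 2 predictors.
alpha = 1.0
betas = [0.5, -2.0]
mean_y = population_mean_y(alpha, betas, [4.0, 1.0])
print(mean_y)  # 1.0 + 0.5*4.0 - 2.0*1.0 = 1.0
```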

Polynomial Regression Models

Qualitative Predictor Variables

Least Squares Estimates
The least squares estimates of $\alpha, \beta_1, \beta_2, \ldots, \beta_k$ are those values of $a, b_1, b_2, \ldots, b_k$ that make the sum of squared deviations $\sum [y - (a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k)]^2$ as small as possible.
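The least squares criterion can be sketched in plain Python. The helper below (a hypothetical name, `least_squares`) solves the normal equations by Gauss-Jordan elimination, which is one standard way to minimize the sum of squared deviations; statistical packages generally use more numerically robust methods. The data are made up, and the sketch assumes the predictors are not perfectly collinear.

```python
def least_squares(rows, ys):
    """Return [a, b1, ..., bk] minimizing the sum of squared deviations.

    Solves the normal equations (X'X) c = X'y by Gauss-Jordan elimination.
    Assumes the predictors are not perfectly collinear.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 for the intercept a
    p = len(X[0])  # p = k + 1 coefficients
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * y for x, y in zip(X, ys))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Made-up sample: n = 5 observations on k = 2 predictors.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [9.1, 7.8, 19.2, 18.1, 25.7]
a, b1, b2 = least_squares(rows, ys)
print(a, b1, b2)
```

A quick sanity check on any least squares fit with an intercept is that the residuals sum to zero and are uncorrelated with each predictor.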

Predicted Values & Residuals
The first predicted value $\hat{y}_1$ is obtained by taking the values of the predictor variables $x_1, x_2, \ldots, x_k$ for the first sample observation and substituting these values into the estimated regression function. Doing this successively for the remaining observations yields the predicted values $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n$ (sometimes referred to as the fitted values or fits).

Predicted Values & Residuals
The residuals are then the differences $y_1 - \hat{y}_1, y_2 - \hat{y}_2, \ldots, y_n - \hat{y}_n$ between the observed and predicted y values.
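A short sketch of how fits and residuals are computed once an estimated regression function is available. The estimated coefficients and the three observations below are hypothetical values picked so the arithmetic is easy to follow.

```python
def predict(coefs, xs):
    """Substitute predictor values into y-hat = a + b1*x1 + ... + bk*xk."""
    a, bs = coefs[0], coefs[1:]
    return a + sum(b * x for b, x in zip(bs, xs))


# Hypothetical estimated regression function: y-hat = 1 + 2*x1 + 3*x2.
coefs = [1.0, 2.0, 3.0]
rows = [(1, 2), (2, 1), (3, 4)]   # predictor values for each observation
ys = [9.5, 7.5, 19.0]             # observed y values

fits = [predict(coefs, row) for row in rows]      # predicted (fitted) values
residuals = [y - f for y, f in zip(ys, fits)]     # observed minus predicted
print(fits)       # [9.0, 8.0, 19.0]
print(residuals)  # [0.5, -0.5, 0.0]
```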

Sums of Squares
The residual (or error) sum of squares, SSResid, and total sum of squares, SSTo, are given by
$$\text{SSResid} = \sum (y - \hat{y})^2, \qquad \text{SSTo} = \sum (y - \bar{y})^2,$$
where $\bar{y}$ is the mean of the y observations in the sample. The number of degrees of freedom associated with SSResid is $n - (k + 1)$, because $k + 1$ df are lost in estimating the $k + 1$ coefficients $\alpha, \beta_1, \beta_2, \ldots, \beta_k$.
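The two sums of squares and the residual degrees of freedom can be computed directly from the observed and fitted values. This is a sketch with made-up numbers; the function names are illustrative.

```python
def sums_of_squares(ys, fits):
    """Return (SSResid, SSTo) for observed values ys and fitted values fits."""
    y_bar = sum(ys) / len(ys)                                # sample mean of y
    ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fits))   # squared residuals
    ss_to = sum((y - y_bar) ** 2 for y in ys)                # squared deviations from the mean
    return ss_resid, ss_to


def residual_df(n, k):
    """Degrees of freedom for SSResid: n minus the k + 1 estimated coefficients."""
    return n - (k + 1)


# Made-up data: n = 4 observations, model with k = 1 predictor.
ys = [2.0, 4.0, 6.0, 8.0]
fits = [2.5, 3.5, 6.5, 7.5]
ss_resid, ss_to = sums_of_squares(ys, fits)
print(ss_resid, ss_to)                 # 1.0 20.0
print(residual_df(n=len(ys), k=1))     # 2
```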

Estimate for $\sigma^2$
An estimate of the random deviation variance $\sigma^2$ is given by
$$s_e^2 = \frac{\text{SSResid}}{n - (k + 1)},$$
and $s_e = \sqrt{s_e^2}$ is the estimate of $\sigma$.
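A sketch of the variance estimate: divide SSResid by its degrees of freedom, n - (k + 1), and take the square root to estimate the standard deviation of the random deviation. The input values are hypothetical.

```python
import math


def estimate_sigma(ss_resid, n, k):
    """Return (s_e squared, s_e) using n - (k + 1) residual degrees of freedom."""
    df = n - (k + 1)
    if df <= 0:
        raise ValueError("need more observations than estimated coefficients")
    s2 = ss_resid / df
    return s2, math.sqrt(s2)


# Hypothetical inputs: SSResid = 1.0 from n = 4 observations and k = 1 predictor.
s2, s_e = estimate_sigma(ss_resid=1.0, n=4, k=1)
print(s2, s_e)
```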

Coefficient of Multiple Determination, $R^2$
The coefficient of multiple determination, $R^2$, interpreted as the proportion of variation in observed y values that is explained by the fitted model, is
$$R^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}}.$$
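For illustration, the coefficient of multiple determination can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The data below are made up and the function name is hypothetical.

```python
def r_squared_from_data(ys, fits):
    """Proportion of variation in ys explained by the fitted values."""
    y_bar = sum(ys) / len(ys)
    ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fits))
    ss_to = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_resid / ss_to


# Made-up observed and fitted values: SSResid = 1, SSTo = 20.
ys = [2.0, 4.0, 6.0, 8.0]
fits = [2.5, 3.5, 6.5, 7.5]
r2 = r_squared_from_data(ys, fits)
print(round(r2, 4))  # 1 - 1/20, so about 95% of the variation is explained
```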

Adjusted $R^2$
Generally, a model with large $R^2$ and small $s_e$ is desirable. If a large number of variables (relative to the number of data points) is used, those conditions may be satisfied, but the model will be unrealistic and difficult to interpret.

Adjusted $R^2$
To sort out this problem, computer packages sometimes compute a quantity called the adjusted $R^2$,
$$\text{adjusted } R^2 = 1 - \left[\frac{n - 1}{n - (k + 1)}\right] \frac{\text{SSResid}}{\text{SSTo}}.$$
Notice that when a large number of variables are used to build the model, this value will be substantially lower than $R^2$ and give a better indication of the usability of the model.
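The effect of the adjustment can be seen in a small sketch. It compares two hypothetical models fitted to the same n = 20 observations: the larger model has a slightly higher R-squared but a clearly lower adjusted R-squared. All numbers are invented for illustration.

```python
def r_squared(ss_resid, ss_to):
    return 1.0 - ss_resid / ss_to


def adjusted_r_squared(ss_resid, ss_to, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - (k + 1))] * (SSResid / SSTo)."""
    return 1.0 - ((n - 1) / (n - (k + 1))) * (ss_resid / ss_to)


n, ss_to = 20, 100.0
models = {
    "small (k = 2)": {"k": 2, "ss_resid": 30.0},
    "large (k = 8)": {"k": 8, "ss_resid": 28.0},
}
results = {}
for name, m in models.items():
    results[name] = (
        r_squared(m["ss_resid"], ss_to),
        adjusted_r_squared(m["ss_resid"], ss_to, n, m["k"]),
    )
    print(name, results[name])
```

Adding six predictors lowers SSResid only slightly here, so the penalty factor (n - 1) / (n - (k + 1)) outweighs the gain.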

F Distributions
F distributions are similar to chi-square distributions, but have two parameters: the numerator degrees of freedom, $df_{num}$, and the denominator degrees of freedom, $df_{den}$.
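One way to see the link to chi-square distributions is by simulation: an F variable is the ratio of two independent chi-square variables, each divided by its own degrees of freedom. The sketch below uses only the standard library, drawing chi-square values as gamma variables with shape df/2 and scale 2. The degrees of freedom, sample size and seed are arbitrary choices.

```python
import random


def draw_f(df_num, df_den, rng):
    """Draw one F value as a ratio of two scaled, independent chi-square draws."""
    chi2_num = rng.gammavariate(df_num / 2.0, 2.0)  # chi-square with df_num df
    chi2_den = rng.gammavariate(df_den / 2.0, 2.0)  # chi-square with df_den df
    return (chi2_num / df_num) / (chi2_den / df_den)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [draw_f(3, 10, rng) for _ in range(20000)]

print(min(draws) >= 0.0)          # F values are never negative
print(sum(draws) / len(draws))    # theoretical mean is df_den / (df_den - 2) = 1.25
```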

The F Test for Model Utility
The regression sum of squares, denoted by SSReg, is defined by SSReg = SSTo - SSResid.

The F Test for Model Utility
When all k $\beta_i$'s are zero in the model $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$, and when the distribution of e is normal with mean 0 and variance $\sigma^2$ for any particular values of $x_1, x_2, \ldots, x_k$, the statistic
$$F = \frac{\text{SSReg}/k}{\text{SSResid}/(n - (k + 1))}$$
has an F probability distribution based on k numerator df and $n - (k + 1)$ denominator df.
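A minimal sketch of the model utility statistic, computed from the sums of squares: the regression sum of squares divided by k, over the residual sum of squares divided by n - (k + 1). The input values are hypothetical.

```python
def f_statistic(ss_to, ss_resid, n, k):
    """Model utility F: (SSReg / k) / (SSResid / (n - (k + 1)))."""
    ss_reg = ss_to - ss_resid          # regression sum of squares
    return (ss_reg / k) / (ss_resid / (n - (k + 1)))


# Hypothetical fit: n = 20 observations, k = 2 predictors.
f_value = f_statistic(ss_to=100.0, ss_resid=30.0, n=20, k=2)
print(f_value)  # (70 / 2) / (30 / 17), roughly 19.8
```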

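The F Test for Utility of the Model $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$
Null hypothesis: $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (there is no useful linear relationship between y and any of the predictors).
Alternative hypothesis: $H_a$: at least one among $\beta_1, \beta_2, \ldots, \beta_k$ is not zero.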

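The F Test for Utility of the Model $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$
Test statistic:
$$F = \frac{\text{SSReg}/k}{\text{SSResid}/(n - (k + 1))}$$
An alternate formula:
$$F = \frac{R^2/k}{(1 - R^2)/(n - (k + 1))}$$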
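The alternate formula expresses the same statistic in terms of the coefficient of multiple determination, which is convenient when only R-squared, n and k are reported. The sketch below checks numerically that both forms agree; the sums of squares are made-up values.

```python
def f_from_sums_of_squares(ss_to, ss_resid, n, k):
    return ((ss_to - ss_resid) / k) / (ss_resid / (n - (k + 1)))


def f_from_r_squared(r2, n, k):
    """Alternate form: (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    return (r2 / k) / ((1.0 - r2) / (n - (k + 1)))


# Hypothetical fit: n = 20, k = 2, SSTo = 100, SSResid = 30, so R^2 = 0.70.
n, k, ss_to, ss_resid = 20, 2, 100.0, 30.0
r2 = 1.0 - ss_resid / ss_to
f_ss = f_from_sums_of_squares(ss_to, ss_resid, n, k)
f_r2 = f_from_r_squared(r2, n, k)
print(f_ss, f_r2)  # the two values agree up to rounding error
```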

The F Test for Utility of the Model $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$
The test is upper-tailed, and the table of values that capture specified upper-tail F curve areas is used to obtain a bound or bounds on the P-value, using numerator df = k and denominator df = $n - (k + 1)$.
Assumptions: For any particular combination of predictor variable values, the distribution of e, the random deviation, is normal with mean 0 and constant variance.
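Instead of bounding the P-value from a table, the upper-tail area can be approximated numerically. The sketch below is one possible approach, not the method any particular package uses: it relies on the identity that the upper-tail F area equals a regularized incomplete beta function, and evaluates that integral with Simpson's rule. The function name and step count are illustrative, and the sketch assumes a positive F value and at least 2 denominator df.

```python
import math


def f_upper_tail(f_value, df_num, df_den, steps=2000):
    """Approximate P(F > f_value) for an F distribution.

    Uses P(F > f) = I_x(df_den / 2, df_num / 2) with
    x = df_den / (df_den + df_num * f), where I_x is the regularized
    incomplete beta function, integrated here by Simpson's rule.
    Assumes f_value > 0 and df_den >= 2.
    """
    if f_value <= 0 or df_den < 2:
        raise ValueError("sketch assumes f_value > 0 and df_den >= 2")
    a, b = df_den / 2.0, df_num / 2.0
    x = df_den / (df_den + df_num * f_value)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        # Beta(a, b) density; finite on [0, x] because a >= 1 and x < 1.
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0) / math.exp(log_beta)

    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = x / steps
    total = density(0.0) + density(x)
    total += 4.0 * sum(density(i * h) for i in range(1, steps, 2))
    total += 2.0 * sum(density(i * h) for i in range(2, steps, 2))
    return total * h / 3.0


# Hypothetical test: F = 19.83 with k = 2 and n = 20, so df = (2, 17).
p_value = f_upper_tail(19.83, df_num=2, df_den=17)
print(p_value)
```

With 2 numerator df the exact upper-tail area has the closed form (1 + 2f/df_den) raised to the power -df_den/2, which gives a convenient check on the approximation.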

Example
They attempted to create a model to explain lung capacity in terms of a number of variables. Specifically:
Numerical variables: height, age, weight, waist
Categorical variables: gender, activity level, and smoking status

Example: Linear Model with All Numerical Variables

Example: Linear Model with Variables Height & Age
Notice that even though the $R^2$ value decreases slightly, the adjusted $R^2$ value actually increases. Also note that the coefficient on Age is now significant at the 5% level.

Example: Linear Model with Categorical Variables Added

Example
This, the group felt, was confounding the study. In an attempt to determine a reasonably optimal subset of the variables to keep in the study, it was noted that a number of the variables were highly related. Since the study was small, a stepwise regression was run; the variables Height, Age, Coded Activity, and Coded Gender were kept, and the following model was obtained.

Example: Linear Model with Height, Age & Coded Activity and Gender
The rest of the Minitab output is given below.

Example: Linear Model with Height, Age & Coded Activity and Gender
Minitab identified 3 outliers (because the standardized residuals were unusually large). Various plots of the standardized residuals are shown on the next few slides, with comments.

Example: Linear Model with Height, Age & Coded Activity and Gender
The histogram of the residuals appears to be consistent with the assumption that the residuals are a sample from a normal distribution.

Example: Linear Model with Height, Age & Coded Activity and Gender
The normality plot also tends to indicate the residuals can reasonably be thought to be a sample from a normal distribution.

Example: Linear Model with Height, Age & Coded Activity and Gender
The residual plot also tends to indicate that the model assumptions are not unreasonable, although there would be some concern that the residuals are predominantly positive for smaller fitted lung capacities.
