11. Predicted Values & Residuals The first predicted value, $\hat{y}_1$, is obtained by taking the values of the predictor variables $x_1, x_2, \ldots, x_k$ for the first sample observation and substituting these values into the estimated regression function. Doing this successively for the remaining observations yields the predicted values $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n$ (sometimes referred to as the fitted values or fits).
12. Predicted Values & Residuals The residuals are then the differences between the observed and predicted y values.
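As a minimal sketch of the two slides above, the following Python computes predicted values and residuals for a model with k = 2 predictors. The coefficients `a` and `b` and the data in `X` and `y` are made-up illustrative numbers, not least squares estimates from any data set in this deck.

```python
# Hypothetical estimated coefficients (assumed for illustration only):
# intercept a and slopes b[0], b[1] for a model with k = 2 predictors.
a = 1.0
b = [2.0, -0.5]

# Hypothetical sample: each row of X holds (x1, x2) for one observation,
# and y holds the corresponding observed responses.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [2.5, 4.0, 5.5, 7.0]


def predict(row):
    """Substitute one observation's predictor values into the estimated regression function."""
    return a + sum(bj * xj for bj, xj in zip(b, row))


# Predicted (fitted) values: repeat the substitution for every observation.
y_hat = [predict(row) for row in X]

# Residuals: observed y minus predicted y.
residuals = [yi - fi for yi, fi in zip(y, y_hat)]

print(y_hat)
print(residuals)
```

In practice the coefficients would come from a least squares fit rather than being typed in by hand.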
13. Sums of Squares The residual (or error) sum of squares, SSResid, and total sum of squares, SSTo, are given by $\text{SSResid} = \sum (y - \hat{y})^2$ and $\text{SSTo} = \sum (y - \bar{y})^2$, where $\bar{y}$ is the mean of the y observations in the sample. The number of degrees of freedom associated with SSResid is $n - (k + 1)$, because $k + 1$ df are lost in estimating the $k + 1$ coefficients $\alpha, \beta_1, \beta_2, \ldots, \beta_k$.
14. Estimate for $\sigma^2$ An estimate of the random deviation variance $\sigma^2$ is given by $s_e^2 = \dfrac{\text{SSResid}}{n - (k + 1)}$, and $s_e = \sqrt{s_e^2}$ is the estimate of $\sigma$.
15. Coefficient of Multiple Determination, $R^2$ The coefficient of multiple determination, $R^2$, interpreted as the proportion of variation in observed y values that is explained by the fitted model, is $R^2 = 1 - \dfrac{\text{SSResid}}{\text{SSTo}}$.
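The quantities on the last three slides can be computed directly from the observed and predicted values. The sketch below assumes a model with k = 2 predictors and uses invented values for `y` and `y_hat`; the variable names are illustrative choices, not notation from the deck.

```python
import math

# Hypothetical observed and predicted values (illustration only).
y = [10.0, 12.0, 15.0, 19.0, 24.0]
y_hat = [10.5, 11.5, 15.5, 18.5, 24.0]
k = 2            # assumed number of predictors in the model
n = len(y)       # number of observations

y_bar = sum(y) / n

# Residual sum of squares: squared differences between observed and predicted y.
ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

# Total sum of squares: squared differences between observed y and the sample mean.
ss_to = sum((yi - y_bar) ** 2 for yi in y)

# Degrees of freedom for SSResid, then the estimates of sigma^2 and sigma.
df_resid = n - (k + 1)
s_e_sq = ss_resid / df_resid
s_e = math.sqrt(s_e_sq)

# Coefficient of multiple determination.
r_sq = 1 - ss_resid / ss_to

print(ss_resid, ss_to, df_resid, s_e_sq, s_e, r_sq)
```

With these made-up numbers SSResid is small relative to SSTo, so $R^2$ is close to 1 and $s_e$ is small.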
16. Adjusted $R^2$ Generally, a model with large $R^2$ and small $s_e$ is desirable. If a large number of variables (relative to the number of data points) is used, those conditions may be satisfied, but the model will be unrealistic and difficult to interpret.
17. Adjusted $R^2$ To sort out this problem, computer packages sometimes compute a quantity called the adjusted $R^2$: $\text{adjusted } R^2 = 1 - \left[\dfrac{n - 1}{n - (k + 1)}\right] \dfrac{\text{SSResid}}{\text{SSTo}}$. Notice that when a large number of variables are used to build the model, this value will be substantially lower than $R^2$ and give a better indication of the usability of the model.
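To see how the adjustment penalizes extra variables, the sketch below evaluates the standard adjusted $R^2$ formula, $1 - \frac{n - 1}{n - (k + 1)} \cdot \frac{\text{SSResid}}{\text{SSTo}}$, for two hypothetical models that happen to have the same sums of squares but different numbers of predictors. All numbers and the function name `adjusted_r_sq` are assumptions made for illustration.

```python
def adjusted_r_sq(ss_resid, ss_to, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - ((n - 1) / (n - (k + 1))) * (ss_resid / ss_to)


# Hypothetical sums of squares from a sample of n = 10 observations.
n = 10
ss_to = 100.0
ss_resid = 20.0

r_sq = 1 - ss_resid / ss_to                            # ordinary R^2, same for both models
adj_small = adjusted_r_sq(ss_resid, ss_to, n, k=2)     # few predictors
adj_large = adjusted_r_sq(ss_resid, ss_to, n, k=7)     # many predictors relative to n

print(r_sq, adj_small, adj_large)
```

Both hypothetical models have $R^2 = 0.8$, but the adjusted value falls to roughly 0.74 with two predictors and to roughly 0.1 with seven, which reflects how little residual information remains when k is large relative to n.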
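18. F Distributions F distributions are similar to chi-square distributions, but have two parameters: the numerator degrees of freedom, $\mathrm{df}_{\text{num}}$, and the denominator degrees of freedom, $\mathrm{df}_{\text{den}}$.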
19. The F Test for Model Utility The regression sum of squares, denoted by SSReg, is defined by $\text{SSReg} = \text{SSTo} - \text{SSResid}$.
20. The F Test for Model Utility When all k $\beta_i$'s are zero in the model $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$, and when the distribution of e is normal with mean 0 and variance $\sigma^2$ for any particular values of $x_1, x_2, \ldots, x_k$, the statistic $F = \dfrac{\text{SSReg} / k}{\text{SSResid} / (n - (k + 1))}$ has an F probability distribution based on k numerator df and $n - (k + 1)$ denominator df.
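22. The F Test for Utility of the Model $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$ Test statistic: $F = \dfrac{\text{SSReg} / k}{\text{SSResid} / (n - (k + 1))}$. An alternate formula: $F = \dfrac{R^2 / k}{(1 - R^2) / (n - (k + 1))}$.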
23. The F Test for Utility of the Model $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$ The test is upper-tailed, and the information in the Table of Values that capture specified upper-tail F curve areas is used to obtain a bound or bounds on the P-value using numerator df = k and denominator df = $n - (k + 1)$. Assumptions: For any particular combination of predictor variable values, the distribution of e, the random deviation, is normal with mean 0 and constant variance.
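As a rough illustration of the model utility test statistic, the sketch below computes F from the sums of squares and again from $R^2$ using the standard equivalent form $F = \frac{R^2 / k}{(1 - R^2) / (n - (k + 1))}$, and checks that the two agree. The sample size, number of predictors, and sums of squares are hypothetical.

```python
# Hypothetical summary quantities (illustration only).
n = 10           # number of observations
k = 2            # number of predictors
ss_to = 100.0    # total sum of squares
ss_resid = 20.0  # residual sum of squares

# Regression sum of squares and the two degrees of freedom.
ss_reg = ss_to - ss_resid
df_num = k
df_den = n - (k + 1)

# Test statistic from the sums of squares.
f_from_ss = (ss_reg / df_num) / (ss_resid / df_den)

# Equivalent form written in terms of R^2.
r_sq = 1 - ss_resid / ss_to
f_from_r_sq = (r_sq / df_num) / ((1 - r_sq) / df_den)

print(df_num, df_den, f_from_ss, f_from_r_sq)
```

Because the test is upper-tailed, the P-value is the area to the right of the computed F under the F curve with these degrees of freedom. If SciPy happens to be available, `scipy.stats.f.sf(f_from_ss, df_num, df_den)` should return that area, as an alternative to bounding it from a table.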
35. Example Linear Model with Height, Age & Coded Activity and Gender The histogram of the residuals appears to be consistent with the assumption that the residuals are a sample from a normal distribution.
36. Example Linear Model with Height, Age & Coded Activity and Gender The normality plot also tends to indicate the residuals can reasonably be thought to be a sample from a normal distribution.
37. Example Linear Model with Height, Age & Coded Activity and Gender The residual plot also tends to indicate that the model assumptions are not unreasonable, although there would be some concern that the residuals are predominantly positive for smaller fitted lung capacities.