Simple lin regress_inference

Simple Linear Regression 1. Review of the least squares procedure 2. Inference for least squares lines

Introduction

The Model Most lots sell for $25,000. Building a house costs about $75 per square foot. House cost = 25000 + 75(Size). The model has a deterministic and a probabilistic component. [Figure: house cost plotted against house size]

The Model House cost = 25000 + 75(Size). Most lots sell for $25,000. However, house costs vary even among houses of the same size. Since costs behave unpredictably, we add a random component. [Figure: house cost plotted against house size]
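
A minimal sketch of the two components, assuming a normally distributed random term. The lot price and cost per square foot come from the slide; the noise standard deviation of $5,000, the house size of 2,000 square feet, and the function names are illustrative assumptions.

```python
import random

LOT_PRICE = 25000      # from the slide: most lots sell for $25,000
COST_PER_SQFT = 75     # from the slide: about $75 per square foot
NOISE_SD = 5000        # assumed value, not given in the slides


def deterministic_cost(size_sqft):
    """Deterministic component: House cost = 25000 + 75(Size)."""
    return LOT_PRICE + COST_PER_SQFT * size_sqft


def simulated_cost(size_sqft, rng):
    """Deterministic component plus an assumed normal random component."""
    return deterministic_cost(size_sqft) + rng.gauss(0, NOISE_SD)


rng = random.Random(0)          # fixed seed so the run is repeatable
size = 2000                     # hypothetical house size in square feet
costs = [simulated_cost(size, rng) for _ in range(5)]

print(deterministic_cost(size))        # 175000
print([round(c) for c in costs])       # five different costs for one size
```

Houses of the same size share the deterministic value but differ in the random term, which is what the scatter around the line represents.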

The Model \(\beta_0\) is the y-intercept and \(\beta_1\) = Rise/Run is the slope of the line. \(\beta_0\) and \(\beta_1\) are unknown population parameters and are therefore estimated from the data. [Figure: a line plotted as y against x, showing the intercept \(\beta_0\) and the slope as Rise/Run]

Estimating the Coefficients Question: What should be considered a good line? [Figure: scatterplot of y against x]

The Least Squares (Regression) Line A good line is one that minimizes the sum of squared differences between the points and the line.

The Least Squares (Regression) Line Let us compare two lines for the points (1,2), (2,4), (3,1.5) and (4,3.2). The first line passes through (1,1), (2,2), (3,3) and (4,4): Sum of squared differences = \((2 - 1)^2 + (4 - 2)^2 + (1.5 - 3)^2 + (3.2 - 4)^2 = 7.89\). The second line is horizontal at y = 2.5: Sum of squared differences = \((2 - 2.5)^2 + (4 - 2.5)^2 + (1.5 - 2.5)^2 + (3.2 - 2.5)^2 = 3.99\). The smaller the sum of squared differences, the better the fit of the line to the data.
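
The comparison can be checked with a short sketch. It assumes, from the values in the slide's arithmetic, that the first line predicts 1, 2, 3, 4 at x = 1, 2, 3, 4 (that is, y = x) and that the horizontal line sits at y = 2.5; the function name is hypothetical.

```python
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]   # data points from the slide


def sum_sq_diff(line, pts):
    """Sum of squared vertical differences between the points and a line."""
    return sum((y - line(x)) ** 2 for x, y in pts)


first = sum_sq_diff(lambda x: x, points)      # assumed first line: y = x
second = sum_sq_diff(lambda x: 2.5, points)   # horizontal line: y = 2.5

print(round(first, 2))    # 7.89
print(round(second, 2))   # 3.99
```

By this criterion the horizontal line fits these four points better than the first line does.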

The Estimated Coefficients The estimates of the slope and intercept of the least squares line, \(b_1\) and \(b_0\), are calculated from the sample data. The regression equation that estimates the equation of the first order linear model is \(\hat{y} = b_0 + b_1 x\). There is also an alternate formula for the slope \(b_1\).
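
A sketch of the calculation, assuming the usual textbook least squares formulas \(b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(b_0 = \bar{y} - b_1 \bar{x}\); the exact form used on the slide may differ. The data are the four points from the line-comparison slide, and the function name is hypothetical.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept
    return b0, b1


xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]
b0, b1 = least_squares(xs, ys)
print(round(b0, 2), round(b1, 2))   # 2.4 0.11
```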

The Simple Linear Regression Line Independent variable x, dependent variable y

The Simple Linear Regression Line where n = 100.

The Simple Linear Regression Line 1. Scatterplot 2. Trend function 3. Tools > Data Analysis > Regression

The Simple Linear Regression Line

Interpreting the Linear Regression Equation The slope of the line is \(b_1 = -0.0623\): for each additional mile on the odometer, the price decreases by an average of $0.0623. The intercept is \(b_0\) = $17,067. Do not interpret the intercept as the "price of cars that have not been driven": there is no data near an odometer reading of 0.
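
A small sketch of how the fitted equation is read, using the slope and intercept quoted on this slide. The odometer readings of 40,000 and 41,000 miles are hypothetical inputs chosen for illustration, and the function name is an assumption.

```python
B0 = 17067.0    # intercept quoted on the slide, in dollars
B1 = -0.0623    # slope: average price change per additional mile


def predicted_price(odometer_miles):
    """Point on the fitted line: y-hat = b0 + b1 * x."""
    return B0 + B1 * odometer_miles


price_40k = predicted_price(40000)
drop_per_1000 = predicted_price(41000) - predicted_price(40000)

print(round(price_40k, 2))       # 14575.0
print(round(drop_per_1000, 2))   # -62.3
```

The second number restates the slope: 1,000 extra miles lower the average price by about $62.30. The equation should only be applied inside the range of odometer readings that was actually observed.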

Error Variable: Required Conditions

The Normality of \(\varepsilon\) From the first three assumptions we have: y is normally distributed with mean \(E(y) = \beta_0 + \beta_1 x\) and a constant standard deviation \(\sigma_\varepsilon\). The standard deviation remains constant, but the mean value changes with x: \(E(y|x_1) = \beta_0 + \beta_1 x_1\), \(E(y|x_2) = \beta_0 + \beta_1 x_2\), \(E(y|x_3) = \beta_0 + \beta_1 x_3\).

Assessing the Model

Sum of Squares for Errors

Standard Error of Estimate
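
A sketch of both quantities, assuming the standard definitions \(SSE = \sum (y_i - \hat{y}_i)^2\) and \(s_\varepsilon = \sqrt{SSE/(n-2)}\). The data are the four points from the line-comparison slide, not the odometer data set.

```python
import math

xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]
n = len(xs)

# Least squares coefficients (usual textbook formulas, assumed here).
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals are observed y minus fitted y-hat.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)      # sum of squares for errors
s_eps = math.sqrt(sse / (n - 2))          # standard error of estimate

print(round(sse, 3))     # 3.807
print(round(s_eps, 2))   # 1.38
```

The SSE of 3.807 is smaller than the 3.99 obtained for the horizontal line at 2.5, as expected for the least squares line.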

Standard Error of Estimate, Example It is hard to assess the model based on \(s_\varepsilon\) alone, even when it is compared with the mean value of y.

Testing the Slope Linear relationship: different inputs (x) yield different outputs (y), so the slope is not equal to zero. No linear relationship: different inputs (x) yield the same output (y), so the slope is equal to zero.

Testing the Slope The test statistic uses \(s_{b_1}\), the standard error of \(b_1\).
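
A sketch of the test, assuming the standard textbook form \(t = (b_1 - \beta_1)/s_{b_1}\) with \(s_{b_1} = s_\varepsilon / \sqrt{\sum (x_i - \bar{x})^2}\) and \(n - 2\) degrees of freedom. The data are the four illustrative points used earlier, the function name is hypothetical, and the critical value \(t_{.025,2} = 4.303\) is taken from a t table.

```python
import math


def slope_t_statistic(xs, ys, beta1_null=0.0):
    """Return (t, degrees of freedom) for H0: beta1 = beta1_null."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_eps = math.sqrt(sse / (n - 2))     # standard error of estimate
    s_b1 = s_eps / math.sqrt(sxx)        # standard error of the slope
    return (b1 - beta1_null) / s_b1, n - 2


t, df = slope_t_statistic([1, 2, 3, 4], [2, 4, 1.5, 3.2])
T_CRIT = 4.303   # t_{.025, 2} from a t table (two-tail test at 5%)

print(round(t, 3), df)    # 0.178 2
print(abs(t) > T_CRIT)    # False: no evidence of a linear relationship
```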

Testing the Slope, Example

Testing the Slope, Example There is overwhelming evidence to infer that the odometer reading affects the auction selling price.

Coefficient of determination Note that the coefficient of determination is \(r^2\).

Coefficient of determination The overall variability in y is explained in part by the regression model. The rest remains unexplained and is attributed to the error.

Coefficient of determination Two data points \((x_1, y_1)\) and \((x_2, y_2)\) of a certain sample are shown. Total variation in y = Variation explained by the regression line + Unexplained variation (error). Variation in y = SSR + SSE. [Figure: the two points plotted as y against x]
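
The decomposition can be verified numerically. This sketch assumes the usual definitions (total variation \(\sum (y_i - \bar{y})^2\), \(SSR = \sum (\hat{y}_i - \bar{y})^2\), \(R^2 = SSR/\text{total}\)) and reuses the four illustrative points from the line-comparison slide; the variable names are hypothetical.

```python
import math

xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)      # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in xs]
ssr = sum((f - y_bar) ** 2 for f in fitted)             # explained
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))     # unexplained

r_squared = ssr / syy
r = sxy / math.sqrt(sxx * syy)    # sample coefficient of correlation

print(round(ssr + sse, 4), round(syy, 4))   # 3.8675 3.8675
print(round(r_squared, 4))                  # 0.0156
print(round(r ** 2, 4))                     # 0.0156
```

For these four points the line explains under 2% of the variation in y, which matches the weak fit seen in the sum of squared differences.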

Coefficient of determination

Coefficient of determination, Example

Coefficient of determination 65% of the variation in the auction selling price is explained by the variation in odometer reading. The rest (35%) remains unexplained by this model.

Using the Regression Equation

Point Prediction

Interval Estimates

Interval Estimates, Example

Interval Estimates, Example \(t_{.025,98}\) (approximately)

Interval Estimates, Example

The effect of the given \(x_g\) on the length of the interval
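
A sketch of how the interval length depends on \(x_g\), assuming the standard textbook half-widths: \(t_{\alpha/2,n-2}\, s_\varepsilon \sqrt{1 + 1/n + (x_g - \bar{x})^2 / \sum (x_i - \bar{x})^2}\) for predicting an individual y, and the same expression without the leading 1 for estimating the mean of y. The data are the four illustrative points used earlier, the function name is hypothetical, and the critical value is supplied by the caller from a t table.

```python
import math


def interval_half_widths(xs, ys, x_g, t_crit):
    """Return (y-hat at x_g, prediction half-width, mean half-width)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_eps = math.sqrt(sse / (n - 2))
    core = 1 / n + (x_g - x_bar) ** 2 / sxx   # grows as x_g leaves x-bar
    prediction_hw = t_crit * s_eps * math.sqrt(1 + core)
    mean_hw = t_crit * s_eps * math.sqrt(core)
    return b0 + b1 * x_g, prediction_hw, mean_hw


xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]
T_CRIT = 4.303   # t_{.025, 2} from a t table

at_mean = interval_half_widths(xs, ys, 2.5, T_CRIT)   # x_g equals x-bar
far_out = interval_half_widths(xs, ys, 4.0, T_CRIT)   # x_g away from x-bar

print(at_mean)
print(far_out)
```

Both intervals are shortest when \(x_g = \bar{x}\) and widen as \(x_g\) moves away from it, and the prediction interval is always wider than the interval for the mean.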

Regression Diagnostics - I

Residual Analysis

Residual Analysis For each residual we calculate its standard deviation, and then: Standardized residual i = (Residual i) / (Standard deviation of residual i). [Table: a partial list of standardized residuals]
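
A sketch of the standardization, assuming the common textbook form for the standard deviation of residual i, \(s_{r_i} = s_\varepsilon \sqrt{1 - h_i}\) with \(h_i = 1/n + (x_i - \bar{x})^2 / \sum (x_j - \bar{x})^2\). The slide's own formula may differ, and the data are the four illustrative points used earlier rather than the odometer data.

```python
import math

xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s_eps = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

standardized = []
for x, e in zip(xs, residuals):
    h = 1 / n + (x - x_bar) ** 2 / sxx     # assumed leverage term h_i
    sd = s_eps * math.sqrt(1 - h)          # standard deviation of residual
    standardized.append(e / sd)

print([round(z, 2) for z in standardized])
```

Standardized residuals are unit-free, so they can be screened against fixed cutoffs or plotted in a histogram to judge normality.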

Residual Analysis It seems the residuals are normally distributed with mean zero.

Heteroscedasticity [Figure: residuals plotted against \(\hat{y}\); the spread increases with \(\hat{y}\)]

Homoscedasticity

Non Independence of Error Variables

Non Independence of Error Variables Patterns in the appearance of the residuals over time indicate that autocorrelation exists. [Figure: two plots of residuals against time] In the first plot, note the runs of positive residuals, replaced by runs of negative residuals. In the second plot, note the oscillating behavior of the residuals around zero.
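
One informal way to look for the two patterns is to count runs of same-signed residuals in time order. This is an illustrative check and not a procedure taken from the slides; the function name and the two residual series are made up for the example.

```python
def count_sign_runs(residuals):
    """Count maximal runs of consecutive residuals with the same sign."""
    signs = [r > 0 for r in residuals if r != 0]   # ignore exact zeros
    if not signs:
        return 0
    # Each sign change starts a new run.
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


long_runs = [0.8, 1.1, 0.5, 0.9, -0.7, -1.2, -0.4, -0.6]     # made-up data
oscillating = [0.8, -1.1, 0.5, -0.9, 0.7, -1.2, 0.4, -0.6]   # made-up data

print(count_sign_runs(long_runs))     # 2
print(count_sign_runs(oscillating))   # 8
```

Very few runs correspond to the first pattern (long stretches of one sign), and a run count close to the number of observations corresponds to the oscillating pattern. Independent errors would typically fall between these extremes.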

Outliers

Outliers [Figure: two scatterplots, one marking an outlier and one marking an influential observation] The outlier causes a shift in the regression line, but some outliers may be very influential.

Procedure for Regression Diagnostics
