8 Sep 2016

Econometrics ch4

Baterdene Batchuluun

- 1. 405 ECONOMETRICS Chapter 3: TWO-VARIABLE REGRESSION MODEL: THE PROBLEM OF ESTIMATION. By Damodar N. Gujarati. Prof. M. El-Sakka, Dept of Economics, Kuwait University
- 2. THE METHOD OF ORDINARY LEAST SQUARES • To understand this method, we first explain the least squares principle. • Recall the two-variable PRF: Yi = β1 + β2Xi + ui (2.4.2) • The PRF is not directly observable. We estimate it from the SRF: Yi = β̂1 + β̂2Xi + ûi (2.6.2) = Ŷi + ûi (2.6.3) • where Ŷi is the estimated (conditional mean) value of Yi. • But how is the SRF itself determined? First, express (2.6.3) as ûi = Yi − Ŷi = Yi − β̂1 − β̂2Xi (3.1.1) • Now given n pairs of observations on Y and X, we would like to determine the SRF in such a manner that it is as close as possible to the actual Y. To this end, we may adopt the following criterion: • Choose the SRF in such a way that the sum of the residuals Σûi = Σ(Yi − Ŷi) is as small as possible.
- 3. • But this is not a very good criterion. If we adopt the criterion of minimizing Σûi, Figure 3.1 shows that the residuals û2 and û3 as well as the residuals û1 and û4 receive the same weight in the sum (û1 + û2 + û3 + û4). A consequence of this is that it is quite possible for the algebraic sum of the ûi to be small (even zero) although the ûi are widely scattered about the SRF. • To see this, let û1, û2, û3, and û4 in Figure 3.1 take the values 10, −2, +2, and −10, respectively. The algebraic sum of these residuals is zero although û1 and û4 are scattered more widely around the SRF than û2 and û3. • We can avoid this problem if we adopt the least-squares criterion, which states that the SRF can be fixed in such a way that Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂1 − β̂2Xi)² (3.1.2) • is as small as possible, where the ûi² are the squared residuals.
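The residual values used in the Figure 3.1 illustration make the point directly: their algebraic sum vanishes while their squared sum does not. A minimal sketch:

```python
import numpy as np

# Residuals from the Figure 3.1 illustration: widely scattered about
# the SRF, yet their algebraic sum is exactly zero.
u_hat = np.array([10.0, -2.0, 2.0, -10.0])

algebraic_sum = u_hat.sum()        # cancels out: hides the scatter
squared_sum = (u_hat ** 2).sum()   # penalizes the large residuals

print(algebraic_sum, squared_sum)  # 0.0 and 208.0
```

This is why the least-squares criterion minimizes Σûi² rather than Σûi: squaring prevents large positive and negative residuals from cancelling.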
- 5. • By squaring ûi, this method gives more weight to residuals such as û1 and û4 in Figure 3.1 than to the residuals û2 and û3. • A further justification for the least-squares method lies in the fact that the estimators obtained by it have some very desirable statistical properties, as we shall see shortly.
- 6. • It is obvious from (3.1.2) that: Σûi² = f(β̂1, β̂2) (3.1.3) • that is, the sum of the squared residuals is some function of the estimators β̂1 and β̂2. To see this, consider Table 3.1 and conduct two experiments.
- 7. • Since the β̂ values in the two experiments are different, we get different values for the estimated residuals. • Now which set of β̂ values should we choose? Obviously the β̂’s of the first experiment are the “best” values. But we could run endless experiments and then choose the set of β̂ values that gives us the least possible value of Σûi². • But since time, and patience, are generally in short supply, we need to consider some shortcuts to this trial-and-error process. Fortunately, the method of least squares provides us with unique estimates of β1 and β2 that give the smallest possible value of Σûi².
- 8. Σûi² = Σ(Yi − β̂1 − β̂2Xi)² (3.1.2)
- 9. • The process of differentiation yields the following equations for estimating β1 and β2: ΣYiXi = β̂1ΣXi + β̂2ΣXi² (3.1.4) ΣYi = nβ̂1 + β̂2ΣXi (3.1.5) • where n is the sample size. These simultaneous equations are known as the normal equations. Solving the normal equations simultaneously, we obtain: β̂2 = Σxiyi / Σxi² (3.1.6) β̂1 = Ȳ − β̂2X̄ (3.1.7)
- 10. • where X̄ and Ȳ are the sample means of X and Y, and where we define xi = (Xi − X̄) and yi = (Yi − Ȳ). Henceforth we adopt the convention of letting the lowercase letters denote deviations from mean values.
- 11. • The last step in (3.1.7) can be obtained directly from (3.1.4) by simple algebraic manipulations. Incidentally, note that, by making use of simple algebraic identities, formula (3.1.6) for estimating β2 can be alternatively expressed as: • The estimators obtained previously are known as the least-squares estimators.
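The least-squares formulas above are easy to compute directly. A minimal sketch, using made-up illustrative data (not the values of Table 3.1):

```python
import numpy as np

# Hypothetical sample, for illustration only (not Table 3.1).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([4.0, 5.0, 7.0, 8.0, 11.0])

x = X - X.mean()                 # deviations from the mean, x_i
y = Y - Y.mean()                 # deviations from the mean, y_i

beta2_hat = (x * y).sum() / (x ** 2).sum()   # slope, Eq. (3.1.6)
beta1_hat = Y.mean() - beta2_hat * X.mean()  # intercept, Eq. (3.1.7)

print(beta1_hat, beta2_hat)
```

The deviation form of (3.1.6) avoids carrying raw cross-products and squared sums, which is the computational simplification the text mentions.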
- 12. • Note the following numerical properties of estimators obtained by the method of OLS: • I. The OLS estimators are expressed solely in terms of the observable (i.e., sample) quantities (i.e., X and Y). Therefore, they can be easily computed. • II. They are point estimators; that is, given the sample, each estimator will provide only a single (point, not interval) value of the relevant population parameter. • III. Once the OLS estimates are obtained from the sample data, the sample regression line (Figure 3.1) can be easily obtained. • The regression line thus obtained has the following properties: – 1. It passes through the sample means of Y and X. This fact is obvious from (3.1.7), for the latter can be written as Ȳ = β̂1 + β̂2X̄, which is shown diagrammatically in Figure 3.2.
- 14. – 2. The mean value of the estimated Y = Ŷi is equal to the mean value of the actual Y, for: Ŷi = β̂1 + β̂2Xi = (Ȳ − β̂2X̄) + β̂2Xi = Ȳ + β̂2(Xi − X̄) (3.1.9) • Summing both sides of this last equality over the sample values and dividing through by the sample size n gives the result that the mean of the Ŷi equals Ȳ (3.1.10) • where use is made of the fact that Σ(Xi − X̄) = 0. – 3. The mean value of the residuals ûi is zero. From Appendix 3A, Section 3A.1, the first equation is: −2Σ(Yi − β̂1 − β̂2Xi) = 0 • But since ûi = Yi − β̂1 − β̂2Xi, the preceding equation reduces to −2Σûi = 0, whence the mean of the ûi is zero.
- 15. • As a result of the preceding property, the sample regression Yi = β̂1 + β̂2Xi + ûi (2.6.2) • can be expressed in an alternative form where both Y and X are expressed as deviations from their mean values. To see this, sum (2.6.2) on both sides to give: ΣYi = nβ̂1 + β̂2ΣXi + Σûi = nβ̂1 + β̂2ΣXi since Σûi = 0 (3.1.11) • Dividing Eq. (3.1.11) through by n, we obtain Ȳ = β̂1 + β̂2X̄ (3.1.12) • which is the same as (3.1.7). Subtracting Eq. (3.1.12) from (2.6.2), we obtain Yi − Ȳ = β̂2(Xi − X̄) + ûi • or yi = β̂2xi + ûi (3.1.13)
- 16. • Equation (3.1.13) is known as the deviation form. Notice that the intercept term β̂1 is no longer present in it. But the intercept term can always be estimated by (3.1.7), that is, from the fact that the sample regression line passes through the sample means of Y and X. • An advantage of the deviation form is that it often simplifies computing formulas. In passing, note that in the deviation form, the SRF can be written as: ŷi = β̂2xi (3.1.14) • whereas in the original units of measurement it was Ŷi = β̂1 + β̂2Xi, as shown in (2.6.1).
- 17. – 4. The residuals ûi are uncorrelated with the predicted Yi. This statement can be verified as follows: using the deviation form, we can write: Σŷiûi = β̂2Σxiûi = β̂2Σxi(yi − β̂2xi) = β̂2Σxiyi − β̂2²Σxi² = β̂2²Σxi² − β̂2²Σxi² = 0 – where use is made of the fact that β̂2 = Σxiyi / Σxi². – 5. The residuals ûi are uncorrelated with Xi; that is, Σûi Xi = 0. This fact follows from Eq. (2) in Appendix 3A, Section 3A.1.
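Properties 2 through 5 of the fitted regression line can all be checked numerically. A sketch on the same hypothetical data as before (illustrative values, not Table 3.1):

```python
import numpy as np

# Hypothetical sample, for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([4.0, 5.0, 7.0, 8.0, 11.0])

x = X - X.mean()
beta2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
beta1 = Y.mean() - beta2 * X.mean()

Y_hat = beta1 + beta2 * X     # fitted values
u_hat = Y - Y_hat             # residuals

print(np.isclose(Y_hat.mean(), Y.mean()))       # property 2: mean of fitted = mean of actual
print(np.isclose(u_hat.sum(), 0.0))             # property 3: residuals sum to zero
print(np.isclose((u_hat * Y_hat).sum(), 0.0))   # property 4: residuals orthogonal to fitted Y
print(np.isclose((u_hat * X).sum(), 0.0))       # property 5: residuals orthogonal to X
```

Up to floating-point rounding, all four checks hold exactly for any data set, because they are algebraic consequences of the normal equations rather than features of a particular sample.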
- 18. THE CLASSICAL LINEAR REGRESSION MODEL: THE ASSUMPTIONS UNDERLYING THE METHOD OF LEAST SQUARES • In regression analysis our objective is not only to obtain β̂1 and β̂2 but also to draw inferences about the true β1 and β2. For example, we would like to know how close β̂1 and β̂2 are to their counterparts in the population, or how close Ŷi is to the true E(Y | Xi). • Look at the PRF: Yi = β1 + β2Xi + ui. It shows that Yi depends on both Xi and ui. The assumptions made about the Xi variable(s) and the error term are extremely critical to the valid interpretation of the regression estimates. • The Gaussian, standard, or classical linear regression model (CLRM) makes 10 assumptions.
- 19. • Keep in mind that the regressand Y and the regressor X themselves may be nonlinear.
- 20. • Look at Table 2.1. Keeping the value of income X fixed, say, at $80, we draw at random a family and observe its weekly family consumption expenditure Y as, say, $60. Still keeping X at $80, we draw at random another family and observe its Y value as $75. In each of these drawings (i.e., repeated sampling), the value of X is fixed at $80. We can repeat this process for all the X values shown in Table 2.1. • This means that our regression analysis is conditional regression analysis, that is, conditional on the given values of the regressor(s) X.
- 22. • As shown in Figure 3.3, each Y population corresponding to a given X is distributed around its mean value, with some Y values above the mean and some below it. The mean value of these deviations corresponding to any given X should be zero. • Note that the assumption E(ui | Xi) = 0 implies that E(Yi | Xi) = β1 + β2Xi.
- 23. E(ui | Xi) = 0
- 24. • Technically, (3.2.2) represents the assumption of homoscedasticity, or equal (homo) spread (scedasticity), or equal variance. Stated differently, (3.2.2) means that the Y populations corresponding to various X values have the same variance. • Put simply, the variation around the regression line (which is the line of average relationship between Y and X) is the same across the X values; it neither increases nor decreases as X varies.
- 27. • Figure 3.5 shows the case where the conditional variance of the Y population varies with X. This situation is known as heteroscedasticity, or unequal spread, or variance. Symbolically, in this situation (3.2.2) can be written as • var(ui | Xi) = σi² (3.2.3) • Figure 3.5 shows that var(u | X1) < var(u | X2) < · · · < var(u | Xi). Therefore, the likelihood is that the Y observations coming from the population with X = X1 would be closer to the PRF than those coming from populations corresponding to X = X2, X = X3, and so on. In short, not all Y values corresponding to the various X’s will be equally reliable, reliability being judged by how closely or distantly the Y values are distributed around their means, that is, the points on the PRF.
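The pattern in Figure 3.5 can be simulated. A sketch under the hypothetical specification σi = 0.5·Xi (an assumed form chosen purely for illustration):

```python
import numpy as np

# Heteroscedastic disturbances: var(u_i | X_i) = sigma_i^2 grows with X.
# The rule sigma_i = 0.5 * X_i is a made-up illustrative choice.
rng = np.random.default_rng(0)
X = np.repeat([1.0, 2.0, 4.0], 2000)   # three Y "populations" at fixed X values
u = rng.normal(0.0, 0.5 * X)           # spread of u rises with X

for value in (1.0, 2.0, 4.0):
    print(value, u[X == value].std())  # sample spread grows with X
```

The printed standard deviations rise with X, mirroring the figure: Y draws at small X cluster tightly around the PRF, while draws at large X scatter widely.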
- 28. • The disturbances ui and uj are uncorrelated, i.e., there is no serial correlation. This means that, given Xi, the deviations of any two Y values from their mean value do not exhibit patterns. In Figure 3.6a, the u’s are positively correlated: a positive u is followed by a positive u, or a negative u is followed by a negative u. In Figure 3.6b, the u’s are negatively correlated: a positive u is followed by a negative u, and vice versa. If the disturbances follow systematic patterns, as in Figure 3.6a and b, there is auto- or serial correlation. Figure 3.6c shows that there is no systematic pattern to the u’s, thus indicating zero correlation.
- 30. • Suppose in our PRF (Yt = β1 + β2Xt + ut) that ut and ut−1 are positively correlated. Then Yt depends not only on Xt but also on ut−1, for ut−1 to some extent determines ut.
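One standard way ut−1 can "determine" ut is a first-order autoregressive scheme, ut = ρ·ut−1 + et. This scheme is not stated in the slides; it is a common illustrative assumption, sketched here with a hypothetical ρ = 0.8:

```python
import numpy as np

# AR(1)-style disturbances: u_t = rho * u_{t-1} + e_t, so u_t and
# u_{t-1} are positively correlated. rho = 0.8 is a hypothetical value.
rng = np.random.default_rng(1)
rho, T = 0.8, 10_000
e = rng.normal(size=T)                 # fresh "innovation" each period
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + e[t]

corr = np.corrcoef(u[1:], u[:-1])[0, 1]
print(corr)                            # sample lag-1 correlation, near rho
```

A positive u tends to be followed by another positive u, exactly the pattern of Figure 3.6a.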
- 31. • The disturbance u and the explanatory variable X are uncorrelated. The PRF assumes that X and u (which may represent the influence of all the omitted variables) have separate (and additive) influences on Y. But if X and u are correlated, it is not possible to assess their individual effects on Y. Thus, if X and u are positively correlated, X increases when u increases and decreases when u decreases. Similarly, if X and u are negatively correlated, X increases when u decreases and decreases when u increases. In either case, it is difficult to isolate the influence of X and u on Y.
- 32. • In the hypothetical example of Table 3.1, imagine that we had only the first pair of observations on Y and X (4 and 1). From this single observation there is no way to estimate the two unknowns, β1 and β2. We need at least two pairs of observations to estimate the two unknowns.
- 33. • This assumption too is not so innocuous as it looks. Look at Eq. (3.1.6). If all the X values are identical, then Xi = X̄ and the denominator of that equation will be zero, making it impossible to estimate β2 and therefore β1. Looking at our family consumption expenditure example in Chapter 2, if there is very little variation in family income, we will not be able to explain much of the variation in the consumption expenditure.
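The breakdown is easy to exhibit: with no variation in X, the deviations xi are all zero and the denominator of the slope formula vanishes. A minimal sketch with a hypothetical sample in which income is fixed at $80:

```python
import numpy as np

# If every X value is identical, sum(x_i^2) = 0 and the slope formula
# (3.1.6) would divide by zero: beta2 cannot be estimated.
X = np.full(5, 80.0)          # hypothetical sample with no variation in X
x = X - X.mean()              # all deviations are zero
denominator = (x ** 2).sum()
print(denominator)            # 0.0 -- the estimator is undefined
```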
- 34. • An econometric investigation begins with the specification of the econometric model underlying the phenomenon of interest. Some important questions that arise in the specification of the model include the following: • (1) What variables should be included in the model? • (2) What is the functional form of the model? Is it linear in the parameters, the variables, or both? • (3) What are the probabilistic assumptions made about the Yi, the Xi, and the ui entering the model?
- 35. • Suppose we choose the following two models to depict the underlying relationship between the rate of change of money wages and the unemployment rate: • Yi = α1 + α2Xi + ui (3.2.7) • Yi = β1 + β2(1/Xi) + ui (3.2.8) • where Yi = the rate of change of money wages, and Xi = the unemployment rate. The regression model (3.2.7) is linear both in the parameters and the variables, whereas (3.2.8) is linear in the parameters (hence a linear regression model by our definition) but nonlinear in the variable X. Now consider Figure 3.7. • If model (3.2.8) is the “correct” or the “true” model, fitting model (3.2.7) to the scatterpoints shown in Figure 3.7 will give us wrong predictions. • Unfortunately, in practice one rarely knows the correct variables to include in the model, or the correct functional form of the model, or the correct probabilistic assumptions about the variables entering the model, for the theory underlying the particular investigation may not be strong or robust.
- 37. • We will discuss this assumption in Chapter 7, where we discuss multiple regression models.
- 38. PRECISION OR STANDARD ERRORS OF LEAST-SQUARES ESTIMATES • The least-squares estimates are a function of the sample data. But since the data change from sample to sample, the estimates will change. Therefore, what is needed is some measure of “reliability” or precision of the estimators β̂1 and β̂2. In statistics the precision of an estimate is measured by its standard error (se), which can be obtained as follows:
- 39. • σ² is the constant or homoscedastic variance of ui of Assumption 4. • σ² itself is estimated by the following formula: σ̂² = Σûi² / (n − 2) • where σ̂² is the OLS estimator of the true but unknown σ², the expression n − 2 is known as the number of degrees of freedom (df), and Σûi² is the residual sum of squares (RSS). Once Σûi² is known, σ̂² can be easily computed. • Compared with Eq. (3.1.2), Eq. (3.3.6) is easy to use, for it does not require computing ûi for each observation.
- 40. • Since Σûi² = Σ(yi − β̂2xi)², an alternative expression for computing Σûi² is Σûi² = Σyi² − β̂2²Σxi² (3.3.6) • In passing, note that the positive square root of σ̂² • is known as the standard error of estimate or the standard error of the regression (se). It is simply the standard deviation of the Y values about the estimated regression line and is often used as a summary measure of the “goodness of fit” of the estimated regression line.
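The estimate σ̂² = Σûi²/(n − 2) and the resulting standard errors are straightforward to compute. A sketch on the same hypothetical data as before; the variance formulas used (var(β̂2) = σ²/Σxi² and var(β̂1) = σ²ΣXi²/(nΣxi²)) are the standard two-variable OLS expressions referred to in the next slide:

```python
import numpy as np

# Hypothetical sample, for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([4.0, 5.0, 7.0, 8.0, 11.0])
n = len(X)

x = X - X.mean()
beta2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
beta1 = Y.mean() - beta2 * X.mean()
u_hat = Y - (beta1 + beta2 * X)

rss = (u_hat ** 2).sum()          # residual sum of squares
sigma2_hat = rss / (n - 2)        # estimator of sigma^2, df = n - 2
se_regression = np.sqrt(sigma2_hat)  # standard error of the regression

# Standard errors of the two coefficients:
se_beta2 = np.sqrt(sigma2_hat / (x ** 2).sum())
se_beta1 = np.sqrt(sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum()))
print(se_regression, se_beta1, se_beta2)
```

Note the division by n − 2 rather than n: two degrees of freedom are used up estimating β̂1 and β̂2.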
- 41. • Note the following features of the variances (and therefore the standard errors) of β̂1 and β̂2. • 1. The variance of β̂2 is directly proportional to σ² but inversely proportional to Σxi². That is, given σ², the larger the variation in the X values, the smaller the variance of β̂2 and hence the greater the precision with which β2 can be estimated. • 2. The variance of β̂1 is directly proportional to σ² and ΣXi² but inversely proportional to Σxi² and the sample size n.
- 42. • 3. Since β̂1 and β̂2 are estimators, they will not only vary from sample to sample but, in a given sample, they are likely to be dependent on each other, this dependence being measured by the covariance between them. • Since var(β̂2) is always positive, as is the variance of any variable, the nature of the covariance between β̂1 and β̂2 depends on the sign of X̄. If X̄ is positive, then as the formula shows, the covariance will be negative. Thus, if the slope coefficient β2 is overestimated (i.e., the slope is too steep), the intercept coefficient β1 will be underestimated (i.e., the intercept will be too small).
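The formula referred to here is cov(β̂1, β̂2) = −X̄·var(β̂2) (the standard two-variable result). A sketch with hypothetical values, showing the sign flip the slide describes:

```python
import numpy as np

# cov(beta1_hat, beta2_hat) = -X_bar * var(beta2_hat): when X_bar > 0
# the covariance is negative. X values and sigma^2 are hypothetical.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = X - X.mean()
sigma2 = 1.0                       # assumed true error variance

var_beta2 = sigma2 / (x ** 2).sum()
cov_b1_b2 = -X.mean() * var_beta2
print(cov_b1_b2)                   # negative, since X_bar = 3 > 0
```

So an overestimated slope drags the intercept down, and vice versa, whenever the X values are on average positive.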
- 43. PROPERTIES OF LEAST-SQUARES ESTIMATORS: THE GAUSS–MARKOV THEOREM • To understand this theorem, we need to consider the best linear unbiasedness property of an estimator. An estimator, say the OLS estimator β̂2, is said to be a best linear unbiased estimator (BLUE) of β2 if the following hold: • 1. It is linear, that is, a linear function of a random variable, such as the dependent variable Y in the regression model. • 2. It is unbiased, that is, its average or expected value, E(β̂2), is equal to the true value, β2. • 3. It has minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is known as an efficient estimator.
- 44. • What all this means can be explained with the aid of Figure 3.8. In Figure 3.8(a) we have shown the sampling distribution of the OLS estimator β̂2, that is, the distribution of the values taken by β̂2 in repeated sampling experiments. For convenience we have assumed β̂2 to be distributed symmetrically. As the figure shows, the mean of the β̂2 values, E(β̂2), is equal to the true β2. In this situation we say that β̂2 is an unbiased estimator of β2. In Figure 3.8(b) we have shown the sampling distribution of β*2, an alternative estimator of β2 obtained by using another (i.e., other than OLS) method.
- 46. • For convenience, assume that β*2, like βˆ2, is unbiased, that is, its average or expected value is equal to β2. Assume further that both βˆ2 and β*2 are linear estimators, that is, they are linear functions of Y. Which estimator, βˆ2 or β*2, would you choose? To answer this question, superimpose the two figures, as in Figure 3.8(c). It is obvious that although both βˆ2 and β*2 are unbiased, the distribution of β*2 is more diffused or widespread around the mean value than the distribution of βˆ2. In other words, the variance of β*2 is larger than the variance of βˆ2. • Now, given two estimators that are both linear and unbiased, one would choose the estimator with the smaller variance because it is more likely to be close to β2 than the alternative estimator. In short, one would choose the BLUE estimator.
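The comparison above can be illustrated numerically. The following Monte Carlo sketch (not from the text; the parameter values, sample size, and error distribution are assumptions for illustration) draws repeated samples from a fixed PRF and compares the OLS slope estimator with another linear unbiased estimator of β2, the two-point slope (Y_n − Y_1)/(X_n − X_1) computed from only the first and last observations:

```python
import random
import statistics

# Monte Carlo sketch (all values assumed for illustration): compare the OLS
# slope estimator with another linear unbiased estimator of beta2, the
# two-point slope (Y_n - Y_1)/(X_n - X_1), under Y_i = b1 + b2*X_i + u_i.
random.seed(42)
b1_true, b2_true, sigma = 24.0, 0.5, 6.0
X = [80 + 20 * i for i in range(10)]   # regressor values held fixed in repeated samples
xbar = sum(X) / len(X)
x_dev = [x - xbar for x in X]
sum_x2 = sum(d * d for d in x_dev)

ols_draws, alt_draws = [], []
for _ in range(5000):
    Y = [b1_true + b2_true * x + random.gauss(0, sigma) for x in X]
    ybar = sum(Y) / len(Y)
    ols_draws.append(sum(d * (y - ybar) for d, y in zip(x_dev, Y)) / sum_x2)
    alt_draws.append((Y[-1] - Y[0]) / (X[-1] - X[0]))

# Both estimators are unbiased (sample means close to b2_true), but the OLS
# estimator has the smaller sampling variance, as Gauss-Markov guarantees.
print(statistics.mean(ols_draws), statistics.mean(alt_draws))
print(statistics.variance(ols_draws), statistics.variance(alt_draws))
```

Both averages land near the true β2 = 0.5, but in this setup the sampling variance of the two-point estimator is roughly twice that of OLS, which is exactly the sense in which OLS is "best" among linear unbiased estimators.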
- 47. THE COEFFICIENT OF DETERMINATION r2: A MEASURE OF “GOODNESS OF FIT” • We now consider the goodness of fit of the fitted regression line to a set of data; that is, we shall find out how “well” the sample regression line fits the data. The coefficient of determination r2 (two-variable case) or R2 (multiple regression) is a summary measure that tells how well the sample regression line fits the data. • Consider a heuristic explanation of r2 in terms of a graphical device known as the Venn diagram, shown in Figure 3.9. • In this figure the circle Y represents variation in the dependent variable Y and the circle X represents variation in the explanatory variable X. The overlap of the two circles indicates the extent to which the variation in Y is explained by the variation in X.
- 49. • To compute this r2, we proceed as follows: Recall that • Yi = Yˆi + uˆi (2.6.3) • or in the deviation form • yi = yˆi + uˆi (3.5.1) • where use is made of (3.1.13) and (3.1.14). Squaring (3.5.1) on both sides and summing over the sample, we obtain • Σyi² = Σyˆi² + Σuˆi² + 2Σyˆi uˆi = Σyˆi² + Σuˆi² = βˆ2²Σxi² + Σuˆi² (3.5.2) • since Σyˆi uˆi = 0 and yˆi = βˆ2xi.
- 50. • The various sums of squares appearing in (3.5.2) can be described as follows: Σ(Yi − Y¯)² = Σyi² = total variation of the actual Y values about their sample mean, which may be called the total sum of squares (TSS). • Σ(Yˆi − Y¯)² = Σyˆi² = variation of the estimated Y values about their mean (Y¯ˆ = Y¯), which appropriately may be called the sum of squares due to, or explained by, regression, or simply the explained sum of squares (ESS). • Σuˆi² = residual or unexplained variation of the Y values about the regression line, or simply the residual sum of squares (RSS). Thus, (3.5.2) is • TSS = ESS + RSS (3.5.3) • and shows that the total variation in the observed Y values about their mean value can be partitioned into two parts, one attributable to the regression line and the other to random forces, because not all actual Y observations lie on the fitted line. Geometrically, we have Figure 3.10.
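The identity TSS = ESS + RSS in (3.5.3) can be checked numerically. A minimal sketch, computing an OLS fit from first principles on made-up data:

```python
# Numerical check of (3.5.3): after an OLS fit, the total sum of squares
# splits exactly into explained plus residual parts. Data are made up.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
     sum((x - xbar) ** 2 for x in X)
b1 = ybar - b2 * xbar
Yhat = [b1 + b2 * x for x in X]

TSS = sum((y - ybar) ** 2 for y in Y)        # total variation
ESS = sum((yh - ybar) ** 2 for yh in Yhat)   # explained by the regression
RSS = sum((y - yh) ** 2 for y, yh in zip(Y, Yhat))  # left in the residuals
print(TSS, ESS + RSS)   # the two numbers agree (up to rounding error)
```

The cross-product term vanishes because the OLS residuals are uncorrelated with the fitted values, which is why the decomposition is exact.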
- 53. • The quantity r2 thus defined is known as the (sample) coefficient of determination and is the most commonly used measure of the goodness of fit of a regression line. Verbally, r2 measures the proportion or percentage of the total variation in Y explained by the regression model. • Two properties of r2 may be noted: • 1. It is a nonnegative quantity. • 2. Its limits are 0 ≤ r2 ≤ 1. An r2 of 1 means a perfect fit, that is, Yˆi = Yi for each i. On the other hand, an r2 of zero means that there is no relationship between the regressand and the regressor whatsoever (i.e., βˆ2 = 0). In this case, as (3.1.9) shows, Yˆi = βˆ1 = Y¯, that is, the best prediction of any Y value is simply its mean value. In this situation, therefore, the regression line will be horizontal to the X axis. • Although r2 can be computed directly from its definition given in (3.5.5), it can be obtained more quickly from the following formula:
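The two limits of r2 can be seen with two extreme data sets (both made up for illustration): one where Y lies exactly on a line, giving r2 = 1, and one where the fitted slope βˆ2 is exactly zero, giving r2 = 0:

```python
# Sketch of the limits 0 <= r^2 <= 1 using r^2 = ESS/TSS. Toy data only.
def r_squared(X, Y):
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
         sum((x - xbar) ** 2 for x in X)
    b1 = ybar - b2 * xbar
    ess = sum((b1 + b2 * x - ybar) ** 2 for x in X)   # explained sum of squares
    tss = sum((y - ybar) ** 2 for y in Y)             # total sum of squares
    return ess / tss

print(r_squared([1, 2, 3, 4], [5, 8, 11, 14]))  # exact line Y = 2 + 3X -> 1.0
print(r_squared([1, 2, 3], [5, 7, 5]))          # fitted slope is zero  -> 0.0
```

In the second case the fitted line is just Yˆi = Y¯, the horizontal line through the mean, exactly as property 2 describes.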
- 57. • Some of the properties of r are as follows (see Figure 3.11): • 1. It can be positive or negative. • 2. It lies between the limits of −1 and +1; that is, −1 ≤ r ≤ 1. • 3. It is symmetrical in nature; that is, the coefficient of correlation between X and Y (rXY) is the same as that between Y and X (rYX). • 4. It is independent of the origin and scale; that is, if we define X*i = aXi + c and Y*i = bYi + d, where a > 0, b > 0, and c and d are constants, then r between X* and Y* is the same as that between the original variables X and Y. • 5. If X and Y are statistically independent, the correlation coefficient between them is zero; but if r = 0, it does not mean that the two variables are independent. • 6. It is a measure of linear association or linear dependence only; it has no meaning for describing nonlinear relations. • 7. Although it is a measure of linear association between two variables, it does not necessarily imply any cause-and-effect relationship.
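Properties 3 and 4 are easy to verify numerically. A sketch with illustrative data, applying the transformation X*i = aXi + c and Y*i = bYi + d with a = 2, b = 3:

```python
import math

# Sketch of properties 3 and 4 of r: symmetry in X and Y, and invariance
# under a change of origin and scale with a > 0, b > 0. Data are made up.
def corr(X, Y):
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
    sxx = sum((x - xbar) ** 2 for x in X)
    syy = sum((y - ybar) ** 2 for y in Y)
    return sxy / math.sqrt(sxx * syy)

X = [1, 3, 4, 6, 8]
Y = [2, 3, 7, 8, 9]
Xstar = [2 * x + 5 for x in X]   # a = 2, c = 5
Ystar = [3 * y - 1 for y in Y]   # b = 3, d = -1
print(corr(X, Y), corr(Y, X), corr(Xstar, Ystar))  # all three coincide
```

The invariance holds because a positive rescaling multiplies the covariance and the two standard deviations by the same factors, which cancel in the ratio.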
- 59. • In the regression context, r2 is a more meaningful measure than r, for the former tells us the proportion of variation in the dependent variable explained by the explanatory variable(s) and therefore provides an overall measure of the extent to which the variation in one variable determines the variation in the other. The latter does not have such value. Moreover, as we shall see, the interpretation of r (= R) in a multiple regression model is of dubious value. • In passing, note that the r2 defined previously can also be computed as the squared coefficient of correlation between the actual Yi and the estimated Yi, namely, Yˆi. That is, using (3.5.13), we can write
- 60. • where Yi = actual Y, Yˆi = estimated Y, and Y¯ = Y¯ˆ = the mean of Y. For proof, see exercise 3.15. Expression (3.5.14) justifies the description of r2 as a measure of goodness of fit, for it tells how close the estimated Y values are to their actual values.
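The equivalence in (3.5.14), namely that r2 equals the squared correlation between the actual and the fitted Y values, can be checked with a short sketch (toy data, not from the text). Note that Y¯ˆ = Y¯, so the mean of Y serves for both series:

```python
# Sketch of (3.5.14): r^2 from the fit equals the squared correlation
# between actual Y and fitted Yhat. Data are made up for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.8, 8.2, 9.9]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
     sum((x - xbar) ** 2 for x in X)
b1 = ybar - b2 * xbar
Yhat = [b1 + b2 * x for x in X]

tss = sum((y - ybar) ** 2 for y in Y)
rss = sum((y - yh) ** 2 for y, yh in zip(Y, Yhat))
r2 = 1 - rss / tss                         # r^2 from the regression fit

cov = sum((y - ybar) * (yh - ybar) for y, yh in zip(Y, Yhat))
var_yh = sum((yh - ybar) ** 2 for yh in Yhat)
r2_from_corr = cov ** 2 / (tss * var_yh)   # squared corr(Y, Yhat)
print(r2, r2_from_corr)                    # the two values coincide
```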
- 62. • βˆ1 = 24.4545 var(βˆ1) = 41.1370 and se(βˆ1) = 6.4138 • βˆ2 = 0.5091 var(βˆ2) = 0.0013 and se(βˆ2) = 0.0357 • cov(βˆ1, βˆ2) = −0.2172 σˆ2 = 42.1591 (3.6.1) • r2 = 0.9621 r = 0.9809 df = 8 • The estimated regression line therefore is • Yˆi = 24.4545 + 0.5091Xi (3.6.2) • which is shown geometrically as Figure 3.12. • Following Chapter 2, the SRF [Eq. (3.6.2)] and the associated regression line are interpreted as follows: Each point on the regression line gives an estimate of the expected or mean value of Y corresponding to the chosen X value; that is, Yˆi is an estimate of E(Y | Xi). The value of βˆ2 = 0.5091, which measures the slope of the line, shows that, within the sample range of X between $80 and $260 per week, as X increases, say, by $1, the estimated increase in the mean or average weekly consumption expenditure amounts to about 51 cents. The value of βˆ1 = 24.4545, which is the intercept of the line, indicates the average level of weekly consumption expenditure when weekly income is zero.
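The estimates in (3.6.1) and (3.6.2) can be reproduced from first principles. The sketch below assumes the ten weekly consumption (Y) and income (X) observations of the chapter's running example; if the underlying data differ, the printed values will differ too:

```python
# Sketch reproducing (3.6.1)-(3.6.2), assuming the ten consumption/income
# observations of the chapter's running example (an assumption here).
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]   # weekly consumption
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]  # weekly income

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
sum_x2 = sum((x - xbar) ** 2 for x in X)
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum_x2
b1 = ybar - b2 * xbar

rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))
sigma2 = rss / (n - 2)                      # estimate of the error variance
var_b2 = sigma2 / sum_x2
var_b1 = sigma2 * sum(x * x for x in X) / (n * sum_x2)
r2 = 1 - rss / sum((y - ybar) ** 2 for y in Y)

print(round(b1, 4), round(b2, 4))        # 24.4545 0.5091
print(round(sigma2, 4), round(r2, 4))    # 42.1591 0.9621
print(round(var_b1, 3), round(var_b2, 4))
```

With these data the output matches the slide: the intercept, slope, σˆ2, r2, and the coefficient variances all agree with (3.6.1).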
- 65. • However, this is a mechanical interpretation of the intercept term. In regression analysis such literal interpretation of the intercept term may not always be meaningful, although in the present example it can be argued that a family without any income (because of unemployment, layoff, etc.) might maintain some minimum level of consumption expenditure either by borrowing or dissaving. But in general one has to use common sense in interpreting the intercept term, for very often the sample range of X values may not include zero as one of the observed values. Perhaps it is best to interpret the intercept term as the mean or average effect on Y of all the variables omitted from the regression model. • The value of r2 of 0.9621 means that about 96 percent of the variation in the weekly consumption expenditure is explained by income. Since r2 can at most be 1, the observed r2 suggests that the sample regression line fits the data very well. The coefficient of correlation of 0.9809 shows that the two variables, consumption expenditure and income, are highly positively correlated. The estimated standard errors of the regression coefficients will be interpreted in Chapter 5.
- 66. • See numerical examples 3.1–3.3.