Chapter 1

The Two Variable Linear Model

1.1     The Basic Linear Model
The goal of this section is to build a simple model for the non-exact relationship between
two variables Y and X, related by some economic theory. For example, consumption and
income, quantity consumed and price, etc.
    The proposed model:

$$Y_i = \alpha + \beta X_i + u_i, \qquad i = 1, \dots, n \tag{1.1}$$

where α and β are the unknown parameters to be estimated. What we will call 'data' are
the n realizations of $(X_i, Y_i)$. We are abusing notation slightly by using the same
letters to refer to random variables and their realizations.
    $u_i$ is an unobserved random variable which represents the fact that the relationship
between Y and X is not exactly linear. We will momentarily assume that $u_i$ has expected
value zero. Note that if $u_i = 0$, the relationship between $Y_i$ and $X_i$ would be exactly
linear, so it is the presence of $u_i$ that breaks this exact nature of the relationship. Y is
usually referred to as the explained or dependent variable; X is the explanatory or
independent variable.
    We will refer to $u_i$ as the 'error term', a terminology more appropriate to the
experimental sciences, where a cause x (say, the dose of a drug) is administered to
different subjects and an effect y is then measured (say, body temperature). In this case
$u_i$ might be a measurement error due to the erratic behavior of a measurement instrument
(for example, a thermometer). In a social science like economics, $u_i$ stands for a broader
notion of 'ignorance': everything that affects y besides x but is not observed (through
ignorance, omission, etc.).

                           [ FIGURE 1: SCATTER DIAGRAM ]

   The first goal will be to find reasonable estimates for α and β based solely on the data,
that is (Xi , Yi ), i = 1, . . . , n.


1.2     The Least Squares Method
Let us denote by $\hat{\alpha}$ and $\hat{\beta}$ the estimates of α and β in the simple
linear model, and let us also define the following quantities. The first one is an estimate
of $Y_i$:

$$\hat{Y}_i \equiv \hat{\alpha} + \hat{\beta} X_i$$

Intuitively, we have replaced α and β by their estimates, and treated $u_i$ as if the
relationship were exactly linear, i.e., as if $u_i$ were zero. $\hat{Y}_i$ will be understood
as an estimate of $Y_i$. It is then natural to define a notion of estimation error as follows:

$$e_i \equiv Y_i - \hat{Y}_i$$

which measures the difference between Yi and its estimate.
    A natural goal is to find $\hat{\alpha}$ and $\hat{\beta}$ such that the $e_i$'s are
'small' in some sense. It is interesting to see how the problem works from a graphical
perspective. The data correspond to n points scattered in the (X, Y) plane. The presence of
a linear relationship like (1.1) is consistent with points scattered around an imaginary
straight line. Note that if the $u_i$ were indeed zero, all points would lie along the same
line, consistent with an exact linear relationship. As mentioned above, it is the presence
of $u_i$ that breaks this exact relationship.
    Now note that for any given values of $\hat{\alpha}$ and $\hat{\beta}$, the points
determined by the fitted model
$$\hat{Y} \equiv \hat{\alpha} + \hat{\beta} X$$
correspond to a line in the (X, Y) plane. Hence different values of $\hat{\alpha}$ and
$\hat{\beta}$ correspond to different estimated lines, which implies that choosing
particular values is equivalent to choosing a specific line on the plane. For the i-th
observation, the estimation error $e_i$ can be seen graphically as the vertical distance
between the points $(X_i, Y_i)$ and $(X_i, \hat{Y}_i)$, that is, between $(X_i, Y_i)$ and
the fitted line. So, intuitively, we want values of $\hat{\alpha}$ and $\hat{\beta}$ such
that the fitted line they induce passes as close as possible to all the points in the
scatter, making the errors as small as possible.

            [ FIGURE 2: SCATTER DIAGRAM WITH ‘CANDIDATE’ LINE]

    Note that if we had only two observations, the problem has a very simple solution: it
reduces to finding the only two values of $\hat{\alpha}$ and $\hat{\beta}$ that make the
estimation errors exactly equal to zero. Graphically, this amounts to finding the only
straight line that passes through the two available observations. Trivially, in this extreme
case all estimation errors will be zero.
    The more realistic case arises when we have more than two observations, not all of them
lying on a single line. Obviously, a line cannot pass through more than two non-aligned
points, so we cannot make all the errors equal to zero. The problem now is to find values of
$\hat{\alpha}$ and $\hat{\beta}$ that determine a line passing as close as possible to all
the points, so that estimation errors are, in the aggregate, small. For this we need to
introduce a criterion
of what we mean by the line being close to or far from the points. Let us define a penalty
function that adds up all the squared estimation errors, so that positive and negative
errors matter alike. For any $\hat{\alpha}$ and $\hat{\beta}$, this gives us an idea of how
large the aggregate estimation error is:
$$SSR(\hat{\alpha}, \hat{\beta}) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat{\alpha} - \hat{\beta} X_i\right)^2$$

    SSR stands for sum of squared residuals. Note that given the observations $Y_i$ and
$X_i$, this is a function that depends on $\hat{\alpha}$ and $\hat{\beta}$; that is,
different values of $\hat{\alpha}$ and $\hat{\beta}$ correspond to different lines passing
through the data points, implying different estimation errors. It is now natural to look
for $\hat{\alpha}$ and $\hat{\beta}$ so as to make this aggregate error as small as possible.
    The values of $\hat{\alpha}$ and $\hat{\beta}$ that minimize the sum of squared
residuals are:

$$\hat{\beta} = \frac{\sum X_i Y_i - n\bar{Y}\bar{X}}{\sum X_i^2 - n\bar{X}^2}$$


and
$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$
which are known as the least squares estimators of β and α.
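As a minimal numerical sketch of these formulas (assuming Python with NumPy; the data are
made up for illustration):

```python
import numpy as np

# Hypothetical data: n = 5 observations of (X, Y)
X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([12.0, 18.0, 29.0, 35.0, 45.0])
n = len(X)

# Least squares estimates from the closed-form formulas above
beta_hat = (np.sum(X * Y) - n * Y.mean() * X.mean()) / (np.sum(X**2) - n * X.mean()**2)
alpha_hat = Y.mean() - beta_hat * X.mean()

e = Y - (alpha_hat + beta_hat * X)   # residuals of the fitted line
print(alpha_hat, beta_hat, np.sum(e**2))
```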




      Derivation of the Least Squares Estimators
      The next paragraphs show how to obtain these estimators. Fortunately, it is easy to
      show that $SSR(\hat{\alpha}, \hat{\beta})$ is globally convex and differentiable, so
      the first order conditions for a minimum are:


$$\frac{\partial SSR(\hat{\alpha}, \hat{\beta})}{\partial \hat{\alpha}} = 0, \qquad
\frac{\partial SSR(\hat{\alpha}, \hat{\beta})}{\partial \hat{\beta}} = 0$$

      The first order condition is:
$$\frac{\partial \sum e_i^2}{\partial \hat{\alpha}} = -2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \tag{1.2}$$
      Dividing by minus 2 and distributing the summations:
$$\sum Y_i = n\hat{\alpha} + \hat{\beta} \sum X_i \tag{1.3}$$
      This last expression is very important, and we will return to it frequently. From the
      second first order condition:

$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}} = -2 \sum X_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \tag{1.4}$$
      Dividing by minus 2 and distributing the summations:
$$\sum X_i Y_i = \hat{\alpha} \sum X_i + \hat{\beta} \sum X_i^2 \tag{1.5}$$

      (1.3) and (1.5) form a system of two linear equations in two unknowns ($\hat{\alpha}$
      and $\hat{\beta}$) known as the normal equations.
      Dividing (1.3) by n and solving for $\hat{\alpha}$ we get:
$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} \tag{1.6}$$

      Substituting into (1.5):
$$\begin{aligned}
\sum X_i Y_i &= (\bar{Y} - \hat{\beta}\bar{X}) \sum X_i + \hat{\beta} \sum X_i^2 \\
\sum X_i Y_i &= \bar{Y} \sum X_i - \hat{\beta}\bar{X} \sum X_i + \hat{\beta} \sum X_i^2 \\
\sum X_i Y_i - \bar{Y} \sum X_i &= \hat{\beta} \left( \sum X_i^2 - \bar{X} \sum X_i \right)
\end{aligned}$$
$$\hat{\beta} = \frac{\sum X_i Y_i - \bar{Y} \sum X_i}{\sum X_i^2 - \bar{X} \sum X_i}$$

      Note that $\bar{X} = \sum X_i / n$ implies $\sum Z_i = n\bar{Z}$ for any variable Z.
      Substituting, we get:
$$\hat{\beta} = \frac{\sum X_i Y_i - n\bar{Y}\bar{X}}{\sum X_i^2 - n\bar{X}^2} \tag{1.7}$$




   It will be useful to adopt the following notation: let $x_i = X_i - \bar{X}$ and
$y_i = Y_i - \bar{Y}$, so lowercase letters denote the observations as deviations from
their sample means.
   Using this notation:

$$\begin{aligned}
\sum x_i y_i &= \sum (X_i - \bar{X})(Y_i - \bar{Y}) \\
&= \sum \left(X_i Y_i - X_i \bar{Y} - \bar{X} Y_i + \bar{X}\bar{Y}\right) \\
&= \sum X_i Y_i - \bar{Y} \sum X_i - \bar{X} \sum Y_i + n\bar{X}\bar{Y} \\
&= \sum X_i Y_i - n\bar{Y}\bar{X} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} \\
&= \sum X_i Y_i - n\bar{Y}\bar{X}
\end{aligned}$$
corresponds to the numerator of (1.7). Performing a similar operation on the denominator of
(1.7), we get the following alternative expression for the least squares estimator of β:
$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2}$$
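As a quick numerical check (a sketch in Python with NumPy, using made-up data), the levels
form (1.7) and the deviations form give the same estimate:

```python
import numpy as np

X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # hypothetical data
Y = np.array([12.0, 18.0, 29.0, 35.0, 45.0])
n = len(X)

# Formula (1.7), in levels
beta_levels = (np.sum(X * Y) - n * Y.mean() * X.mean()) / (np.sum(X**2) - n * X.mean()**2)

# Deviations-from-means form
x, y = X - X.mean(), Y - Y.mean()
beta_dev = np.sum(x * y) / np.sum(x**2)

print(np.isclose(beta_levels, beta_dev))  # True: the two forms coincide
```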

                 [ FIGURE 3: SCATTER DIAGRAM AND OLS LINE ]


1.3     Algebraic Properties of Least Squares Estimators
By algebraic properties of the estimator we mean those that are a direct consequence of
the minimizacion process, stressing the difference with statistical properties, which will be
studied in the next section.

   • Property 1: $\sum e_i = 0$. Dividing the first order condition (1.2) by minus 2 and
     using the definition of $e_i$, we easily verify that, as a consequence of minimizing
     the sum of squared residuals, the sum of the residuals, and consequently their average,
     is equal to zero.
   • Property 2: $\sum X_i e_i = 0$. This can be checked by dividing the second first order
     condition (1.4) by minus 2. The sample covariance between X and e is given by:
$$\begin{aligned}
Cov(X, e) &= \frac{1}{n-1} \sum (X_i - \bar{X})(e_i - \bar{e}) \\
&= \frac{1}{n-1} \left( \sum X_i e_i - \bar{e} \sum X_i - \bar{X} \sum e_i + n\bar{X}\bar{e} \right) \\
&= \frac{1}{n-1} \sum X_i e_i
\end{aligned}$$
     since from the previous property $\sum e_i$, and hence $\bar{e}$, are equal to zero.
     Then this property says that, as a consequence of using the method of least squares,
     the sample covariance between the explanatory variable X and the residual e is zero
     or, which is the same, the residuals are linearly unrelated to the explanatory variable.
   • Property 3: The estimated regression line corresponds to the function
     $\hat{Y}(X) = \hat{\alpha} + \hat{\beta} X$, where we take $\hat{\alpha}$ and
     $\hat{\beta}$ as parameters, so that $\hat{Y}$ is a function that depends on X.
     Consider what happens when we evaluate this function at $\bar{X}$, the mean of X:
$$\hat{Y}(\bar{X}) = \hat{\alpha} + \hat{\beta}\bar{X}$$
     But from (1.6):
$$\hat{\alpha} + \hat{\beta}\bar{X} = \bar{Y}$$
     Then $\hat{Y}(\bar{X}) = \bar{Y}$; that is, the regression line estimated by the method
     of least squares passes through the point of means.
   • Property 4: Relationship between regression and correlation. Recall that the sample
     correlation coefficient between X and Y for a sample of n observations $(X_i, Y_i)$,
     $i = 1, 2, \dots, n$, is defined as:
$$r_{XY} = \frac{Cov(X, Y)}{S_X S_Y}$$
     The following result establishes the relationship between $r_{XY}$ and $\hat{\beta}$.

$$\begin{aligned}
\hat{\beta} &= \frac{\sum x_i y_i}{\sum x_i^2} \\
&= \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\sqrt{\sum x_i^2}} \cdot \frac{\sqrt{\sum y_i^2}}{\sqrt{\sum y_i^2}} \\
&= \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\sqrt{\sum y_i^2}} \cdot \frac{\sqrt{\sum y_i^2}/\sqrt{n}}{\sqrt{\sum x_i^2}/\sqrt{n}}
\end{aligned}$$
$$\hat{\beta} = r\,\frac{S_Y}{S_X}$$

     If $r = 0$ then $\hat{\beta} = 0$. Note that if both variables have the same sample
     variance, then the correlation coefficient is equal to the regression coefficient. We
     can also see that, unlike the correlation coefficient, $\hat{\beta}$ is not invariant
     to changes in scale or units of measurement.
   • Property 5: The sample means of $Y_i$ and $\hat{Y}_i$ are the same. By definition,
     $Y_i = \hat{Y}_i + e_i$ for $i = 1, \dots, n$. Then, summing over all i:
$$\sum Y_i = \sum \hat{Y}_i + \sum e_i$$
     and dividing by n:
$$\frac{\sum Y_i}{n} = \frac{\sum \hat{Y}_i}{n}$$
     since $\sum e_i = 0$ from the first order conditions. Then:
$$\bar{Y} = \bar{\hat{Y}}$$
     which is the desired result.
CHAPTER 1. THE TWO VARIABLE LINEAR MODEL                                                       7


   • Property 6: $\hat{\beta}$ is a linear function of the $Y_i$'s. That is, $\hat{\beta}$
     can be written as $\hat{\beta} = \sum w_i Y_i$, where the $w_i$'s are real numbers,
     not all of them equal to zero.
           This is easy to prove. Let us start by writing $\hat{\beta}$ as follows:
$$\hat{\beta} = \sum \frac{x_i}{\sum x_i^2}\, y_i$$
           and call $w_i = x_i / \sum x_i^2$. Note that:
$$\sum x_i = \sum (X_i - \bar{X}) = \sum X_i - n\bar{X} = 0$$
           which implies $\sum w_i = 0$. From the previous result:
$$\begin{aligned}
\hat{\beta} &= \sum w_i y_i \\
&= \sum w_i (Y_i - \bar{Y}) \\
&= \sum w_i Y_i - \bar{Y} \sum w_i \\
&= \sum w_i Y_i
\end{aligned}$$
           which gives the desired result.

       This does not have much intuitive meaning so far, but it will be useful for later
       results. The following sketch checks the properties above numerically.
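A minimal numerical check of Properties 1 through 6 (a sketch in Python with NumPy; the
data are made up):

```python
import numpy as np

X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([12.0, 18.0, 29.0, 35.0, 45.0])

x, y = X - X.mean(), Y - Y.mean()
beta = np.sum(x * y) / np.sum(x**2)
alpha = Y.mean() - beta * X.mean()
Y_hat = alpha + beta * X
e = Y - Y_hat

print(np.isclose(np.sum(e), 0))                       # Property 1: residuals sum to zero
print(np.isclose(np.sum(X * e), 0))                   # Property 2: X and e uncorrelated
print(np.isclose(alpha + beta * X.mean(), Y.mean()))  # Property 3: line through the means
r = np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))
print(np.isclose(beta, r * y.std() / x.std()))        # Property 4: beta = r * S_Y / S_X
print(np.isclose(Y_hat.mean(), Y.mean()))             # Property 5: equal sample means
w = x / np.sum(x**2)
print(np.isclose(beta, np.sum(w * Y)))                # Property 6: beta is linear in Y
```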


1.4     The Two-Variable Linear Model under the Classical Assumptions

$$Y_i = \alpha + \beta X_i + u_i, \qquad i = 1, \dots, n$$

   In addition to the linear relationship between Y and X, we will assume:

 1. E(ui ) = 0,    i = 1, 2, . . . , n. ‘On average’ the relationship between Y and X is linear.

 2. $Var(u_i) = E[(u_i - E(u_i))^2] = E(u_i^2) = \sigma^2$, $i = 1, 2, \dots, n$. The
    variance of the error term is constant for all observations. We will say that the error
    term is homoskedastic.

 3. $Cov(u_i, u_j) = 0$ for all $i \neq j$. The error term for an observation i is not
    linearly related to the error term of any other observation j. If the variables are
    measured over time, i.e., $i = 1980, 1981, \dots, 1997$, we will say that there is no
    autocorrelation. In general, we will say that there is no serial correlation. Note that
    since $E(u_i) = 0$, assuming $Cov(u_i, u_j) = 0$ is equivalent to assuming
    $E(u_i u_j) = 0$.

 4. The values of Xi are non-stochastic and not all of them equal.
    The classical assumptions provide a basic probabilistic structure for studying the
linear model. Most of them are of a pedagogic nature, and we will study later on how they
can be relaxed. Nevertheless, they provide a simple framework in which to explore the
nature of the least squares estimators.
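As a minimal sketch of a data generating process satisfying these assumptions (Python with
NumPy; all parameter values are made up, and the normal draws are merely one convenient
choice here, since normality itself is not assumed until Section 1.7):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
alpha, beta, sigma = 1.0, 0.75, 2.0   # hypothetical true parameters

X = np.linspace(1, 10, n)         # non-stochastic, not all equal (assumption 4)
u = rng.normal(0.0, sigma, n)     # mean zero (1), homoskedastic (2), independent draws (3)
Y = alpha + beta * X + u          # the linear model
```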


1.5     Statistical Properties of Least Squares Estimators
The problem is to find good estimates of α, β and σ². The previous section presented
estimates of the first two based on the principle of least squares, so, trivially, these
estimates are 'good' in the sense that they minimize a certain notion of fit: they make the
sum of squared residuals as small as possible. It is relevant to remark that in obtaining
the least squares estimators we have made no use of the classical assumptions described
above. Hence, the natural step is to explore whether we can deduce additional properties
satisfied by the least squares estimators, so that we can say they are good in a sense that
goes beyond that implicit in the least squares criterion. The following are called
statistical properties, since they arise as a consequence of the statistical structure of
the model.
    We will use repeatedly the following expressions for the LS estimators:

$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$
    We will first explore the main properties of $\hat{\beta}$ in detail, and leave the
analysis of $\hat{\alpha}$ as an exercise. The starting conceptual point is to see that
$\hat{\beta}$ depends explicitly on the $Y_i$'s which, in turn, depend on the $u_i$'s,
which are, by construction, random variables. Then $\hat{\beta}$ is a random variable, so
it makes sense to talk about its moments (mean and variance, for example) and its
distribution.
    It is easy to verify that:

$$y_i = x_i \beta + u_i^*$$
where $u_i^* = u_i - \bar{u}$ and, according to the classical assumptions, $E(u_i^*) = 0$
and, consequently, $E(y_i) = x_i \beta$. This is known as the classical two-variable linear
model in deviations from the means.

   • $\hat{\beta}$ is an unbiased estimator, that is: $E(\hat{\beta}) = \beta$.

           To prove the result, from the linearity property of the previous section:
$$\begin{aligned}
\hat{\beta} &= \sum w_i y_i \\
E(\hat{\beta}) &= \sum w_i E(y_i) \qquad (w_i\text{'s are non-stochastic}) \\
&= \sum w_i x_i \beta \\
&= \beta \sum w_i x_i \\
&= \beta \sum x_i^2 / \left(\sum x_i^2\right) \\
&= \beta
\end{aligned}$$

   • The variance of $\hat{\beta}$ is $\sigma^2 / \sum x_i^2$.

          From the linearity property, $\hat{\beta} = \sum w_i Y_i$, then
$$V(\hat{\beta}) = V\left( \sum w_i Y_i \right)$$
          Now note two things. First:
$$V(Y_i) = V(\alpha + \beta X_i + u_i) = V(u_i) = \sigma^2$$
          since $X_i$ is non-stochastic. Second, note that $E(Y_i) = \alpha + \beta X_i$, so
$$Cov(Y_i, Y_j) = E\left[(Y_i - E(Y_i))(Y_j - E(Y_j))\right] = E(u_i u_j) = 0$$
          by the no serial correlation assumption. Then $V(\sum w_i Y_i)$ is the variance of
          a (weighted) sum of uncorrelated terms. Hence
$$V(\hat{\beta}) = \sum w_i^2 V(Y_i) = \sigma^2 \sum w_i^2 = \sigma^2 \sum x_i^2 / \left(\sum x_i^2\right)^2 = \sigma^2 / \sum x_i^2$$

   • Gauss-Markov Theorem: under the classical assumptions, $\hat{\beta}$, the LS estimator
     of β, has the smallest variance among the class of linear and unbiased estimators.
     More formally, if $\beta^*$ is any linear and unbiased estimator of β, then:
$$V(\beta^*) \geq V(\hat{\beta})$$
     The proof of a more general version of this result will be postponed until Chapter 3.
     Discussion: OLS is BLUE (best linear unbiased estimator), but 'best' does not mean
     'good': ideally we would want the minimum variance unbiased estimator, without the
     restriction to 'linear', which is not an interesting class in itself. If we drop any
     of the assumptions, the OLS estimator is no longer guaranteed to be BLUE. This
     justifies the use of OLS when all the assumptions are correct.
Estimation of σ 2
So far we have concentrated the analysis on α and β. As an estimate of σ² we will propose:
$$S^2 = \frac{\sum e_i^2}{n-2}$$
We will later show that $S^2$ provides an unbiased estimator of σ².
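A Monte Carlo sketch of these statistical properties (Python with NumPy; the true
parameters and the design are made up). Across many simulated samples, the average of
$\hat{\beta}$ should approach β, its sample variance should approach $\sigma^2/\sum x_i^2$,
and the average of $S^2$ should approach σ²:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 20_000
alpha, beta, sigma = 1.0, 0.75, 2.0   # hypothetical true parameters
X = np.linspace(1, 10, n)             # fixed (non-stochastic) regressor
x = X - X.mean()

beta_hats, s2s = [], []
for _ in range(reps):
    Y = alpha + beta * X + rng.normal(0, sigma, n)
    b = np.sum(x * (Y - Y.mean())) / np.sum(x**2)
    a = Y.mean() - b * X.mean()
    e = Y - a - b * X
    beta_hats.append(b)
    s2s.append(np.sum(e**2) / (n - 2))

print(np.mean(beta_hats), beta)                    # unbiasedness: mean close to 0.75
print(np.var(beta_hats), sigma**2 / np.sum(x**2))  # variance close to sigma^2 / sum x_i^2
print(np.mean(s2s), sigma**2)                      # E(S^2) = sigma^2: mean close to 4.0
```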


1.6    Goodness of fit
After estimating the parameters of the regression line, it is interesting to check how well
the estimated model fits the data. We want a measure of how well the fitted line represents
the observations of the variables of the model.
    To construct such a measure of goodness of fit, we start from the definition of the
residual, $e_i = Y_i - \hat{Y}_i$, solve for $Y_i$ and subtract the sample mean of $Y_i$
from both sides to obtain:
$$Y_i - \bar{Y} = \hat{Y}_i - \bar{Y} + e_i$$
$$y_i = \hat{y}_i + e_i$$

using the notation defined before and noting that, from Property 5, $\bar{\hat{Y}} = \bar{Y}$.
Taking the square of both sides and summing over all the observations:

$$\begin{aligned}
\sum y_i^2 &= \sum (\hat{y}_i + e_i)^2 \\
&= \sum \hat{y}_i^2 + \sum e_i^2 + 2 \sum \hat{y}_i e_i
\end{aligned}$$

    The next step is to show that $\sum \hat{y}_i e_i = 0$:
$$\sum \hat{y}_i e_i = \sum (\hat{\alpha} + \hat{\beta} X_i) e_i = \hat{\alpha} \sum e_i + \hat{\beta} \sum X_i e_i = 0 + 0$$

from the first order conditions. Then we get the following important decomposition:
$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$$
$$TSS = ESS + RSS$$

This is a key result: it indicates that when we use the least squares method, the total
variability of the dependent variable around its sample mean (TSS, the total sum of squares)
can be decomposed as the sum of two factors. The first corresponds to the variability of
$\hat{Y}$ (ESS, the explained sum of squares) and represents the variability explained by
the fitted model. The second term represents the variability not explained by the model
(RSS, the residual sum of squares), associated with the error term.
    For a given model, the best situation arises when all errors are zero, in which case
the total variability (TSS) coincides with the explained variability (ESS). The worst case
corresponds to the situation in which the fitted model explains nothing of the total
variability, in which case TSS coincides with RSS. From this observation, it is natural to
suggest the following goodness of fit measure, known as $R^2$, or coefficient of
determination:
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$
    It can be shown (we will do it in the exercises) that $R^2 = r^2$. Consequently,
$0 \leq R^2 \leq 1$. When $R^2 = 1$, $|r| = 1$, which corresponds to the case in which the
relationship between Y and X is exactly linear. On the other hand, $R^2 = 0$ is equivalent
to $r = 0$, which corresponds to the case in which Y and X are linearly unrelated. It is
interesting to note that TSS does not depend on the estimated model, that is, it does not
depend on $\hat{\beta}$ or $\hat{\alpha}$. Then, if $\hat{\beta}$ and $\hat{\alpha}$ are
chosen so as to minimize SSR, they automatically maximize $R^2$. This implies that, for a
given model, the least squares estimates maximize $R^2$.
    The R2 is, arguably, the most used and abused measure of quality of a regression model.
A detailed analysis of the extent to which a high R2 can be taken as representative of a
‘good’ model will be undertaken in Chapter 4.
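A short sketch of the decomposition and of $R^2$ (Python with NumPy; same made-up data as
in the earlier sketches):

```python
import numpy as np

X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([12.0, 18.0, 29.0, 35.0, 45.0])
x, y = X - X.mean(), Y - Y.mean()

beta = np.sum(x * y) / np.sum(x**2)
alpha = Y.mean() - beta * X.mean()
e = Y - alpha - beta * X

TSS = np.sum(y**2)          # total sum of squares
RSS = np.sum(e**2)          # residual sum of squares
ESS = TSS - RSS             # explained sum of squares
R2 = ESS / TSS

r = np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))
print(R2, r**2)             # R^2 equals the squared correlation coefficient
```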


1.7    Inference in the two-variable linear model
The methods discussed so far provide reasonably good point estimates of the parameters of
interest α, β and σ², but usually we will also be interested in evaluating hypotheses
involving the parameters, or in constructing confidence intervals for them. For example,
consider the case of a simple consumption function where consumption is specified as a
linear function of income. We could be interested in evaluating whether the marginal
propensity to consume is equal to, say, 0.75, or whether autonomous consumption is equal
to zero.
     In general terms, a hypothesis about a parameter of the model is a conjecture about it
that can be either true or false. The central problem is that, in order to check whether
such a statement is true or false, we do not have the chance to observe the parameter
itself; instead, based on the available data, we have an estimate of it. As an example,
suppose we are interested in evaluating the, rather strong, null hypothesis that income is
not an explanatory factor of consumption, against the hypothesis that it is a relevant
factor. In our simple setup this corresponds to $H_0: \beta = 0$ against
$H_A: \beta \neq 0$. The logic we will use is the following: if the null hypothesis were in
fact true, β would be exactly zero. Realizations of $\hat{\beta}$ can potentially take any
value, since $\hat{\beta}$ is, by construction, a random variable. But if $\hat{\beta}$ is
a 'good' estimator of β, when the null hypothesis is true it should take values close to
zero. On the other hand, if the null hypothesis were false, the realizations of
$\hat{\beta}$ should be significantly different from zero. Then, the procedure consists in
computing $\hat{\beta}$ from the data, rejecting the null if the obtained value is
significantly different from zero, and accepting it otherwise.
    Of course, the central concept behind this procedure lies in specifying what we mean by
'very close' or 'very far', given that $\hat{\beta}$ is a random variable. More
specifically, we need to know the distribution of $\hat{\beta}$ under the null hypothesis
so we can define precisely the notion of 'significantly different from zero'. In this
context such a statement is necessarily probabilistic; that is, we will take as the
rejection region a set of values that lie 'far away' from zero or, equivalently, a set of
values that appear with very low probability under the null hypothesis.
    The properties discussed in the previous section are informative about certain moments
of $\hat{\beta}$ or $\hat{\alpha}$ (for example, their means and variances), but they are
not enough for the purpose of knowing their distributions. Consequently, we need to
introduce an additional assumption: we will assume that $u_i$ is normally distributed, for
$i = 1, \dots, n$. Given that we have already assumed that $u_i$ has zero mean and constant
variance equal to σ², we have:

$$u_i \sim N(0, \sigma^2)$$

    Given that $Y_i = \alpha + \beta X_i + u_i$ and that the $X_i$'s are non-stochastic, we
immediately see that the $Y_i$'s are also normally distributed, since linear transformations
of normal random variables are also normal. In particular, given that the normal
distribution can be characterized by its mean and variance only, we get:
$$Y_i \sim N(\alpha + \beta X_i, \sigma^2)$$
for every $i = 1, \dots, n$. In a similar fashion, $\hat{\beta}$ is also normally
distributed since, by Property 6, it is a linear combination of the $Y_i$'s, that is:

$$\hat{\beta} \sim N\!\left(\beta,\; \sigma^2 / \textstyle\sum x_i^2\right)$$

   If σ² were known, we could use this result to test simple hypotheses like:
$$H_0: \beta = \beta_0 \quad \text{vs.} \quad H_A: \beta \neq \beta_0$$
Subtracting from $\hat{\beta}$ its expected value and dividing by its standard deviation we
get:
$$z = \frac{\hat{\beta} - \beta_0}{\sigma / \sqrt{\sum x_i^2}} \sim N(0, 1)$$

Hence, if the null hypothesis is true, z should take values that are small in absolute
value, and large otherwise. As you should remember from a basic statistics course, this is
accomplished by defining a rejection region and an acceptance region as follows. The
acceptance region includes values that lie close to the one corresponding to the null
hypothesis. Let $c < 1$ and let $z_c$ be a number such that:
$$Pr(-z_c \leq z \leq z_c) = 1 - c$$

Replacing z by its definition:


$$Pr\left( \beta_0 - z_c\,\sigma/\sqrt{\textstyle\sum x_i^2} \;\leq\; \hat{\beta} \;\leq\; \beta_0 + z_c\,\sigma/\sqrt{\textstyle\sum x_i^2} \right) = 1 - c$$

   Then the acceptance region is given by the interval:

$$\beta_0 \pm z_c\,\sigma/\sqrt{\textstyle\sum x_i^2}$$

so we accept the null hypothesis if the observed realization of $\hat{\beta}$ lies within
this interval, and reject it otherwise. The number c is specified in advance and is usually
a small number; it is called the significance level of the test. Note that it gives the
probability of rejecting the null hypothesis when it is correct. Under the normality
assumption, the value $z_c$ can easily be obtained from a table of percentiles of the
standard normal distribution.
    As you should also remember from a basic statistics class, a similar logic can be
applied to construct a confidence interval for $\beta_0$. Note that:
$$Pr\left( \hat{\beta} - z_c\,\sigma/\sqrt{\textstyle\sum x_i^2} \;\leq\; \beta_0 \;\leq\; \hat{\beta} + z_c\,\sigma/\sqrt{\textstyle\sum x_i^2} \right) = 1 - c$$


Then a $1 - c$ confidence interval for $\beta_0$ will be given by:
$$\hat{\beta} \pm z_c\,\sigma/\sqrt{\textstyle\sum x_i^2}$$
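A sketch of this z-test and confidence interval when σ² is known (Python with NumPy and
SciPy; the data, the value of σ, and the null value $\beta_0$ are all made up):

```python
import numpy as np
from scipy.stats import norm

X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([12.0, 18.0, 29.0, 35.0, 45.0])
x = X - X.mean()
beta_hat = np.sum(x * (Y - Y.mean())) / np.sum(x**2)

sigma = 2.0              # assumed known in this scenario
beta_0 = 0.75            # hypothetical null value
se = sigma / np.sqrt(np.sum(x**2))

z = (beta_hat - beta_0) / se
c = 0.05
z_c = norm.ppf(1 - c / 2)                        # two-sided critical value
print(z, abs(z) > z_c)                           # reject H0 if |z| exceeds z_c
print(beta_hat - z_c * se, beta_hat + z_c * se)  # 1 - c confidence interval
```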


      The practical problem with the previous procedures is that they require that we know
σ², which is usually not available. Instead, we can compute its estimated version $S^2$.
Define t as:
$$t = \frac{\hat{\beta} - \beta}{S/\sqrt{\sum x_i^2}}$$

t is simply z where we have replaced σ² by $S^2$. A very important result is that by doing
this replacement we have:
$$t \sim t_{n-2}$$
that is, the 't-statistic' has the so-called t-distribution with $n - 2$ degrees of freedom.
Hence, when we use the estimated version of the variance, we obtain a different distribution
for the statistic used to test simple hypotheses and construct confidence intervals.
    Consequently, applying once again the same logic, in order to test the null hypothesis
$H_0: \beta = \beta_0$ against $H_A: \beta \neq \beta_0$ we use the t-statistic:
$$t = \frac{\hat{\beta} - \beta_0}{S/\sqrt{\sum x_i^2}} \sim t_{n-2}$$
and a $1 - c$ confidence interval for $\beta_0$ will be given by:
$$\hat{\beta} \pm t_c\,S/\sqrt{\textstyle\sum x_i^2}$$
where now $t_c$ is a percentile of the t distribution with $n - 2$ degrees of freedom,
which is usually tabulated in basic statistics and econometrics textbooks.
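A sketch of the feasible version with $S^2$ in place of σ² (Python with SciPy; same made-up
data, hypothetical null value $\beta_0 = 0.75$):

```python
import numpy as np
from scipy.stats import t as tdist

X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([12.0, 18.0, 29.0, 35.0, 45.0])
n = len(X)
x = X - X.mean()

beta_hat = np.sum(x * (Y - Y.mean())) / np.sum(x**2)
alpha_hat = Y.mean() - beta_hat * X.mean()
e = Y - alpha_hat - beta_hat * X

S2 = np.sum(e**2) / (n - 2)          # estimate of sigma^2
se = np.sqrt(S2 / np.sum(x**2))

beta_0 = 0.75                        # hypothetical null value
t_stat = (beta_hat - beta_0) / se
c = 0.05
t_c = tdist.ppf(1 - c / 2, df=n - 2)
print(t_stat, abs(t_stat) > t_c)                   # reject H0 if |t| exceeds t_c
print(beta_hat - t_c * se, beta_hat + t_c * se)    # 1 - c confidence interval
```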
    An important particular case is the insignificance hypothesis, that is,
$H_0: \beta = 0$ against $H_A: \beta \neq 0$. Under the null, X does not help explain Y;
under the alternative, X is linearly related to Y. Replacing $\beta_0$ by 0 above we get:
$$t_I = \frac{\hat{\beta}}{S/\sqrt{\sum x_i^2}} \sim t_{n-2}$$

which is usually reported as a standard outcome in most regression packages.
    Another alternative for checking the significance of the linear relationship is to look
at how large the explained sum of squares ESS is. Recall that if the model has an intercept
we have:
$$TSS = ESS + RSS$$
If there is no linear relationship between Y and X, ESS should be very close to zero.
Consider the following statistic, which is just a 'standardized' version of the ESS:
$$F = \frac{ESS}{RSS/(n-2)}$$
It can be shown that, under the normality assumption, F has the F distribution with 1
degree of freedom in the numerator and $n - 2$ degrees of freedom in the denominator,
usually labeled $F(1, n-2)$. Note that if X does not help explain Y in a linear sense, ESS
should be very small, which would make F very small. Then, we should reject the null
hypothesis that X does not help explain Y if the F statistic computed from the data takes a
large value, and accept it otherwise.
    Note that, by definition, $R^2 = ESS/TSS = 1 - RSS/TSS$. Dividing both the numerator
and the denominator of the F statistic by TSS, and using these relations, we can write the
F statistic in terms of the $R^2$ coefficient as:
$$F = \frac{R^2}{(1 - R^2)/(n-2)}$$
Then, the F test is actually looking at whether the $R^2$ is significantly high. As is to
be expected, there is a close relationship between the F statistic and the t statistic for
the insignificance hypothesis ($t_I$): when there is no linear relationship between Y and
X, ESS is zero, which happens exactly when $\hat{\beta} = 0$. In fact, it can easily be
shown that:
$$F = t_I^2$$

We will leave the proof as an exercise.
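A sketch verifying the F statistic and the relation $F = t_I^2$ numerically (Python with
NumPy; same made-up data as above):

```python
import numpy as np

X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([12.0, 18.0, 29.0, 35.0, 45.0])
n = len(X)
x, y = X - X.mean(), Y - Y.mean()

beta_hat = np.sum(x * y) / np.sum(x**2)
e = y - beta_hat * x                 # residuals, computed in deviations form
TSS, RSS = np.sum(y**2), np.sum(e**2)
ESS = TSS - RSS

F = ESS / (RSS / (n - 2))
t_I = beta_hat / np.sqrt((RSS / (n - 2)) / np.sum(x**2))
print(F, t_I**2)                     # the two agree: F = t_I^2
```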

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Two variable linear model

Now note that, for any given values of α̂ and β̂, the points determined by the fitted model

    Ŷ ≡ α̂ + β̂X

correspond to a line in the (X, Y) plane. Hence different values of α̂ and β̂ correspond to different estimated lines, so choosing particular values is equivalent to choosing a specific line on the plane. For the i-th observation, the estimation error ei can be seen graphically as the vertical distance between the points (Xi, Yi) and (Xi, Ŷi), that is, between (Xi, Yi) and the fitted line. Intuitively, then, we want values of α̂ and β̂ such that the fitted line they induce passes as close as possible to all the points in the scatter, so that the errors are as small as possible.

[ FIGURE 2: SCATTER DIAGRAM WITH 'CANDIDATE' LINE ]

Note that if we had only two observations the problem has a very simple solution: it reduces to finding the only two values of α̂ and β̂ that make the estimation errors exactly equal to zero. Graphically, this is equivalent to finding the only straight line that passes through the two available observations. Trivially, in this extreme case all estimation errors are zero.

The more realistic case arises when we have more than two observations, not all of them lying on a single line. A line cannot pass through more than two non-aligned points, so we cannot make all errors equal to zero. The problem now is to find values of α̂ and β̂ that determine a line passing as close as possible to all the points, so that the estimation errors are, in the aggregate, small. For this we need to introduce a criterion
for what we mean by the line being close to or far from the points. Let us define a penalty function which adds up the squared estimation errors, so that positive and negative errors matter alike. For any α̂ and β̂, this gives us an idea of how large the aggregate estimation error is:

    SSR(α̂, β̂) = Σ ei² = Σ (Yi − α̂ − β̂Xi)²

SSR stands for sum of squared residuals. Note that, given the observations Yi and Xi, this is a function that depends on α̂ and β̂: different values of α̂ and β̂ correspond to different lines passing through the data points, implying different estimation errors. It is now natural to look for the α̂ and β̂ that make this aggregate error as small as possible. The values of α̂ and β̂ that minimize the sum of squared residuals are:

    β̂ = (Σ XiYi − nȲX̄) / (Σ Xi² − nX̄²)

and

    α̂ = Ȳ − β̂X̄

which are known as the least squares estimators of β and α.

Derivation of the Least Squares Estimators

The next paragraphs show how to obtain these estimators. Fortunately, it is easy to show that SSR(α̂, β̂) is globally convex and differentiable, so the first order conditions characterize a minimum:

    ∂SSR(α̂, β̂)/∂α̂ = 0        ∂SSR(α̂, β̂)/∂β̂ = 0

The first of these conditions is:

    ∂Σ ei²/∂α̂ = −2 Σ (Yi − α̂ − β̂Xi) = 0        (1.2)

Dividing by minus 2 and distributing the summations:

    Σ Yi = nα̂ + β̂ Σ Xi        (1.3)

This last expression is very important, and we will return to it frequently. From the second first order condition:
    ∂Σ ei²/∂β̂ = −2 Σ Xi(Yi − α̂ − β̂Xi) = 0        (1.4)

Dividing by −2 and distributing the summations:

    Σ XiYi = α̂ Σ Xi + β̂ Σ Xi²        (1.5)

(1.3) and (1.5) form a system of two linear equations in two unknowns (α̂ and β̂) known as the normal equations. Dividing (1.3) by n and solving for α̂ we get:

    α̂ = Ȳ − β̂X̄        (1.6)

Replacing in (1.5):

    Σ XiYi = (Ȳ − β̂X̄) Σ Xi + β̂ Σ Xi²
    Σ XiYi = Ȳ Σ Xi − β̂X̄ Σ Xi + β̂ Σ Xi²
    Σ XiYi − Ȳ Σ Xi = β̂ (Σ Xi² − X̄ Σ Xi)

    β̂ = (Σ XiYi − Ȳ Σ Xi) / (Σ Xi² − X̄ Σ Xi)

Note that X̄ = Σ Xi / n, so that Σ Zi = nZ̄ for any variable Z. Replacing, we get:

    β̂ = (Σ XiYi − nȲX̄) / (Σ Xi² − nX̄²)        (1.7)

It will be useful to adopt the following notation: xi = Xi − X̄ and yi = Yi − Ȳ, so that lowercase letters denote the observations as deviations from their sample means. Using this notation:

    Σ xiyi = Σ (Xi − X̄)(Yi − Ȳ)
           = Σ (XiYi − XiȲ − X̄Yi + X̄Ȳ)
           = Σ XiYi − Ȳ Σ Xi − X̄ Σ Yi + nX̄Ȳ
           = Σ XiYi − nȲX̄ − nX̄Ȳ + nX̄Ȳ
           = Σ XiYi − nȲX̄
which corresponds to the numerator of (1.7). Operating similarly on the denominator of (1.7), we get the following alternative expression for the least squares estimate of β:

    β̂ = Σ xiyi / Σ xi²

[ FIGURE 3: SCATTER DIAGRAM AND OLS LINE ]

1.3 Algebraic Properties of Least Squares Estimators

By algebraic properties of the estimators we mean those that are a direct consequence of the minimization process, stressing the difference with statistical properties, which will be studied in the next section.

• Property 1: Σ ei = 0. From the first normal equation (1.2), dividing by minus 2 and replacing the definition of ei, we easily verify that, as a consequence of minimizing the sum of squared residuals, the sum of the residuals, and consequently their average, is equal to zero.

• Property 2: Σ Xiei = 0. This can be checked by dividing the second normal equation (1.4) by minus 2. The sample covariance between X and e is given by:

    Cov(X, e) = 1/(n−1) Σ (Xi − X̄)(ei − ē)
              = 1/(n−1) [Σ Xiei − ē Σ Xi − X̄ Σ ei + nX̄ē]
              = 1/(n−1) Σ Xiei

since, from the previous property, Σ ei, and hence ē, are equal to zero. This property says that, as a consequence of using the method of least squares, the sample covariance between the explanatory variable X and the error term e is zero, or, what is the same, the residuals are linearly unrelated to the explanatory variable.

• Property 3: The estimated regression line corresponds to the function Ŷ(X) = α̂ + β̂X, where we take α̂ and β̂ as parameters, so that Ŷ is a function that depends on X. Consider what happens when we evaluate this function at X̄, the mean of X:

    Ŷ(X̄) = α̂ + β̂X̄

But from (1.6):

    α̂ + β̂X̄ = Ȳ

Then Ŷ(X̄) = Ȳ, that is, the regression line estimated by the method of least squares passes through the point of means.
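Before moving on, it may help to see these formulas at work. The following is a minimal Python sketch, not part of the original notes: the function name ols, the seed and the simulated sample are our own illustrative choices. It computes the least squares estimates in deviations form, checks that the textbook formula (1.7) gives the same β̂, and verifies Properties 1 to 3 numerically.

    import numpy as np

    def ols(X, Y):
        """Least squares estimates for Yi = alpha + beta*Xi + ui."""
        x = X - X.mean()                      # deviations from sample means
        y = Y - Y.mean()
        beta = (x * y).sum() / (x ** 2).sum() # beta-hat = sum(x_i y_i) / sum(x_i^2)
        alpha = Y.mean() - beta * X.mean()    # alpha-hat = Ybar - beta-hat * Xbar
        return alpha, beta

    # Hypothetical simulated sample, only for illustration.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=50)
    Y = 2.0 + 0.75 * X + rng.normal(0, 1, size=50)

    alpha, beta = ols(X, Y)
    e = Y - (alpha + beta * X)                # residuals e_i = Y_i - Yhat_i

    # The textbook form (1.7) gives the same beta-hat:
    n = len(X)
    beta_17 = ((X * Y).sum() - n * Y.mean() * X.mean()) / ((X ** 2).sum() - n * X.mean() ** 2)
    print(np.isclose(beta, beta_17))                     # True

    print(np.isclose(e.sum(), 0.0))                      # Property 1: residuals sum to zero
    print(np.isclose((X * e).sum(), 0.0))                # Property 2: X and e are 'orthogonal'
    print(np.isclose(alpha + beta * X.mean(), Y.mean())) # Property 3: line through the means

The sketches that follow in this chapter reuse the objects (X, Y, alpha, beta, e) defined here.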
• Property 4: Relationship between regression and correlation. Recall that the sample correlation coefficient between X and Y for a sample of n observations (Xi, Yi), i = 1, 2, . . . , n, is defined as:

    rXY = Cov(X, Y) / (SX SY)

The following result establishes the relationship between rXY and β̂:

    β̂ = Σ xiyi / Σ xi²
       = [Σ xiyi / (√Σ xi² √Σ yi²)] · (√Σ yi² / √Σ xi²)
       = rXY (SY / SX)

where the second line multiplies and divides by √Σ yi², and the last step uses the fact that the normalizing constants in Cov(X, Y), SX and SY cancel, so that rXY = Σ xiyi / (√Σ xi² √Σ yi²) and SY/SX = √Σ yi² / √Σ xi². If r = 0 then β̂ = 0. Note that if both variables have the same sample variance, the correlation coefficient is equal to the regression coefficient β̂. We can also see that, unlike the correlation coefficient, β̂ is not invariant to changes in scale or units of measurement.

• Property 5: The sample means of Yi and Ŷi are the same. By definition, Yi = Ŷi + ei for i = 1, . . . , n. Summing over i:

    Σ Yi = Σ Ŷi + Σ ei

and dividing by n:

    Σ Yi / n = Σ Ŷi / n

since Σ ei = 0 from the first order conditions. Then the sample mean of the fitted values equals Ȳ, which is the desired result.
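As a quick numerical check of Properties 4 and 5, continuing with the sample and estimates from the previous sketch (this fragment is illustrative and assumes those objects are still defined):

    r = np.corrcoef(X, Y)[0, 1]               # sample correlation coefficient r_XY
    S_X = X.std(ddof=1)                       # sample standard deviations
    S_Y = Y.std(ddof=1)
    print(np.isclose(beta, r * S_Y / S_X))    # Property 4: beta-hat = r * S_Y / S_X

    Yhat = alpha + beta * X                   # fitted values
    print(np.isclose(Yhat.mean(), Y.mean()))  # Property 5: equal sample means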
• Property 6: β̂ is a linear function of the Yi's. That is, β̂ can be written as β̂ = Σ wiYi, where the wi's are real numbers, not all of them equal to zero.

This is easy to prove. Let us start by writing β̂ as follows:

    β̂ = Σ (xi / Σ xi²) yi

and call wi = xi / Σ xi². Note that:

    Σ xi = Σ (Xi − X̄) = Σ Xi − nX̄ = 0

which implies Σ wi = 0. From the previous result:

    β̂ = Σ wiyi = Σ wi(Yi − Ȳ) = Σ wiYi − Ȳ Σ wi = Σ wiYi

which gives the desired result. This does not have much intuitive meaning so far, but it will be useful for later results.

1.4 The Two-Variable Linear Model under the Classical Assumptions

    Yi = α + βXi + ui ,        i = 1, . . . , n

In addition to the linear relationship between Y and X we will assume:

1. E(ui) = 0, i = 1, 2, . . . , n. 'On average' the relationship between Y and X is linear.

2. Var(ui) = E[(ui − E(ui))²] = E(ui²) = σ², i = 1, 2, . . . , n. The variance of the error term is constant for all observations. We will say that the error term is homoskedastic.

3. Cov(ui, uj) = 0 for all i ≠ j. The error term for an observation i is not linearly related to the error term of any other observation j. If variables are measured over time, i.e., i = 1980, 1981, . . . , 1997, we will say that there is no autocorrelation. In general, we will say that there is no serial correlation. Note that since E(ui) = 0, assuming Cov(ui, uj) = 0 is equivalent to assuming E(uiuj) = 0.

4. The values of Xi are non-stochastic and not all of them equal.
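To make these assumptions concrete, here is a minimal sketch of a data generating process consistent with them; all numerical values and the function name draw_sample are hypothetical choices of ours, not part of the original text. Drawing the errors as independent normals is stronger than assumptions 1 to 3 require, but it is convenient and anticipates the assumption added in Section 1.7.

    import numpy as np

    def draw_sample(alpha=2.0, beta=0.75, sigma=1.0, n=50, seed=0):
        """One sample from a process satisfying assumptions 1-4."""
        rng = np.random.default_rng(seed)
        X = np.linspace(1, 10, n)            # non-stochastic, not all equal (assumption 4)
        u = rng.normal(0.0, sigma, size=n)   # E(u)=0, constant variance, independent draws
        Y = alpha + beta * X + u
        return X, Y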
The classical assumptions provide a basic probabilistic structure to study the linear model. Most of them are of a pedagogic nature, and we will study later on how they can be relaxed. Nevertheless, they provide a simple framework to explore the nature of the least squares estimator.

1.5 Statistical Properties of Least Squares Estimators

Actually, the problem is to find good estimates of α, β and σ². The previous section presented estimates of the first two based on the principle of least squares, so, trivially, these estimates are 'good' in the sense that they minimize a certain notion of fit: they make the sum of squared residuals as small as possible. It is relevant to remark that in obtaining the least squares estimators we have made no use of the classical assumptions described above. Hence, the natural step is to explore whether we can deduce additional properties satisfied by the least squares estimator, so we can say that it is good in a sense that goes beyond that implicit in the least squares criterion. The following are called statistical properties since they arise as a consequence of the statistical structure of the model.

We will use repeatedly the following expressions for the LS estimators:

    β̂ = Σ xiyi / Σ xi²        α̂ = Ȳ − β̂X̄

We will first explore the main properties of β̂ in detail, and leave the analysis of α̂ as exercises. The starting conceptual point is to see that β̂ depends explicitly on the Yi's which, in turn, depend on the ui's, which are, by construction, random variables. Hence β̂ is itself a random variable, and it makes sense to talk about its moments (mean and variance, for example) and its distribution.

It is easy to verify that:

    yi = xiβ + ui*

where ui* = ui − ū and, according to the classical assumptions, E(ui*) = 0 and, consequently, E(yi) = xiβ. This is known as the classical two-variable linear model in deviations from the means.

• β̂ is an unbiased estimator, that is, E(β̂) = β. To prove the result, start from the linearity property of the previous section:
    β̂ = Σ wiyi
    E(β̂) = Σ wi E(yi)        (the wi's are non-stochastic)
          = Σ wixi β
          = β Σ wixi
          = β Σ xi² / Σ xi²
          = β

• The variance of β̂ is σ² / Σ xi².

From the linearity property, β̂ = Σ wiYi, so

    V(β̂) = V(Σ wiYi)

Now note two things. First:

    V(Yi) = V(α + βXi + ui) = V(ui) = σ²

since Xi is non-stochastic. Second, note that E(Yi) = α + βXi, so:

    Cov(Yi, Yj) = E[(Yi − E(Yi))(Yj − E(Yj))] = E(uiuj) = 0

by the no serial correlation assumption. Then V(Σ wiYi) is the variance of a (weighted) sum of uncorrelated terms. Hence:

    V(β̂) = V(Σ wiYi) = Σ wi² V(Yi) = σ² Σ wi² = σ² (Σ xi²) / (Σ xi²)² = σ² / Σ xi²

• Gauss-Markov Theorem: under the classical assumptions, β̂, the LS estimator of β, has the smallest variance in the class of linear and unbiased estimators. More formally, if β* is any linear and unbiased estimator of β, then:

    V(β*) ≥ V(β̂)

The proof of a more general version of this result is postponed until Chapter 3.

Discussion: β̂ is BLUE (best linear unbiased estimator), but 'best' does not mean 'good': we would rather have the minimum variance unbiased estimator, without the 'linear' qualification, and 'linear' is not a particularly interesting class in itself. Moreover, if we drop any of the assumptions, the OLS estimator is no longer guaranteed to be BLUE. This justifies the use of OLS when all the assumptions are correct.
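A small Monte Carlo experiment illustrates the two statistical properties just derived. The sketch below is our own illustration, with arbitrary 'true' parameter values, seed and number of replications: it draws many samples from the classical model and compares the empirical mean and variance of β̂ with β and σ² / Σ xi².

    import numpy as np

    alpha0, beta0, sigma, n = 2.0, 0.75, 1.0, 50    # hypothetical 'true' values
    X = np.linspace(1, 10, n)                       # fixed across replications
    x = X - X.mean()
    rng = np.random.default_rng(1)

    draws = []
    for _ in range(20_000):
        Y = alpha0 + beta0 * X + rng.normal(0.0, sigma, size=n)
        draws.append((x * (Y - Y.mean())).sum() / (x ** 2).sum())   # beta-hat
    draws = np.array(draws)

    print(draws.mean())                              # close to beta0 = 0.75 (unbiasedness)
    print(draws.var(), sigma ** 2 / (x ** 2).sum())  # both close to V(beta-hat)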
Estimation of σ²

So far we have concentrated the analysis on α and β. As an estimator for σ² we will propose:

    S² = Σ ei² / (n − 2)

We will later show that S² provides an unbiased estimator of σ².

1.6 Goodness of fit

After estimating the parameters of the regression line, it is interesting to check how well the estimated model fits the data. We want a measure of how well the fitted line represents the observations of the variables of the model.

To look for such a measure of goodness of fit, we start from the definition of the estimation error, ei = Yi − Ŷi, solve for Yi and subtract the sample mean of Yi from both sides to obtain:

    Yi − Ȳ = Ŷi − Ȳ + ei
    yi = ŷi + ei

using the notation defined before and noting that, from Property 5, Ȳ is also the sample mean of the Ŷi's. Squaring both sides and summing over all the observations:

    Σ yi² = Σ (ŷi + ei)²
          = Σ ŷi² + Σ ei² + 2 Σ ŷiei

The next step is to show that Σ ŷiei = 0. Using Σ ei = 0 to replace ŷi by Ŷi = α̂ + β̂Xi:

    Σ ŷiei = Σ Ŷiei − Ȳ Σ ei = Σ (α̂ + β̂Xi)ei = α̂ Σ ei + β̂ Σ Xiei = 0 + 0

from the first order conditions. Then we get the following important decomposition:

    Σ yi² = Σ ŷi² + Σ ei²

    TSS = ESS + RSS

This is a key result: when we use the least squares method, the total variability of the dependent variable around its sample mean (TSS, the total sum of squares) can be decomposed as the sum of two factors. The first one corresponds to the variability of Ŷ (ESS, the explained sum of squares) and represents the variability explained by the fitted model. The second term represents the variability not explained by the model (RSS, the residual sum of squares), associated with the error term.
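Numerically, the decomposition and the proposed estimator S² can be checked as follows, reusing X, Y, alpha, beta and e from the first sketch of this chapter (again an illustrative fragment, assuming those objects are still defined):

    Yhat = alpha + beta * X
    TSS = ((Y - Y.mean()) ** 2).sum()       # total sum of squares
    ESS = ((Yhat - Y.mean()) ** 2).sum()    # explained sum of squares
    RSS = (e ** 2).sum()                    # residual sum of squares
    print(np.isclose(TSS, ESS + RSS))       # True: the decomposition holds

    S2 = RSS / (len(X) - 2)                 # S^2, the proposed estimator of sigma^2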
For a given model, the best situation arises when the errors are all zero, in which case the total variability (TSS) coincides with the explained variability (ESS). The worst case corresponds to the situation in which the fitted model explains nothing of the total variability, in which case TSS coincides with RSS. From this observation, it is natural to suggest the following goodness of fit measure, known as R², or coefficient of determination:

    R² = ESS / TSS = 1 − RSS / TSS

It can be shown (we will do it in the exercises) that R² = r². Consequently, 0 ≤ R² ≤ 1. When R² = 1, |r| = 1, which corresponds to the case in which the relationship between Y and X is exactly linear. On the other hand, R² = 0 is equivalent to r = 0, which corresponds to the case in which Y and X are linearly unrelated. It is interesting to note that TSS does not depend on the estimated model, that is, it depends on neither β̂ nor α̂. Then, if β̂ and α̂ are chosen so as to minimize SSR, they automatically maximize R². This implies that, for a given model, the least squares estimates maximize R².

The R² is, arguably, the most used and abused measure of the quality of a regression model. A detailed analysis of the extent to which a high R² can be taken as representative of a 'good' model will be undertaken in Chapter 4.

1.7 Inference in the two-variable linear model

The methods discussed so far provide reasonably good point estimates of the parameters of interest α, β and σ², but usually we will be interested in evaluating hypotheses involving the parameters, or in constructing confidence intervals for them. For example, consider the case of a simple consumption function where consumption is specified as a linear function of income. We could be interested in evaluating whether the marginal propensity to consume is equal to, say, 0.75, or whether autonomous consumption is equal to zero.

In general terms, a hypothesis about a parameter of the model is a conjecture about it that can be either true or false. The central problem is that, in order to check whether such a statement is true or false, we do not have the chance to observe the parameter itself. Instead, based on the available data, we have an estimate of it. As an example, suppose we are interested in evaluating the, rather strong, null hypothesis that income is not an explanatory factor of consumption, against the hypothesis that it is a relevant factor. In our simple setup this corresponds to H0 : β = 0 against HA : β ≠ 0. The logic we will use is the following: if the null hypothesis were in fact true, β would be exactly zero. Realizations of β̂ can potentially take any value, since β̂ is, by construction, a random variable. But if β̂ is a 'good' estimator of β, when the null hypothesis is true it should take values close to zero. On the other hand, if the null hypothesis were false, the realizations of β̂ should be significantly different from zero. The procedure, then, consists in computing β̂ from the data and rejecting the null hypothesis if the obtained value is significantly different from zero, accepting it otherwise.
Of course, the central concept behind this procedure lies in specifying what we mean by 'very close' or 'very far', given that β̂ is a random variable. More specifically, we need to know the distribution of β̂ under the null hypothesis, so that we can define precisely the notion of 'significantly different from zero'. In this context such a statement is necessarily probabilistic: we will take as the rejection region a set of values that lie 'far away' from zero, that is, a set of values that under the null hypothesis appear with very low probability.

The properties discussed in the previous section are informative about certain moments of β̂ or α̂ (for example, their means and variances), but they are not enough for the purpose of knowing their distributions. Consequently, we need to introduce an additional assumption: we will assume that ui is normally distributed, for i = 1, . . . , n. Given that we have already assumed that ui has zero mean and constant variance equal to σ², we have:

    ui ∼ N(0, σ²)

Given that Yi = α + βXi + ui and that the Xi's are non-stochastic, we immediately see that the Yi's are also normally distributed, since linear transformations of normal random variables are also normal. In particular, given that the normal distribution is characterized by its mean and variance only, we get:

    Yi ∼ N(α + βXi, σ²)

for every i = 1, . . . , n. In a similar fashion, β̂ is also normally distributed since, by Property 6, it is a linear combination of the Yi's, that is:

    β̂ ∼ N(β, σ² / Σ xi²)

If σ² were known we could use this result to test simple hypotheses like:

    Ho : β = βo    vs.    HA : β ≠ βo

Subtracting from β̂ its expected value under the null and dividing by its standard deviation we get:

    z = (β̂ − βo) / (σ/√Σ xi²) ∼ N(0, 1)

Hence, if the null hypothesis is true, z should take values that are small in absolute value, while if it is false, z should take large absolute values. As you should remember from a basic statistics course, this is accomplished by defining a rejection region and an acceptance region as follows. The acceptance region includes values that lie close to the one corresponding to the null hypothesis. Let 0 < c < 1 and let zc be a number such that:

    Pr(−zc ≤ z ≤ zc) = 1 − c

Replacing z by its definition:
    Pr(βo − zc σ/√Σ xi² ≤ β̂ ≤ βo + zc σ/√Σ xi²) = 1 − c

Then the acceptance region is given by the interval:

    βo ± zc (σ/√Σ xi²)

so we accept the null hypothesis if the observed realization of β̂ lies within this interval, and reject it otherwise. The number c is specified in advance and is usually small. It is called the significance level of the test: it gives the probability of rejecting the null hypothesis when it is correct. Under the normality assumption, the value zc can be easily obtained from a table of percentiles of the standard normal distribution.

As you should also remember from a basic statistics class, a similar logic can be applied to construct a confidence interval for βo. Note that:

    Pr(β̂ − zc (σ/√Σ xi²) ≤ βo ≤ β̂ + zc (σ/√Σ xi²)) = 1 − c

Then a 1 − c confidence interval for βo will be given by:

    β̂ ± zc (σ/√Σ xi²)

The practical problem with the previous procedures is that they require that we know σ², which is usually not the case. Instead, we can compute its estimated version S². Define t as:

    t = (β̂ − β) / (S/√Σ xi²)

t is simply z with σ² replaced by its estimate S². A very important result is that, by making this replacement, we have:

    t ∼ t(n−2)

that is, the 't-statistic' has the so-called 't distribution with n − 2 degrees of freedom'. Hence, when we use the estimated version of the variance we obtain a different distribution for the statistic used to test simple hypotheses and construct confidence intervals. Consequently, applying once again the same logic, in order to test the null hypothesis Ho : β = βo against HA : β ≠ βo we use the t-statistic:

    t = (β̂ − βo) / (S/√Σ xi²) ∼ t(n−2)
and a 1 − c confidence interval for βo will be given by:

    β̂ ± tc (S/√Σ xi²)

where now tc is a percentile of the 't' distribution with n − 2 degrees of freedom, which is usually tabulated in basic statistics and econometrics textbooks.

An important particular case is the insignificance hypothesis, Ho : β = 0 against HA : β ≠ 0. Under the null X does not help explain Y, and under the alternative X is linearly related to Y. Replacing βo by 0 above we get:

    tI = β̂ / (S/√Σ xi²) ∼ t(n−2)

which is usually reported as a standard outcome in most regression packages.

An alternative way to check the significance of the linear relationship is to look at how large the explained sum of squares ESS is. Recall that if the model has an intercept we have:

    TSS = ESS + RSS

If there is no linear relationship between Y and X, ESS should be very close to zero. Consider the following statistic, which is just a 'standardized' version of the ESS:

    F = ESS / (RSS/(n − 2))

It can be shown that, under the normality assumption, F has the F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator, usually labeled F(1, n − 2). Note that if X does not help explain Y in a linear sense, ESS, and hence F, should be very small. Then, we should reject the null hypothesis that X does not help explain Y if the F statistic computed from the data takes a large value, and accept it otherwise.

Note that by definition R² = ESS/TSS = 1 − RSS/TSS. Dividing both the numerator and the denominator of the F statistic by TSS and using these definitions, we can write the F statistic in terms of the R² coefficient as:

    F = R² / ((1 − R²)/(n − 2))

Then the F test is actually looking at whether the R² is significantly high. As expected, there is a close relationship between the F statistic and the 't' statistic for the insignificance hypothesis (tI): under the null of no linear relationship between Y and X, β = 0 and ESS is zero, and it can be easily shown that:

    F = tI²

We will leave the proof as an exercise.
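To close the chapter's running numerical illustration, the sketch below gathers the inference tools of this section: the insignificance t-statistic, a confidence interval, the F statistic, and the identities just stated. It is our own hedged example, not part of the original notes: it reuses X, Y, beta, S2, TSS, ESS and RSS from the earlier sketches, the 5% significance level is an arbitrary choice, and scipy is used only to obtain the 't' percentile.

    import numpy as np
    from scipy import stats

    n = len(X)
    x = X - X.mean()
    se = np.sqrt(S2 / (x ** 2).sum())          # estimated standard error of beta-hat

    t_I = beta / se                            # t-statistic for Ho: beta = 0
    t_c = stats.t.ppf(1 - 0.05 / 2, df=n - 2)  # two-sided critical value at c = 0.05
    print(abs(t_I) > t_c)                      # True in this sample: reject Ho

    ci = (beta - t_c * se, beta + t_c * se)    # 95% confidence interval for beta
    print(ci)

    R2 = ESS / TSS
    F = ESS / (RSS / (n - 2))
    print(np.isclose(F, R2 / ((1 - R2) / (n - 2))))    # F written in terms of R^2
    print(np.isclose(F, t_I ** 2))                     # F = t_I squared
    print(np.isclose(R2, np.corrcoef(X, Y)[0, 1]**2))  # R^2 = r^2 (left as an exercise)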