Dummy Variable Models
Multiple Regression. Reading: Sandy, pages 603-613. Also read the paper titled "Using Dummy Variables in Wage Discrimination Cases".
Are Male Nurses Discriminated Against? [Figure: wage against years of experience, $X_i$, for male and female nurses, showing each group's mean wage both not adjusted and adjusted for experience.]
I. Dummy Variables: adjusting the intercept; adjusting the slope; adjusting both intercept and slope.
Intercept Dummy Variables. Dummy variables are binary (0, 1): $D_t = 1$ if red car, $D_t = 0$ otherwise. Model: $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + e_t$, where $y_t$ = speed of car in miles per hour and $X_t$ = age of car in years. Police: red cars travel faster. $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 > 0$.
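$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + e_t$. Red cars: $y_t = (\beta_1 + \beta_3) + \beta_2 X_t + e_t$. Other cars: $y_t = \beta_1 + \beta_2 X_t + e_t$. [Figure: miles per hour, $y_t$, against age in years, $X_t$; two parallel lines with slope $\beta_2$, intercept $\beta_1 + \beta_3$ for red cars and $\beta_1$ for other cars.]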
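Slope Dummy Variables. $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t X_t + e_t$. Stock portfolio ($D_t = 1$): $y_t = \beta_1 + (\beta_2 + \beta_3) X_t + e_t$. Bond portfolio ($D_t = 0$): $y_t = \beta_1 + \beta_2 X_t + e_t$. $\beta_1$ = initial investment. [Figure: value of portfolio, $y_t$, against years, $X_t$; common intercept $\beta_1$, slope $\beta_2 + \beta_3$ for stocks and $\beta_2$ for bonds.]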
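Different Intercepts & Slopes. $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t$. "Miracle" seed ($D_t = 1$): $y_t = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) X_t + e_t$. Regular seed ($D_t = 0$): $y_t = \beta_1 + \beta_2 X_t + e_t$. [Figure: harvest weight of corn, $y_t$, against rainfall, $X_t$; intercept $\beta_1 + \beta_3$ and slope $\beta_2 + \beta_4$ for "miracle" seed, intercept $\beta_1$ and slope $\beta_2$ for regular seed.]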
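A minimal sketch of the intercept-and-slope dummy model in Python with NumPy. The data are simulated without noise from assumed coefficient values (they are not from the slides), so least squares should recover them; the function name `fit_dummy_model` is illustrative only.

```python
import numpy as np

def fit_dummy_model(x, d, y):
    """Least-squares fit of y = b1 + b2*x + b3*d + b4*d*x (d is a 0/1 dummy)."""
    design = np.column_stack([np.ones_like(x), x, d, d * x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical, noise-free data: regular seed (d=0) and "miracle" seed (d=1).
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
true_b = np.array([2.0, 0.5, 1.0, 0.25])  # assumed values for illustration
y = true_b[0] + true_b[1] * x + true_b[2] * d + true_b[3] * d * x

b1, b2, b3, b4 = fit_dummy_model(x, d, y)
print("d=0 line: intercept", b1, "slope", b2)
print("d=1 line: intercept", b1 + b3, "slope", b2 + b4)
```

Setting `d` to zero for every observation would make the last two columns all zeros, so both groups must be present in the sample for the dummy coefficients to be estimable.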
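Testing for discrimination in starting wage. $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + e_t$, where $y_t$ = wage rate and $X_t$ = years of experience. For men, $D_t = 1$: $y_t = (\beta_1 + \beta_3) + \beta_2 X_t + e_t$. For women, $D_t = 0$: $y_t = \beta_1 + \beta_2 X_t + e_t$. $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 > 0$. [Figure: wage rate against years of experience; parallel lines with slope $\beta_2$, intercept $\beta_1 + \beta_3$ for men and $\beta_1$ for women.]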
$y_t = \beta_1 + \beta_5 X_t + \beta_6 D_t X_t + e_t$. For men, $D_t = 1$: $y_t = \beta_1 + (\beta_5 + \beta_6) X_t + e_t$. For women, $D_t = 0$: $y_t = \beta_1 + \beta_5 X_t + e_t$. Men and women have the same starting wage, $\beta_1$, but their wage rates increase at different rates (difference = $\beta_6$). $\beta_6 > 0$ means that men's wage rates are increasing faster than women's wage rates. [Figure: wage rate against years of experience; common intercept $\beta_1$, slope $\beta_5 + \beta_6$ for men and $\beta_5$ for women.]
An Ineffective Affirmative Action Plan. $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t$. Men: $y_t = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) X_t + e_t$. Women: $y_t = \beta_1 + \beta_2 X_t + e_t$. Women are given a higher starting wage, $\beta_1$, while men get the lower starting wage, $\beta_1 + \beta_3$ (note: $\beta_3 < 0$). But men get a faster rate of increase in their wages, $\beta_2 + \beta_4$, which is higher than the rate of increase for women, $\beta_2$ (since $\beta_4 > 0$). [Figure: wage rate against years of experience; the women's line starts higher and the men's line is steeper.]
Testing Qualitative Effects
$Y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t$; men: $D_t = 1$; women: $D_t = 0$. Testing for discrimination in starting wage (intercept): $H_0$ vs. $H_1$ on $\beta_3$, with $\dfrac{b_3}{\sqrt{\mathrm{Est.Var}(b_3)}} \sim t_{n-4}$. Testing for discrimination in wage increases (slope): $H_0$ vs. $H_1$ on $\beta_4$, with $\dfrac{b_4}{\sqrt{\mathrm{Est.Var}(b_4)}} \sim t_{n-4}$.
Why NOW wants one-sided test and Chauvinist Industries wants two-sided.
Are Two Regressions Equal? Variations of "The Chow Test". I. Assuming equal variances (pooling): $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t$, where $y_t$ = wage rate, $X_t$ = years of experience, men: $D_t = 1$, women: $D_t = 0$. $H_0: \beta_3 = \beta_4 = 0$ vs. $H_1$: otherwise. This model assumes equal wage rate variance.
Testing $H_0: \beta_3 = \beta_4 = 0$ (intercept and slope) vs. $H_1$: otherwise. $SSE_R = \sum_{t=1}^{T} (y_t - b_1 - b_2 X_t)^2$ and $SSE_U = \sum_{t=1}^{T} (y_t - b_1 - b_2 X_t - b_3 D_t - b_4 D_t X_t)^2$. $F = \dfrac{(SSE_R - SSE_U)/2}{SSE_U/(T-4)} \sim F_{2,\,T-4}$.
II. Allowing for unequal variances (running three regressions). Everyone: $y_t = \beta_1 + \beta_2 X_t + e_t$, forcing men and women to have the same $\beta_1$, $\beta_2$; this gives $SSE_R$. Men only: $y_{tm} = \beta_{1m} + \beta_{2m} X_{tm} + e_{tm}$; this gives $SSE_m$. Women only: $y_{tw} = \beta_{1w} + \beta_{2w} X_{tw} + e_{tw}$; this gives $SSE_w$. The separate regressions allow men and women to be different, where $SSE_U = SSE_m + SSE_w$. $F = \dfrac{(SSE_R - SSE_U)/J}{SSE_U/(T-K)}$, with $J$ = number of restrictions = 2 and $K$ = number of unrestricted coefficients = 4.
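The three-regression procedure can be sketched in Python with NumPy as follows. The wage data are made up for illustration, and the helper names `sse` and `chow_f` are assumptions rather than anything defined in the slides.

```python
import numpy as np

def sse(design, y):
    """Sum of squared residuals from a least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def chow_f(x, d, y):
    """F statistic for H0: same intercept and slope in both groups (J=2, K=4)."""
    ones = np.ones_like(x)
    men, women = d == 1, d == 0
    sse_r = sse(np.column_stack([ones, x]), y)                        # everyone
    sse_m = sse(np.column_stack([ones[men], x[men]]), y[men])         # men only
    sse_w = sse(np.column_stack([ones[women], x[women]]), y[women])   # women only
    sse_u = sse_m + sse_w
    t, j, k = len(y), 2, 4
    return ((sse_r - sse_u) / j) / (sse_u / (t - k)), sse_r, sse_u

# Hypothetical wage data (not from the slides): d = 1 for men, 0 for women.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y = np.array([10.2, 11.9, 14.1, 15.8, 18.1, 9.1, 9.8, 11.2, 11.9, 13.1])

f_stat, sse_r, sse_u = chow_f(x, d, y)
print("F =", f_stat)
```

The sum $SSE_m + SSE_w$ from the two separate regressions equals the $SSE_U$ of the single pooled regression with both the intercept dummy and the slope dummy, so the two routes give the same F value.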
Polynomial Terms. Polynomial regression: $y_t = \beta_1 + \beta_2 X_t + \beta_3 X_t^2 + \beta_4 X_t^3 + e_t$, where $y_t$ = income and $X_t$ = age. Linear in parameters but nonlinear in variables. People retire at different ages or not at all. [Figure: income, $y_t$, against age, $X_t$, with the age axis running from 20 to 90.]
Polynomial Regression. $y_t = \beta_1 + \beta_2 X_t + \beta_3 X_t^2 + \beta_4 X_t^3 + e_t$, where $y_t$ = income and $X_t$ = age. Rate at which income is changing as we age: $\dfrac{\partial y_t}{\partial X_t} = \beta_2 + 2\beta_3 X_t + 3\beta_4 X_t^2$. The slope changes as $X_t$ changes.
Continuous Interaction. $y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + e_t$. Exam grade = f(sleep: $Z_t$, study time: $B_t$). Sleep and study time do not act independently. More study time will be more effective when combined with more sleep and less effective when combined with less sleep.
Continuous interaction. $y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + e_t$. Exam grade = f(sleep: $Z_t$, study time: $B_t$). Your mind sorts things out while you sleep (when you have things to sort out). Your studying is more effective with more sleep: $\dfrac{\partial y_t}{\partial B_t} = \beta_3 + \beta_4 Z_t$ and $\dfrac{\partial y_t}{\partial Z_t} = \beta_2 + \beta_4 B_t$.
$y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + e_t$. Exam grade = f(sleep: $Z_t$, study time: $B_t$). If $Z_t + B_t = 24$ hours, then $B_t = (24 - Z_t)$ and $y_t = \beta_1 + \beta_2 Z_t + \beta_3 (24 - Z_t) + \beta_4 Z_t (24 - Z_t) + e_t = (\beta_1 + 24\beta_3) + (\beta_2 - \beta_3 + 24\beta_4) Z_t - \beta_4 Z_t^2 + e_t$. Relabeling the coefficients: $y_t = \delta_1 + \delta_2 Z_t + \delta_3 Z_t^2 + e_t$, where $\delta_2 > 0$ and $\delta_3 < 0$. Sleep needed to maximize your exam grade: $\dfrac{\partial y_t}{\partial Z_t} = \delta_2 + 2\delta_3 Z_t = 0$, so $Z_t = \dfrac{-\delta_2}{2\delta_3}$.
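As a rough numerical check of this derivation, the sketch below computes the grade-maximizing hours of sleep directly from the interaction-model coefficients and compares it with a grid search. The coefficient values are hypothetical and the function name `optimal_sleep` is an assumption.

```python
import numpy as np

def optimal_sleep(b2, b3, b4):
    """Hours of sleep Z that maximize y when B = 24 - Z and b4 > 0.

    Substituting B = 24 - Z gives y = const + (b2 - b3 + 24*b4)*Z - b4*Z**2,
    so dy/dZ = 0 at Z = (b2 - b3 + 24*b4) / (2*b4).
    """
    if b4 <= 0:
        raise ValueError("b4 must be positive for the quadratic to be concave")
    return (b2 - b3 + 24.0 * b4) / (2.0 * b4)

# Hypothetical coefficients, chosen only for illustration.
b1, b2, b3, b4 = 20.0, 1.0, 3.0, 0.5
z_star = optimal_sleep(b2, b3, b4)

# Numerical check on a grid of feasible sleep hours (step 0.01).
z = np.linspace(0.0, 24.0, 2401)
y = b1 + b2 * z + b3 * (24.0 - z) + b4 * z * (24.0 - z)
print("analytic:", z_star, "grid:", z[np.argmax(y)])
```

The closed form is only a feasible answer when it lands between 0 and 24 hours; otherwise the maximum sits at one end of that range.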
Multicollinearity: correlation among the "independent" variables. Note: they are independent of the error term, not of one another.
Let $y_i$ represent the $i$th person's wage rate and $X_i$ represent their months of work experience in the equation: $y_i = b_1 + b_2 X_i + e_i$ (1), where $b_1$ = intercept (starting wage), $b_2$ = increase in the person's wage for each additional month of work experience, and $e_i$ = error term with mean zero and estimated variance $s^2$.
$y_i = b_1 + b_2 X_i + b_3 M_i + b_4 F_i + e_i$ (2), where $M_i = 1$ if male and $M_i = 0$ if female; $F_i = 1$ if female and $F_i = 0$ if male.
Unfortunately, equation (2), $y_i = b_1 + b_2 X_i + b_3 M_i + b_4 F_i + e_i$, contains an underidentified set of parameters ($b_1$, $b_3$, and $b_4$) and cannot be estimated without some restriction on the coefficients.
To see this point, separate out the men's equation implied by equation (2), $y_i = b_1 + b_2 X_i + b_3 M_i + b_4 F_i + e_i$, from the women's equation. For men, $M_i = 1$ and $F_i = 0$, so equation (2) becomes: $y_i = (b_1 + b_3) + b_2 X_i + e_i$ (3).
For women, $M_i = 0$ and $F_i = 1$, so equation (2) becomes: $y_i = (b_1 + b_4) + b_2 X_i + e_i$ (4).
Unfortunately, although we get estimates  of the intercepts (b1 + b3) and (b1 + b4),  the value of b1  cannot be separated    from the values of b3 and b4.   Some  restriction  is needed   to achieve  identification     of b1, b3 and b4.
One such restriction is b1 = 0.     We can drop the original intercept term,   b1, since  men  and  women  already  have their own intercept terms,    b3  and  b4 , respectively.
Underidentification of equation (2) can also be expressed in matrix terms. First, rewrite equation (2) putting the explanatory variables in a row vector multiplied by the corresponding column vector of their respective coefficients:
$$y_i = \begin{bmatrix} 1 & X_i & M_i & F_i \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} + e_i \qquad (5)$$
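This only represents the $i$th observation, where $i = 1, \ldots, n$. To represent the entire set of $n$ observations at once, we need to "pull the window shade down" as follows:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 & M_1 & F_1 \\ 1 & X_2 & M_2 & F_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & X_n & M_n & F_n \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} \qquad (6)$$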
Equation (6) presents us with an X matrix  whose first column (the column of ones)  is an exact linear combination of the last  two columns (the M and F columns).  Since Mi is always zero when Fi is equal  to one and Mi is always one when Fi is  equal to zero, then it always holds  that Mi + Fi = 1.   Therefore, the first column is equal to the  sum of the last two columns.
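Since $M_i$ is always zero when $F_i$ is equal to one and $M_i$ is always one when $F_i$ is equal to zero, it always holds that $M_i + F_i = 1$:
$$\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} M_1 \\ M_2 \\ \vdots \\ M_n \end{bmatrix} + \begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_n \end{bmatrix} \qquad (9)$$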
Equation (6) and, therefore, equation (2) represent a case of perfect multicollinearity. This means that a restriction must be introduced that drops one of these columns out of the regression. One such restriction is $b_1 = 0$, which means dropping the original intercept out of the regression model to provide the following reduced model: $y_i = b_2 X_i + b_3 M_i + b_4 F_i + e_i$ (10). Now men and women have separate intercepts and no common intercept is necessary.
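The rank deficiency can be illustrated with a short NumPy sketch. The sample values below are hypothetical; only the structure of the design matrices follows equations (2) and (10).

```python
import numpy as np

# Hypothetical sample: months of experience and a gender indicator.
x = np.array([3.0, 12.0, 7.0, 24.0, 18.0, 5.0])
m = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])   # M_i = 1 if male
f = 1.0 - m                                     # F_i = 1 if female
ones = np.ones_like(x)

full = np.column_stack([ones, x, m, f])   # equation (2): four columns
reduced = np.column_stack([x, m, f])      # equation (10): intercept dropped

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)
print(rank_full, "of", full.shape[1], "columns")        # rank deficient
print(rank_reduced, "of", reduced.shape[1], "columns")  # full column rank
```

Dropping either the `m` column or the `f` column instead of the column of ones would restore full column rank just as well; the choice changes only how the coefficients are interpreted.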
$y_i = b_2 X_i + b_3 M_i + b_4 F_i + e_i$. For males, $M_i = 1$ and $F_i = 0$: $y_i = b_3 + b_2 X_i + e_i$. For females, $M_i = 0$ and $F_i = 1$: $y_i = b_4 + b_2 X_i + e_i$. Males and females have different starting salaries, $b_3 > b_4$, but their salaries increase at the same rate, $b_2$. [Figure: $y_i$ against years of experience, $X_i$; parallel lines with slope $b_2$, intercept $b_3$ for males and $b_4$ for females.]
$y_i = b_1 + b_5 M_i X_i + b_6 F_i X_i + e_i$. For males, $M_i = 1$ and $F_i = 0$: $y_i = b_1 + b_5 X_i + e_i$. For females, $M_i = 0$ and $F_i = 1$: $y_i = b_1 + b_6 X_i + e_i$. Males and females have the same starting salary, $b_1$, but their salaries increase at different rates ($b_5$ vs. $b_6$). $b_5 > b_6$ means that men's salaries are increasing faster than women's salaries. [Figure: $y_i$ against years of experience, $X_i$; common intercept $b_1$, slope $b_5$ for males and $b_6$ for females.]
Chauvinist Industries Affirmative Action Plan. $y_i = b_3 M_i + b_4 F_i + b_5 M_i X_i + b_6 F_i X_i + e_i$. For males, $M_i = 1$ and $F_i = 0$: $y_i = b_3 + b_5 X_i + e_i$. For females, $M_i = 0$ and $F_i = 1$: $y_i = b_4 + b_6 X_i + e_i$. Females start with a higher starting salary, $b_4$, while men get the lower starting salary, $b_3$. But men get a faster rate of increase in their salaries, $b_5$, which is higher than the rate of increase for females, $b_6$ ($b_5 > b_6$). [Figure: $y_i$ against years of experience, $X_i$; the female line starts higher and the male line is steeper.]
Back to our basic model: $y_i = b_2 X_i + b_3 M_i + b_4 F_i + e_i$. For males, $M_i = 1$ and $F_i = 0$: $y_i = b_3 + b_2 X_i + e_i$. For females, $M_i = 0$ and $F_i = 1$: $y_i = b_4 + b_2 X_i + e_i$. Males and females have different starting salaries, $b_3 > b_4$, but their salaries increase at the same rate, $b_2$.
Since under our null hypothesis the raw score test statistic, $b_3 - b_4$, has a mean of zero and a variance of $\mathrm{Var}(b_3 - b_4)$, we can standardize it by subtracting the mean (zero) and dividing by the standard deviation (square root of the variance) to get the standardized test statistic.
To test the null hypothesis: $Z = \dfrac{(b_3 - b_4) - 0}{\sqrt{\mathrm{Var}(b_3 - b_4)}} \sim N(0, 1)$
If the variance of the $y_i$, $\sigma^2$, is unknown, then $\mathrm{Var}(b_3 - b_4)$ is also unknown and must be estimated from the expression: $\mathrm{Est.Var}(b_3 - b_4) = \mathrm{Est.Var}(b_3) + \mathrm{Est.Var}(b_4) - 2\,\mathrm{Est.Cov}(b_3, b_4)$
Use the  sample variance  as an estimator of the  population variance :
The values for the following expression are obtained in practice from the diagonal and off-diagonal elements of the estimated variance-covariance matrix: $\mathrm{Est.Var}(b_3 - b_4) = \mathrm{Est.Var}(b_3) + \mathrm{Est.Var}(b_4) - 2\,\mathrm{Est.Cov}(b_3, b_4)$
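A small sketch of how the expression might be evaluated from an estimated variance-covariance matrix. The coefficient estimates and the matrix entries are invented for illustration, and the function name `standardized_difference` is an assumption.

```python
import numpy as np

def standardized_difference(b, cov, i, j):
    """Return (b[i] - b[j]) / sqrt(Est.Var(b[i] - b[j])) and the variance itself."""
    var_diff = cov[i, i] + cov[j, j] - 2.0 * cov[i, j]
    return (b[i] - b[j]) / np.sqrt(var_diff), var_diff

# Hypothetical estimates ordered as (b2, b3, b4) from equation (10), with a
# made-up estimated variance-covariance matrix in the same order.
b = np.array([0.05, 12.0, 10.5])
cov = np.array([
    [0.0004, -0.0010, -0.0012],
    [-0.0010, 0.2500, 0.0600],
    [-0.0012, 0.0600, 0.3000],
])

# Positions 1 and 2 hold b3 and b4 in this ordering.
stat, var_diff = standardized_difference(b, cov, 1, 2)
print("Est.Var(b3 - b4) =", var_diff, " statistic =", stat)
```

The two diagonal entries supply the variances and the off-diagonal entry supplies the covariance; a positive covariance between $b_3$ and $b_4$ shrinks the variance of their difference.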
Alternative: make women the default group. $\hat{y}_i = b_1 + b_2 X_i + b_3 M_i$. Male: $\hat{y}_i = (b_1 + b_3) + b_2 X_i$. Female: $\hat{y}_i = b_1 + b_2 X_i$. Males and females have different starting salaries, $b_3 > 0$, but their salaries increase at the same rate, $b_2$. [Figure: $y_i$ against years of experience, $X_i$; parallel lines with slope $b_2$, intercept $b_1 + b_3$ for males and $b_1$ for females.]
Characteristic dummy variables: $\hat{y}_i = b_1 + b_2 X_i + b_3 M_i + b_4 D_i$. Male college grad: $\hat{y}_i = (b_1 + b_3 + b_4) + b_2 X_i$. Female college grad: $\hat{y}_i = (b_1 + b_4) + b_2 X_i$. Male not a grad: $\hat{y}_i = (b_1 + b_3) + b_2 X_i$. Female not a grad: $\hat{y}_i = b_1 + b_2 X_i$.
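$\hat{y}_i = b_1 + b_2 X_i + b_3 M_i + b_4 D_i$: a very restrictive assumption, very rigid! [Figure: wage rate, $y_i$, against years of experience, $X_i$; four parallel lines with intercepts $b_1$ for F-N (female, no degree), $b_1 + b_3$ for M-N (male, no degree), $b_1 + b_4$ for F-D (female, degree), and $b_1 + b_3 + b_4$ for M-D (male, degree).]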
Creating  Composite   Dummy Variables  ( vs.  characteristic  dummy variables )
Karnaugh map for gender vs. status of job (S = supervisor, I = individual):

Gender       S    I   Total
M (men)     15   25     40
F (women)   13   27     40
Total       28   52     80
Occupation vs. Job vs. Gender (C = Computer, T = Other Technical, U = Untechnical; S and I are the job status within each occupation):

Gender   C-S  C-I  T-S  T-I  U-S  U-I  Total
M          2    4    3    5   10   16     40
F          1    6    0    7   12   14     40
Total      3   10    3   12   22   30     80
Karnaugh Map for Occupation, Job Status, Gender, and Degree Status (D = Degree, N = No Degree):

Degree  Gender   C-S  C-I  T-S  T-I  U-S  U-I  Total
D       M          1    3    2    5    6   13     30
D       F          0    3    0    6    7    8     24
N       M          1    1    1    0    4    3     10
N       F          1    3    0    1    5    6     16
Total              3   10    3   12   22   30     80
Composite dummy variables: $\hat{y}_i = b_1 + b_2 X_i + b_3 MN_i + b_4 FD_i + b_5 MD_i$. This defines combined (instead of separate) general characteristics. [Figure: wage rate, $y_i$, against years of experience, $X_i$; four parallel lines with intercepts $b_1$ for F-N (female, no degree), $b_1 + b_3$ for M-N (male, no degree), $b_1 + b_4$ for F-D (female, degree), and $b_1 + b_5$ for M-D (male, degree).]
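To see why the composite form is less rigid, the sketch below fits both specifications to four hypothetical group intercepts in which the male premium differs by degree status. Experience is left out to keep the example small; the numbers and the helper name `fit` are assumptions.

```python
import numpy as np

# Four groups in the order F-N, M-N, F-D, M-D, one observation each.
male = np.array([0.0, 1.0, 0.0, 1.0])
degree = np.array([0.0, 0.0, 1.0, 1.0])
ones = np.ones(4)

# Hypothetical group intercepts: the male premium is 2 without a degree
# and 5 with a degree, so the effects are not additive.
y = np.array([10.0, 12.0, 15.0, 20.0])

# Characteristic dummies: b1 + b3*M + b4*D (additive effects only).
characteristic = np.column_stack([ones, male, degree])

# Composite dummies: MN, FD, MD, with F-N as the base group.
mn = male * (1.0 - degree)
fd = (1.0 - male) * degree
md = male * degree
composite = np.column_stack([ones, mn, fd, md])

def fit(design, y):
    """Return the sum of squared residuals and the least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid), coef

sse_char, coef_char = fit(characteristic, y)
sse_comp, coef_comp = fit(composite, y)
print("characteristic SSE:", sse_char)  # positive: three parameters, four groups
print("composite SSE:", sse_comp)       # essentially zero: one intercept per group
print("composite coefficients:", coef_comp)
```

The characteristic model forces the male premium to be the same with or without a degree; the composite model gives each nonempty cell its own intercept.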
Multiple Regression Analysis: value of residential property (buying a home)
$A_i$ = bathrooms; $X_i$ = sq. ft. living space. $\hat{Y}_i = b_1 + b_2 X_i + b_3 A_i + b_4 A_i X_i$. Test of $H_0$ vs. $H_1$ on the intercept shift: $\dfrac{b_3}{\sqrt{\mathrm{Est.Var}(b_3)}} \sim t_{n-4}$. Test of $H_0$ vs. $H_1$ on the slope shift: $\dfrac{b_4}{\sqrt{\mathrm{Est.Var}(b_4)}} \sim t_{n-4}$.
Testing $H_0$ vs. $H_1$: otherwise. $SSE_R = \sum_{i=1}^{n} (y_i - b_1 - b_2 X_i)^2$ and $SSE_U = \sum_{i=1}^{n} (y_i - b_1 - b_2 X_i - b_3 A_i - b_4 A_i X_i)^2$.
Sale of House with Bed and Bath Dummies: PRICE = f(SQFEET, D2BED, D3BED, A2BATH)

I.   SQFEET = square feet of living space
II.  D2BED  = dummy = 1 if two-bedroom house
III. D3BED  = dummy = 1 if three-bedroom house
IV.  A2BATH = dummy = 1 if two-bathroom house

SQFEET  D2BED  D3BED  A2BATH  PRICE (thousands)
   800      0      0       0     10.000
  1000      0      0       1     20.000
  1200      1      0       0     30.000
  1500      1      0       0     40.000
  1800      1      0       1     50.000
  2000      1      0       1     60.000
  2200      0      1       0     70.000
  2500      0      1       0     80.000
  3000      0      1       1     90.000
  3500      0      1       1    100.000
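One way to re-estimate this model from the ten observations above is ordinary least squares with NumPy, sketched below. This is an assumption about how the estimates could be reproduced, not the software used for the slides; if the data were transcribed correctly, the printed coefficients should be close to the reported regression output.

```python
import numpy as np

# Columns: SQFEET, D2BED, D3BED, A2BATH, PRICE (thousands), as listed above.
data = np.array([
    [800, 0, 0, 0, 10.0],
    [1000, 0, 0, 1, 20.0],
    [1200, 1, 0, 0, 30.0],
    [1500, 1, 0, 0, 40.0],
    [1800, 1, 0, 1, 50.0],
    [2000, 1, 0, 1, 60.0],
    [2200, 0, 1, 0, 70.0],
    [2500, 0, 1, 0, 80.0],
    [3000, 0, 1, 1, 90.0],
    [3500, 0, 1, 1, 100.0],
])
X = np.column_stack([np.ones(len(data)), data[:, :4]])  # intercept + 4 regressors
y = data[:, 4]

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
sse = float(resid @ resid)                      # residual sum of squares
sst = float(((y - y.mean()) ** 2).sum())        # total sum of squares
r2 = 1.0 - sse / sst

for name, value in zip(["INTERCEPT", "SQFEET", "D2BED", "D3BED", "A2BATH"], coef):
    print(f"{name:10s} {value:10.3f}")
print("SSE:", round(sse, 3), "R^2:", round(r2, 3))
```

The one-bedroom, one-bathroom house is the base group, so no dummy is included for it.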
Sale of House with Bed and Bath Dummies: PRICE = f(SQFEET, D2BED, D3BED, A2BATH)

DEP VAR: PRICE   N: 10   MULTIPLE R: 0.996   SQUARED MULTIPLE R: 0.993
ADJUSTED SQUARED MULTIPLE R: 0.987   STD ERROR OF ESTIMATE: 3.40

ANALYSIS OF VARIANCE
SOURCE       SUM-OF-SQUARES   DF    MEAN-SQ   F-RATIO       P
REGRESSION         8191.943    4   2047.986   176.378   0.000
RESIDUAL             58.057    5     11.611

DURBIN-WATSON D STATISTIC: 2.216
FIRST ORDER AUTOCORRELATION COEFF: -0.153
Sale of House with Bed and Bath Dummies: PRICE = f(SQFEET, D2BED, D3BED, A2BATH)

DEP VAR: PRICE   N: 10   MULTIPLE R: 0.996   SQUARED MULTIPLE R: 0.993
ADJUSTED SQUARED MULTIPLE R: 0.987   STD ERROR OF ESTIMATE: 3.40

VARIABLE      COEFF   STD ERR        T   P(2-TAIL)
INTERCEPT    -6.482     4.112   -1.576       0.176
SQFEET        0.021     0.005    3.958       0.011
D2BED        14.662     4.871    3.010       0.030
D3BED        29.803    10.575    2.818       0.037
A2BATH        4.883     3.953    1.235       0.272

(For 1,000 square feet: 21 - 6.482 = 14.518, or $14,518.)
Regression Analysis of Sale of Residential Property. For 1,000 square feet (one bedroom, one bathroom): 21 - 6.482 = 14.518, or $14,518. Adding a bath and two bedrooms (three bedrooms, two bathrooms): 14,518 + 4,883 + 29,803 = $49,204.
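The same arithmetic can be written as a small Python function that uses the coefficients reported in the regression output. The function name `predict_price` and its argument convention are assumptions made for this sketch.

```python
# Coefficients as reported in the regression output (price in thousands of dollars).
COEF = {
    "INTERCEPT": -6.482,
    "SQFEET": 0.021,
    "D2BED": 14.662,
    "D3BED": 29.803,
    "A2BATH": 4.883,
}

def predict_price(sqfeet, bedrooms, bathrooms):
    """Predicted price in thousands for 1 to 3 bedrooms and 1 to 2 bathrooms."""
    price = COEF["INTERCEPT"] + COEF["SQFEET"] * sqfeet
    if bedrooms == 2:
        price += COEF["D2BED"]
    elif bedrooms == 3:
        price += COEF["D3BED"]
    if bathrooms == 2:
        price += COEF["A2BATH"]
    return price

base = predict_price(1000, 1, 1)       # one bedroom, one bathroom
upgraded = predict_price(1000, 3, 2)   # three bedrooms, two bathrooms
print(round(base, 3))      # 14.518
print(round(upgraded, 3))  # 49.204
```

Because the bedroom and bathroom effects enter as separate dummies, the bathroom premium is the same 4.883 regardless of the number of bedrooms.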
Sales Value of Residential Property. y = sales value of the property (dollars); X = square feet of living space; D1 = dummy variable for one-bedroom home; D2 = dummy variable for two-bedroom home; D3 = dummy variable for three-bedroom home; A1 = dummy variable for one-bathroom home; A2 = dummy variable for two-bathroom home. Model: $\hat{y}_i = b_1 + b_2 X_i + b_3 D2_i + b_4 D3_i + b_5 A2_i$. For a one-bedroom, one-bathroom home, such that D2 = 0, D3 = 0, and A2 = 0, we have: $\hat{y}_i = b_1 + b_2 X_i$ (1 bedroom, 1 bathroom).
Sales Value of Residential Property. $\hat{y}_i = b_1 + b_2 X_i + b_3 D2_i + b_4 D3_i + b_5 A2_i$. For a 2-bedroom, 1-bathroom home, we have D2 = 1, D3 = 0, and A2 = 0: $\hat{y}_i = (b_1 + b_3) + b_2 X_i$ (2 bedroom, 1 bathroom).
Sales Value of Residential Property. $\hat{y}_i = b_1 + b_2 X_i + b_3 D2_i + b_4 D3_i + b_5 A2_i$. For a 1-bedroom, 2-bathroom home, we have D2 = 0, D3 = 0, and A2 = 1: $\hat{y}_i = (b_1 + b_5) + b_2 X_i$ (1 bedroom, 2 bathroom).
Sales Value of Residential Property. $\hat{y}_i = b_1 + b_2 X_i + b_3 D2_i + b_4 D3_i + b_5 A2_i$. For a 2-bedroom, 2-bathroom home, we have D2 = 1, D3 = 0, and A2 = 1: $\hat{y}_i = (b_1 + b_3 + b_5) + b_2 X_i$ (2 bedroom, 2 bathroom). Likewise: $\hat{y}_i = (b_1 + b_4 + b_5) + b_2 X_i$ (3 bedroom, 2 bathroom) and $\hat{y}_i = (b_1 + b_4) + b_2 X_i$ (3 bedroom, 1 bathroom).
House Sales Model with Restricted Intercepts: $\hat{y}_i = b_1 + b_2 X_i + b_3 D2_i + b_4 D3_i + b_5 A2_i$. Rigid! [Figure: selling price, $y_i$, against square feet of living space, $X_i$; six parallel lines with intercepts $b_1$ for D1-A1 (one bed, one bath), $b_1 + b_5$ for D1-A2 (one bed, two bath), $b_1 + b_3$ for D2-A1 (two bed, one bath), $b_1 + b_3 + b_5$ for D2-A2 (two bed, two bath), $b_1 + b_4$ for D3-A1 (three bed, one bath), and $b_1 + b_4 + b_5$ for D3-A2 (three bed, two bath).]
Creating  Composite   Dummy Variables  ( vs.  characteristic  dummy variables )
How do we create composite dummy variables? We need to account for the interaction effect between bathrooms and bedrooms. [Table: bathrooms (rows) by bedrooms (columns).]
Composite dummy variables are created for each nonempty cell. Create six composite dummy variables:
D1A1 = 1 if one bed and one bath, otherwise D1A1 = 0
D1A2 = 1 if one bed and two bath, otherwise D1A2 = 0
D2A1 = 1 if two bed and one bath, otherwise D2A1 = 0
D2A2 = 1 if two bed and two bath, otherwise D2A2 = 0
D3A1 = 1 if three bed and one bath, otherwise D3A1 = 0
D3A2 = 1 if three bed and two bath, otherwise D3A2 = 0
Sales Value of Residential Property. y = sales value of the property (dollars); X = square feet of living space; D1A1 = interaction one-bed & one-bath; D1A2 = interaction one-bed & two-bath; D2A1 = interaction two-bed & one-bath; D2A2 = interaction two-bed & two-bath; D3A1 = interaction three-bed & one-bath; D3A2 = interaction three-bed & two-bath. Model: $\hat{y}_i = b_1 + b_2 X_i + b_3 D1A2_i + b_4 D2A1_i + b_5 D2A2_i + b_6 D3A1_i + b_7 D3A2_i$.
This one equation, with all these dummy variables, actually represents six equations. You must substitute in for each of the dummy variables to generate the six equations that are implied by this one dummy variable equation: $\hat{y}_i = b_1 + b_2 X_i + b_3 D1A2_i + b_4 D2A1_i + b_5 D2A2_i + b_6 D3A1_i + b_7 D3A2_i$. For a one-bedroom, one-bathroom home, D1A1 = 1 while the others are zero: $\hat{y}_i = b_1 + b_2 X_i$ (1 bedroom, 1 bathroom).
[Figure: House Sales Model with Unrestricted Intercepts. Selling price, $y_i$, against square feet of living space, $X_i$; the line for D1-A1 (one bed, one bath) is drawn with intercept $b_1$.]
One-bedroom, two-bathroom: D1A2 = 1, while the others are zero. $\hat{y}_i = b_1 + b_2 X_i + b_3 D1A2_i + b_4 D2A1_i + b_5 D2A2_i + b_6 D3A1_i + b_7 D3A2_i$ becomes $\hat{y}_i = (b_1 + b_3) + b_2 X_i$ (1 bedroom, 2 bathroom). Now graph it!
[Figure: House Sales Model with Unrestricted Intercepts. Adds the line for D1-A2 (one bed, two bath) with intercept $b_1 + b_3$ to the D1-A1 line with intercept $b_1$.]
Two-bedroom, one-bathroom: D2A1 = 1, while the others are zero. $\hat{y}_i = b_1 + b_2 X_i + b_3 D1A2_i + b_4 D2A1_i + b_5 D2A2_i + b_6 D3A1_i + b_7 D3A2_i$ becomes $\hat{y}_i = (b_1 + b_4) + b_2 X_i$ (2 bedroom, 1 bathroom). Now graph it!
[Figure: House Sales Model with Unrestricted Intercepts. Adds the line for D2-A1 (two bed, one bath) with intercept $b_1 + b_4$.]
Two-bedroom, two-bathroom: D2A2 = 1, while the others are zero. $\hat{y}_i = b_1 + b_2 X_i + b_3 D1A2_i + b_4 D2A1_i + b_5 D2A2_i + b_6 D3A1_i + b_7 D3A2_i$ becomes $\hat{y}_i = (b_1 + b_5) + b_2 X_i$ (2 bedroom, 2 bathroom). Now graph it!
[Figure: House Sales Model with Unrestricted Intercepts. Adds the line for D2-A2 (two bed, two bath) with intercept $b_1 + b_5$.]
[Figure: House Sales Model with Unrestricted Intercepts. Selling price, $y_i$, against square feet of living space, $X_i$; six parallel lines with intercepts $b_1$ for D1-A1 (one bed, one bath), $b_1 + b_3$ for D1-A2 (one bed, two bath), $b_1 + b_4$ for D2-A1 (two bed, one bath), $b_1 + b_5$ for D2-A2 (two bed, two bath), $b_1 + b_6$ for D3-A1 (three bed, one bath), and $b_1 + b_7$ for D3-A2 (three bed, two bath).]
Creating  Composite   Dummy Variables  ( vs.  characteristic  dummy variables )
Bedrooms vs. Baths vs. Garage (columns are bedrooms-baths):

Cars in garage   1-1  1-2  2-1  2-2  3-1  3-2  Total
1                  2    4    3    5   10   16     40
2                  1    6    0    7   12   14     40
Total              3   10    3   12   22   30     80
Karnaugh Map for Bedrooms, Baths, Garage, and School (A = Adams, J = Saint Joseph; columns are bedrooms-baths):

School  Cars in garage   1-1  1-2  2-1  2-2  3-1  3-2  Total
A       1                  1    3    2    5    6   13     30
A       2                  0    3    0    6    7    8     24
J       1                  1    1    1    0    4    3     10
J       2                  1    3    0    1    5    6     16
Total                      3   10    3   12   22   30     80

 
12 Week Weight Loss Planner to help with planning weight loss
12 Week Weight Loss Planner to help with planning weight loss12 Week Weight Loss Planner to help with planning weight loss
12 Week Weight Loss Planner to help with planning weight loss
 
FUNDAMENTALS OF ARNIS ARNIS ARNIS ARNIS ARNIS
FUNDAMENTALS OF ARNIS ARNIS ARNIS ARNIS ARNISFUNDAMENTALS OF ARNIS ARNIS ARNIS ARNIS ARNIS
FUNDAMENTALS OF ARNIS ARNIS ARNIS ARNIS ARNIS
 
Uttoxeter & Cheadle Voice, Issue 122.pdf
Uttoxeter & Cheadle Voice, Issue 122.pdfUttoxeter & Cheadle Voice, Issue 122.pdf
Uttoxeter & Cheadle Voice, Issue 122.pdf
 

Dummy Variable Regression

  • 2. Multiple Regression. Sandy: pages 603-613. Also read the paper titled “Using Dummy Variables in Wage Discrimination Cases”.
  • 3. Are Male Nurses Discriminated Against? [Figure: wages of male and female nurses plotted against years of experience, $X_i$, contrasting the male-female wage gap not adjusted for experience with the gap adjusted for experience.]
  • 4. I. Dummy Variables: adjusting the intercept; adjusting the slope; adjusting both intercept and slope.
  • 5. Intercept Dummy Variables. Dummy variables are binary (0,1): $D_t = 1$ if red car, $D_t = 0$ otherwise. $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + e_t$, where $y_t$ = speed of car in miles per hour and $X_t$ = age of car in years. Police: red cars travel faster. $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 > 0$.
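  • 6. $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + e_t$. Red cars: $y_t = (\beta_1 + \beta_3) + \beta_2 X_t + e_t$. Other cars: $y_t = \beta_1 + \beta_2 X_t + e_t$. [Figure: miles per hour, $y_t$, against age in years, $X_t$; two parallel lines with slope $\beta_2$, intercept $\beta_1 + \beta_3$ for red cars and $\beta_1$ for other cars.]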
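  • 7. Slope Dummy Variables. $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t X_t + e_t$, where $y_t$ = value of portfolio, $X_t$ = years, and $\beta_1$ = initial investment. Stock portfolio ($D_t = 1$): $y_t = \beta_1 + (\beta_2 + \beta_3) X_t + e_t$. Bond portfolio ($D_t = 0$): $y_t = \beta_1 + \beta_2 X_t + e_t$. [Figure: value of portfolio against years; both lines start at $\beta_1$, with slope $\beta_2 + \beta_3$ for stocks and $\beta_2$ for bonds.]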
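  • 8. Different Intercepts & Slopes. $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t$, where $y_t$ = harvest weight of corn and $X_t$ = rainfall. “Miracle” seed ($D_t = 1$): $y_t = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) X_t + e_t$. Regular seed ($D_t = 0$): $y_t = \beta_1 + \beta_2 X_t + e_t$. [Figure: harvest weight of corn against rainfall; intercept $\beta_1 + \beta_3$ and slope $\beta_2 + \beta_4$ for “miracle” seed, intercept $\beta_1$ and slope $\beta_2$ for regular seed.]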
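  • 9. Testing for discrimination in starting wage. $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + e_t$, where $y_t$ = wage rate and $X_t$ = years of experience. For men, $D_t = 1$: $y_t = (\beta_1 + \beta_3) + \beta_2 X_t + e_t$. For women, $D_t = 0$: $y_t = \beta_1 + \beta_2 X_t + e_t$. $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 > 0$. [Figure: wage rate against years of experience; parallel lines with slope $\beta_2$, intercept $\beta_1 + \beta_3$ for men and $\beta_1$ for women.]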
  • 10. $y_t = \beta_1 + \beta_5 X_t + \beta_6 D_t X_t + e_t$, where $y_t$ = wage rate and $X_t$ = years of experience. For men, $D_t = 1$: $y_t = \beta_1 + (\beta_5 + \beta_6) X_t + e_t$. For women, $D_t = 0$: $y_t = \beta_1 + \beta_5 X_t + e_t$. Men and women have the same starting wage, $\beta_1$, but their wage rates increase at different rates (difference = $\beta_6$). $\beta_6 > 0$ means that men's wage rates are increasing faster than women's wage rates. [Figure: wage rate against years of experience; common intercept $\beta_1$, slope $\beta_5 + \beta_6$ for men and $\beta_5$ for women.]
  • 11. An Ineffective Affirmative Action Plan. $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t$. Men: $y_t = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) X_t + e_t$. Women: $y_t = \beta_1 + \beta_2 X_t + e_t$. Women are given a higher starting wage, $\beta_1$, while men get the lower starting wage, $\beta_1 + \beta_3$ (note: $\beta_3 < 0$). But men get a faster rate of increase in their wages, $\beta_2 + \beta_4$, which is higher than the rate of increase for women, $\beta_2$ (since $\beta_4 > 0$). [Figure: wage rate against years of experience for men and women.]
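The intercept-and-slope dummy model on the preceding slides can be checked numerically. The sketch below is illustrative only: the wage data are invented and noise-free, the coefficient values are assumptions, and `ols` is a small helper written for this example (ordinary least squares through the normal equations), not something from the slides.

```python
def ols(X, y):
    """Least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Assumed "true" values: beta1, beta2, beta3 (< 0), beta4 (> 0)
beta = [10.0, 0.5, -2.0, 0.3]

rows, wages = [], []
for d in (0, 1):                 # D = 1 for men, D = 0 for women
    for x in range(10):          # years of experience
        rows.append([1.0, float(x), float(d), float(d * x)])
        wages.append(beta[0] + beta[1] * x + beta[2] * d + beta[3] * d * x)

b1, b2, b3, b4 = ols(rows, wages)
women_line = (b1, b2)            # intercept, slope when D = 0
men_line = (b1 + b3, b2 + b4)    # intercept, slope when D = 1
crossover = -b3 / b4             # experience at which the two lines meet

print("women:", women_line)
print("men:  ", men_line)
print("lines cross at X =", crossover)
```

With $\beta_3 < 0$ and $\beta_4 > 0$, as in the affirmative-action example, men start lower but overtake women once experience exceeds $-\beta_3/\beta_4$.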
  • 12.
  • 13. $Y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t$, with men: $D_t = 1$; women: $D_t = 0$. Testing for discrimination in starting wage (intercept): $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 > 0$, using $b_3 / \sqrt{\widehat{\mathrm{Var}}(b_3)} \sim t_{n-4}$. Testing for discrimination in wage increases (slope): $H_0: \beta_4 = 0$ vs. $H_1: \beta_4 > 0$, using $b_4 / \sqrt{\widehat{\mathrm{Var}}(b_4)} \sim t_{n-4}$.
  • 14. Why NOW wants one-sided test and Chauvinist Industries wants two-sided.
  • 15. Are Two Regressions Equal? Variations of “The Chow Test”. I. Assuming equal variances (pooling): $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t$, where $y_t$ = wage rate, $X_t$ = years of experience, and men: $D_t = 1$; women: $D_t = 0$. $H_0: \beta_3 = \beta_4 = 0$ vs. $H_1$: otherwise. This model assumes equal wage rate variance.
  • 16. Testing $H_0: \beta_3 = \beta_4 = 0$ vs. $H_1$: otherwise (intercept and slope). $SSE_R = \sum_{t=1}^{T} (y_t - b_1 - b_2 X_t)^2$ and $SSE_U = \sum_{t=1}^{T} (y_t - b_1 - b_2 X_t - b_3 D_t - b_4 D_t X_t)^2$. Test statistic: $\dfrac{(SSE_R - SSE_U)/2}{SSE_U/(T-4)} \sim F_{2,\,T-4}$.
  • 17. II. Allowing for unequal variances (running three regressions). Everyone: $y_t = \beta_1 + \beta_2 X_t + e_t$, forcing men and women to have the same $\beta_1$, $\beta_2$; this gives $SSE_R$. Men only: $y_{tm} = \beta_{1m} + \beta_{2m} X_{tm} + e_{tm}$, giving $SSE_m$. Women only: $y_{tw} = \beta_{1w} + \beta_{2w} X_{tw} + e_{tw}$, giving $SSE_w$. Allowing men and women to be different: $SSE_U = SSE_m + SSE_w$. $F = \dfrac{(SSE_R - SSE_U)/J}{SSE_U/(T-K)}$, where $J$ = number of restrictions = 2 and $K$ = number of unrestricted coefficients = 4.
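A small numerical sketch of the two routes to the Chow statistic may help. Everything here is an assumption made for illustration: the ten wage observations are invented, and `ols` and `sse` are helper functions written for this example. The sketch computes $SSE_U$ once from the pooled dummy-variable regression and once as $SSE_m + SSE_w$ from separate regressions, then forms the $F$ ratio with $J = 2$ and $K = 4$.

```python
def ols(X, y):
    """Least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def sse(X, y):
    """Sum of squared residuals from regressing y on the columns of X."""
    b = ols(X, y)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))

# Invented data: five women (D = 0) and five men (D = 1)
exper = [0, 1, 2, 3, 4]
wage_w = [10.0, 10.4, 11.1, 11.3, 12.2]
wage_m = [9.0, 10.1, 10.9, 12.2, 12.8]

y = wage_w + wage_m
X = exper + exper
D = [0] * 5 + [1] * 5

sse_r = sse([[1.0, x] for x in X], y)                          # same line for all
sse_u = sse([[1.0, x, d, d * x] for x, d in zip(X, D)], y)     # pooled with dummies
sse_w = sse([[1.0, x] for x in exper], wage_w)                 # women only
sse_m = sse([[1.0, x] for x in exper], wage_m)                 # men only

T, J, K = len(y), 2, 4
F = ((sse_r - sse_u) / J) / (sse_u / (T - K))

print("SSE_R =", sse_r)
print("SSE_U (dummy model) =", sse_u, " SSE_m + SSE_w =", sse_m + sse_w)
print("F =", F)
```

The fully interacted dummy regression and the two separate regressions fit the same pair of lines, so the two versions of $SSE_U$ agree up to rounding. The resulting $F$ would then be compared with a critical value from the $F_{J,\,T-K}$ distribution.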
  • 18. Polynomial Terms. $y_t = \beta_1 + \beta_2 X_t + \beta_3 X_t^2 + \beta_4 X_t^3 + e_t$, where $y_t$ = income and $X_t$ = age. Linear in parameters but nonlinear in variables. People retire at different ages or not at all. [Figure: polynomial regression of income $y_t$ on age $X_t$, with the age axis running from 20 to 90.]
  • 19. Polynomial Regression. $y_t = \beta_1 + \beta_2 X_t + \beta_3 X_t^2 + \beta_4 X_t^3 + e_t$, where $y_t$ = income and $X_t$ = age. Rate at which income is changing as we age: $\dfrac{\partial y_t}{\partial X_t} = \beta_2 + 2\beta_3 X_t + 3\beta_4 X_t^2$. The slope changes as $X_t$ changes.
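To make the age-dependent slope concrete, here is a short sketch that evaluates the derivative formula and checks it against a finite-difference approximation. The coefficient values are hypothetical and were picked only so that income rises and then falls with age.

```python
# Hypothetical coefficients (beta1, beta2, beta3, beta4); not estimates from any data.
BETA = (-20.0, 3.0, -0.02, -0.0001)

def income(age, b=BETA):
    """Cubic polynomial in age (error term omitted)."""
    return b[0] + b[1] * age + b[2] * age ** 2 + b[3] * age ** 3

def slope(age, b=BETA):
    """Analytic derivative: beta2 + 2*beta3*X + 3*beta4*X^2."""
    return b[1] + 2 * b[2] * age + 3 * b[3] * age ** 2

def numeric_slope(age, h=1e-5):
    """Central finite difference, used only as a sanity check."""
    return (income(age + h) - income(age - h)) / (2 * h)

for age in (20, 40, 60, 80):
    print(age, round(slope(age), 4), round(numeric_slope(age), 4))
```

With these assumed values the slope is positive at younger ages and turns negative later, which is the pattern the polynomial terms are meant to capture.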
  • 20. Continuous Interaction. $y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + e_t$. Exam grade = f(sleep: $Z_t$, study time: $B_t$). Sleep and study time do not act independently. More study time will be more effective when combined with more sleep and less effective when combined with less sleep.
  • 21. Continuous interaction. $y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + e_t$. Exam grade = f(sleep: $Z_t$, study time: $B_t$). Your mind sorts things out while you sleep (when you have things to sort out). Your studying is more effective with more sleep. The marginal effects are $\dfrac{\partial y_t}{\partial B_t} = \beta_3 + \beta_4 Z_t$ and $\dfrac{\partial y_t}{\partial Z_t} = \beta_2 + \beta_4 B_t$.
  • 22. $y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + e_t$. Exam grade = f(sleep: $Z_t$, study time: $B_t$). If $Z_t + B_t = 24$ hours, then $B_t = (24 - Z_t)$, so $y_t = \beta_1 + \beta_2 Z_t + \beta_3 (24 - Z_t) + \beta_4 Z_t (24 - Z_t) + e_t = (\beta_1 + 24\beta_3) + (\beta_2 - \beta_3 + 24\beta_4) Z_t - \beta_4 Z_t^2 + e_t$. Relabeling the coefficients: $y_t = \delta_1 + \delta_2 Z_t + \delta_3 Z_t^2 + e_t$, where $\delta_2 > 0$ and $\delta_3 < 0$. Sleep needed to maximize your exam grade: $\dfrac{\partial y_t}{\partial Z_t} = \delta_2 + 2\delta_3 Z_t = 0$, so $Z_t = -\dfrac{\delta_2}{2\delta_3}$.
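The substitution on this slide can be traced with numbers. In the sketch below the four interaction-model coefficients are hypothetical, and the names `d1`, `d2`, `d3` for the collapsed quadratic's coefficients are labels chosen for this example. The code collapses the model under the 24-hour constraint and solves the first-order condition for the grade-maximizing amount of sleep.

```python
def collapse(b1, b2, b3, b4, hours=24):
    """Substitute B = hours - Z into b1 + b2*Z + b3*B + b4*Z*B.

    Returns (d1, d2, d3) such that grade = d1 + d2*Z + d3*Z**2.
    """
    d1 = b1 + hours * b3
    d2 = b2 - b3 + hours * b4
    d3 = -b4
    return d1, d2, d3

def optimal_sleep(d2, d3):
    """Solve d2 + 2*d3*Z = 0; this is a maximum when d3 < 0."""
    return -d2 / (2 * d3)

# Hypothetical coefficient values, for illustration only
b1, b2, b3, b4 = 20.0, 1.0, 3.0, 0.25

d1, d2, d3 = collapse(b1, b2, b3, b4)
z_star = optimal_sleep(d2, d3)

# Cross-check: both forms of the model give the same grade at Z = z_star
original = b1 + b2 * z_star + b3 * (24 - z_star) + b4 * z_star * (24 - z_star)
quadratic = d1 + d2 * z_star + d3 * z_star ** 2

print("collapsed coefficients:", d1, d2, d3)
print("optimal sleep (hours):", z_star)
print("grade at optimum:", original, quadratic)
```

Because the quadratic coefficient equals $-\beta_4$, the collapsed model has an interior maximum only when the interaction coefficient $\beta_4$ is positive.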
  • 23. Multicollinearity: correlation among the “independent” variables. Note: they are independent of the error term, not of one another.
  • 24. Let $y_i$ represent the $i$th person's wage rate and $X_i$ represent their months of work experience in the equation: $y_i = b_1 + b_2 X_i + e_i$ (1), where $b_1$ = intercept (starting wage), $b_2$ = increase in the person's wage for each additional month of work experience, and $e_i$ = error term with mean zero and estimated variance $s^2$.
  • 25. $y_i = b_1 + b_2 X_i + b_3 M_i + b_4 F_i + e_i$ (2), where $F_i = 1$ if female and $F_i = 0$ if male; $M_i = 1$ if male and $M_i = 0$ if female.
  • 26. $y_i = b_1 + b_2 X_i + b_3 M_i + b_4 F_i + e_i$ (2). Unfortunately, this equation contains an underidentified set of parameters ($b_1$, $b_3$, and $b_4$) and cannot be estimated without some restriction on the coefficients.
  • 27. To see this point, separate out the men's equation implied by equation (2), $y_i = b_1 + b_2 X_i + b_3 M_i + b_4 F_i + e_i$, from the women's equation. For men, $M_i = 1$ and $F_i = 0$, so equation (2) becomes: $y_i = (b_1 + b_3) + b_2 X_i + e_i$ (3).
  • 28. For women, $M_i = 0$ and $F_i = 1$, so equation (2) becomes: $y_i = (b_1 + b_4) + b_2 X_i + e_i$ (4).
  • 29. Unfortunately, although we get estimates of the intercepts $(b_1 + b_3)$ and $(b_1 + b_4)$, the value of $b_1$ cannot be separated from the values of $b_3$ and $b_4$. Some restriction is needed to achieve identification of $b_1$, $b_3$, and $b_4$.
  • 30. One such restriction is $b_1 = 0$. We can drop the original intercept term, $b_1$, since men and women already have their own intercept terms, $b_3$ and $b_4$, respectively.
  • 31. Underidentification of equation (2) can also be expressed in matrix terms. First, rewrite equation (2), putting the explanatory variables in a row vector multiplied by the corresponding column vector of their respective coefficients: $y_i = \begin{bmatrix} 1 & X_i & M_i & F_i \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} + e_i$ (5)
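  • 32. This only represents the $i$th observation, where $i = 1, \ldots, n$. To represent the entire set of $n$ observations at once, we need to "pull the window shade down" as follows: $\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 & M_1 & F_1 \\ 1 & X_2 & M_2 & F_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & X_n & M_n & F_n \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$ (6)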
  • 33. Equation (6) presents us with an X matrix whose first column (the column of ones) is an exact linear combination of the last two columns (the M and F columns). Since Mi is always zero when Fi is equal to one and Mi is always one when Fi is equal to zero, then it always holds that Mi + Fi = 1. Therefore, the first column is equal to the sum of the last two columns.
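  • 34. Since $M_i$ is always zero when $F_i$ is equal to one and $M_i$ is always one when $F_i$ is equal to zero, it always holds that $M_i + F_i = 1$: $\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} M_1 \\ M_2 \\ \vdots \\ M_n \end{bmatrix} + \begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_n \end{bmatrix}$ (9)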
  • 35. Equation (6) and, therefore, equation (2) represent a case of perfect multicollinearity. This means that a restriction must be introduced that drops one of these columns out of the regression. One such restriction is $b_1 = 0$, which means dropping the original intercept out of the regression model to provide the following reduced model: $y_i = b_2 X_i + b_3 M_i + b_4 F_i + e_i$ (10). Now men and women have separate intercepts and no common intercept is necessary.
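The rank deficiency behind this "dummy variable trap" can be verified directly. The sketch below uses five invented observations and a small `rank` helper written for this example (exact Gaussian elimination with Python's `fractions` module). It shows that the design matrix with a constant, $X$, $M$, and $F$ has four columns but rank three, while dropping the constant restores full column rank.

```python
from fractions import Fraction

def rank(matrix):
    """Matrix rank by exact row reduction (no rounding error)."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Invented data: months of experience and gender dummies
X = [12, 24, 36, 6, 18]
M = [1, 0, 1, 0, 1]
F = [1 - m_i for m_i in M]          # M_i + F_i = 1 for every person

with_constant = [[1, x, m_i, f_i] for x, m_i, f_i in zip(X, M, F)]
without_constant = [[x, m_i, f_i] for x, m_i, f_i in zip(X, M, F)]

print("columns:", 4, "rank:", rank(with_constant))       # rank below 4
print("columns:", 3, "rank:", rank(without_constant))    # full column rank
```

Dropping either the $M$ or the $F$ column instead of the constant would also restore full rank. That corresponds to treating one group as the default, as a later slide does.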
  • 36. $y_i = b_2 X_i + b_3 M_i + b_4 F_i + e_i$. For males, $M_i = 1$ and $F_i = 0$: $y_i = b_3 + b_2 X_i + e_i$. For females, $M_i = 0$ and $F_i = 1$: $y_i = b_4 + b_2 X_i + e_i$. Males and females have different starting salaries, $b_3 > b_4$, but their salaries increase at the same rate, $b_2$. [Figure: parallel male and female lines with intercepts $b_3$ and $b_4$ and common slope $b_2$.]
  • 37. $y_i = b_2 X_i + b_3 M_i + b_4 F_i + e_i$. For males, $M_i = 1$ and $F_i = 0$: $y_i = b_3 + b_2 X_i + e_i$. For females, $M_i = 0$ and $F_i = 1$: $y_i = b_4 + b_2 X_i + e_i$. Males and females have different starting salaries, $b_3 > b_4$, but their salaries increase at the same rate, $b_2$. [Figure: $y_i$ against years of experience, $X_i$; parallel male and female lines with intercepts $b_3$ and $b_4$ and common slope $b_2$.]
  • 38. $y_i = b_1 + b_5 M_i X_i + b_6 F_i X_i + e_i$. For males, $M_i = 1$ and $F_i = 0$: $y_i = b_1 + b_5 X_i + e_i$. For females, $M_i = 0$ and $F_i = 1$: $y_i = b_1 + b_6 X_i + e_i$. Males and females have the same starting salary, $b_1$, but their salaries increase at different rates ($b_5$ vs. $b_6$). $b_5 > b_6$ means that men's salaries are increasing faster than women's salaries. [Figure: $y_i$ against years of experience, $X_i$; common intercept $b_1$, slope $b_5$ for males and $b_6$ for females.]
  • 39. Chauvinist Industries Affirmative Action Plan. $y_i = b_3 M_i + b_4 F_i + b_5 M_i X_i + b_6 F_i X_i + e_i$. For males, $M_i = 1$ and $F_i = 0$: $y_i = b_3 + b_5 X_i + e_i$. For females, $M_i = 0$ and $F_i = 1$: $y_i = b_4 + b_6 X_i + e_i$. Females start with a higher starting salary, $b_4$, while men get the lower starting salary, $b_3$. But men get a faster rate of increase in their salaries, $b_5$, which is higher than the rate of increase for females, $b_6$ ($b_5 > b_6$). [Figure: $y_i$ against years of experience, $X_i$, for males and females.]
  • 40. Back to our basic model: $y_i = b_2 X_i + b_3 M_i + b_4 F_i + e_i$. For males, $M_i = 1$ and $F_i = 0$: $y_i = b_3 + b_2 X_i + e_i$. For females, $M_i = 0$ and $F_i = 1$: $y_i = b_4 + b_2 X_i + e_i$. Males and females have different starting salaries, $b_3 > b_4$, but their salaries increase at the same rate, $b_2$. [Figure: $y_i$ against years of experience, $X_i$; parallel male and female lines.]
  • 41. Since under our null hypothesis the raw score test statistic $b_3 - b_4$ has a mean of zero and a variance of $\mathrm{Var}(b_3 - b_4)$, we can standardize by subtracting the mean (zero) and dividing by the standard deviation (square root of the variance) to get the standardized test statistic: $\dfrac{b_3 - b_4}{\sqrt{\mathrm{Var}(b_3 - b_4)}}$
  • 42. To test the null hypothesis: $Z = \dfrac{(b_3 - b_4) - 0}{\sqrt{\mathrm{Var}(b_3 - b_4)}} \sim N(0, 1)$
  • 43. If the variance of the $y_i$, $\sigma^2$, is unknown, then $\mathrm{Var}(b_3 - b_4)$ is also unknown and must be estimated from the expression: $\widehat{\mathrm{Var}}(b_3 - b_4) = \widehat{\mathrm{Var}}(b_3) + \widehat{\mathrm{Var}}(b_4) - 2\,\widehat{\mathrm{Cov}}(b_3, b_4)$
  • 44. Use the sample variance as an estimator of the population variance :
  • 45. The values for the following expression are obtained in practice from the diagonal and off-diagonal elements of the estimated variance-covariance matrix: $\widehat{\mathrm{Var}}(b_3 - b_4) = \widehat{\mathrm{Var}}(b_3) + \widehat{\mathrm{Var}}(b_4) - 2\,\widehat{\mathrm{Cov}}(b_3, b_4)$
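As a hedged illustration of how the pieces fit together, the sketch below takes two estimated intercepts and the relevant entries of an estimated variance-covariance matrix, all of them made-up numbers, and forms the standardized statistic for the difference $b_3 - b_4$. The function name and its arguments are assumptions of this example.

```python
import math

def difference_test(b3, b4, var_b3, var_b4, cov_b3_b4):
    """Return (estimated variance of b3 - b4, standardized statistic)."""
    # Var(b3 - b4) = Var(b3) + Var(b4) - 2 Cov(b3, b4)
    est_var = var_b3 + var_b4 - 2 * cov_b3_b4
    stat = (b3 - b4 - 0) / math.sqrt(est_var)
    return est_var, stat

# Made-up estimates: the variances come from the diagonal and the
# covariance from the off-diagonal of the estimated covariance matrix.
est_var, stat = difference_test(b3=12.0, b4=10.5,
                                var_b3=0.5, var_b4=0.4, cov_b3_b4=0.1)

print("Est. Var(b3 - b4) =", est_var)
print("standardized statistic =", stat)
```

When $\sigma^2$ has to be estimated, the statistic is usually referred to a $t$ distribution with $n - K$ degrees of freedom rather than to $N(0, 1)$. The two are close in large samples.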
  • 46. Alternative: make women the default group. $\hat{y}_i = b_1 + b_2 X_i + b_3 M_i$. Males: $\hat{y}_i = (b_1 + b_3) + b_2 X_i$. Females: $\hat{y}_i = b_1 + b_2 X_i$. Males and females have different starting salaries, $b_3 > 0$, but their salaries increase at the same rate, $b_2$. [Figure: $y_i$ against years of experience, $X_i$; parallel lines with intercept $b_1 + b_3$ for males and $b_1$ for females.]
  • 47. Characteristic dummy variables: $\hat{y}_i = b_1 + b_2 X_i + b_3 M_i + b_4 D_i$. Male college grad: $\hat{y}_i = (b_1 + b_3 + b_4) + b_2 X_i$. Female college grad: $\hat{y}_i = (b_1 + b_4) + b_2 X_i$. Male not a grad: $\hat{y}_i = (b_1 + b_3) + b_2 X_i$. Female not a grad: $\hat{y}_i = b_1 + b_2 X_i$.
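  • 48. Very restrictive assumption (very rigid!): $\hat{y}_i = b_1 + b_2 X_i + b_3 M_i + b_4 D_i$. [Figure: wage rate $y_i$ against years of experience $X_i$; four parallel lines with intercepts $b_1$ for F-N (female, no degree), $b_1 + b_3$ for M-N (male, no degree), $b_1 + b_4$ for F-D (female, degree), and $b_1 + b_3 + b_4$ for M-D (male, degree).]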
  • 49. Creating Composite Dummy Variables ( vs. characteristic dummy variables )
  • 50. Karnaugh map for gender vs. status of job (S = supervisor, I = individual):
        Gender    S     I     Total
        Men       15    25    40
        Women     13    27    40
        Total     28    52    80
  • 51. Occupation vs. Job vs. Gender (C = Computer, T = Other Technical, U = Untechnical; S = supervisor, I = individual):
        Occupation:   C          T          U
        Job:          S    I     S    I     S    I     Total
        M             2    4     3    5     10   16    40
        F             1    6     0    7     12   14    40
        Total         3    10    3    12    22   30    80
  • 52. Karnaugh Map for Occupation, Job Status, Gender, and Degree Status (D = Degree, N = No Degree):
        Occupation:   C          T          U
        Job:          S    I     S    I     S    I     Total
        D   M         1    3     2    5     6    13    30
        D   F         0    3     0    6     7    8     24
        N   M         1    1     1    0     4    3     10
        N   F         1    3     0    1     5    6     16
        Total         3    10    3    12    22   30    80
  • 53. Composite dummy variables: these define combined (instead of separate) general characteristics. $\hat{y}_i = b_1 + b_2 X_i + b_3 MN_i + b_4 FD_i + b_5 MD_i$. [Figure: wage rate $y_i$ against years of experience $X_i$; four parallel lines with intercepts $b_1$ for F-N (female, no degree), $b_1 + b_3$ for M-N (male, no degree), $b_1 + b_4$ for F-D (female, degree), and $b_1 + b_5$ for M-D (male, degree).]
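The difference between characteristic and composite dummies is easy to see in code. This sketch builds the three composite dummies (MN, FD, MD) from the two characteristic dummies, with female-no-degree as the base group, and evaluates the four intercepts. The function names and the coefficient values are assumptions made for illustration.

```python
def composite_dummies(male, degree):
    """Composite dummies; female without a degree (F-N) is the base group."""
    return {
        "MN": male * (1 - degree),        # male, no degree
        "FD": (1 - male) * degree,        # female, degree
        "MD": male * degree,              # male, degree
    }

# Hypothetical coefficients: b1 (base), b3 (MN), b4 (FD), b5 (MD)
b1, b3, b4, b5 = 10.0, 2.0, 3.0, 7.0

def intercept_composite(male, degree):
    d = composite_dummies(male, degree)
    return b1 + b3 * d["MN"] + b4 * d["FD"] + b5 * d["MD"]

for male in (0, 1):
    for degree in (0, 1):
        print("male =", male, "degree =", degree,
              composite_dummies(male, degree),
              "intercept =", intercept_composite(male, degree))
```

With separate male and degree dummies, the male-degree intercept would be forced to equal the base plus the male shift plus the degree shift (15.0 with these numbers). The composite coding leaves it free (17.0 here), which is the extra flexibility the slide describes.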
  • 54. Multiple Regression Analysis value of residential property ( buying a home )
  • 55. $A_i$ = bathrooms; $X_i$ = sq. ft. of living space. $\hat{Y}_i = b_1 + b_2 X_i + b_3 A_i + b_4 A_i X_i$. Each of the two tests of $H_0$ vs. $H_1$ (one on the coefficient of $A_i$, one on the coefficient of $A_i X_i$) uses a t-statistic: $b_3 / \sqrt{\widehat{\mathrm{Var}}(b_3)} \sim t_{n-4}$ and $b_4 / \sqrt{\widehat{\mathrm{Var}}(b_4)} \sim t_{n-4}$.
  • 56. Testing $H_0: \beta_3 = \beta_4 = 0$ vs. $H_1$: otherwise. $SSE_R = \sum_{i=1}^{n} (y_i - b_1 - b_2 X_i)^2$ and $SSE_U = \sum_{i=1}^{n} (y_i - b_1 - b_2 X_i - b_3 A_i - b_4 A_i X_i)^2$.
  • 57. Sale of House with Bed and Bath Dummies. PRICE = f(SQFEET, D2BED, D3BED, A2BATH), where SQFEET = square feet of living space, D2BED = dummy = 1 if two-bedroom house, D3BED = dummy = 1 if three-bedroom house, A2BATH = dummy = 1 if two-bathroom house, and PRICE is in thousands.
        SQFEET   D2BED   D3BED   A2BATH   PRICE
        800      0       0       0        10.000
        1000     0       0       1        20.000
        1200     1       0       0        30.000
        1500     1       0       0        40.000
        1800     1       0       1        50.000
        2000     1       0       1        60.000
        2200     0       1       0        70.000
        2500     0       1       0        80.000
        3000     0       1       1        90.000
        3500     0       1       1        100.000
  • 58. Sale of House with Bed and Bath Dummies. PRICE = f(SQFEET, D2BED, D3BED, A2BATH). DEP VAR: PRICE; N: 10; MULTIPLE R: 0.996; SQUARED MULTIPLE R: 0.993; ADJUSTED SQUARED MULTIPLE R: 0.987; STD ERROR OF ESTIMATE: 3.40. DURBIN-WATSON D STATISTIC: 2.216; FIRST ORDER AUTOCORRELATION COEFF: -0.153.
        ANALYSIS OF VARIANCE
        SOURCE       SUM-OF-SQUARES   DF   MEAN-SQ    F-RATIO   P
        REGRESSION   8191.943         4    2047.986   176.378   0.000
        RESIDUAL     58.057           5    11.611
  • 59. Sale of House with Bed and Bath Dummies. PRICE = f(SQFEET, D2BED, D3BED, A2BATH). DEP VAR: PRICE; N: 10; MULTIPLE R: 0.996; SQUARED MULTIPLE R: 0.993; ADJUSTED SQUARED MULTIPLE R: 0.987; STD ERROR OF ESTIMATE: 3.40.
        VARIABLE    COEFF     STD ERR   T        P(2-TAIL)
        INTERCEPT   -6.482    4.112     -1.576   0.176
        SQFEET      0.021     0.005     3.958    0.011
        D2BED       14.662    4.871     3.010    0.030
        D3BED       29.803    10.575    2.818    0.037
        A2BATH      4.883     3.953     1.235    0.272
      (For a one-bedroom, one-bathroom house of 1,000 square feet: 21 - 6.482 = 14.518, or $14,518.)
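The regression above can be re-run from the ten observations listed on the data slide. The sketch below is one way to do that using only the standard library. The `ols` helper is written for this example, and it assumes the data table has been transcribed correctly, in which case the printed coefficients and sums of squares should be close to the reported output.

```python
def ols(X, y):
    """Least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# (SQFEET, D2BED, D3BED, A2BATH, PRICE in thousands) from the data slide
data = [
    (800, 0, 0, 0, 10.0), (1000, 0, 0, 1, 20.0),
    (1200, 1, 0, 0, 30.0), (1500, 1, 0, 0, 40.0),
    (1800, 1, 0, 1, 50.0), (2000, 1, 0, 1, 60.0),
    (2200, 0, 1, 0, 70.0), (2500, 0, 1, 0, 80.0),
    (3000, 0, 1, 1, 90.0), (3500, 0, 1, 1, 100.0),
]

X = [[1.0, s, d2, d3, a2] for s, d2, d3, a2, _ in data]
y = [row[4] for row in data]

b = ols(X, y)
fitted = [sum(bj * xj for bj, xj in zip(b, r)) for r in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
mean_y = sum(y) / len(y)
sst = sum((yi - mean_y) ** 2 for yi in y)                # total sum of squares
r2 = 1 - sse / sst

names = ["INTERCEPT", "SQFEET", "D2BED", "D3BED", "A2BATH"]
for name, coef in zip(names, b):
    print(f"{name:10s} {coef:10.3f}")
print("residual SS:", round(sse, 3), " regression SS:", round(sst - sse, 3))
print("R-squared:", round(r2, 3))
```

The total sum of squares is exactly 8250, which matches the sum of the regression and residual sums of squares in the reported analysis of variance (8191.943 + 58.057).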
  • 60.
  • 61. Sales Value of Residential Property. $y$ = sales value of the property (dollars); $X$ = square feet of living space; D1 = dummy variable for one-bedroom home; D2 = dummy variable for two-bedroom home; D3 = dummy variable for three-bedroom home; A1 = dummy variable for one-bathroom home; A2 = dummy variable for two-bathroom home. $\hat{y}_i = b_1 + b_2 X_i + b_3 D2_i + b_4 D3_i + b_5 A2_i$. For a one-bedroom, one-bathroom home, such that D2 = 0, D3 = 0, and A2 = 0, we have: $\hat{y}_i = b_1 + b_2 X_i$ (1 bedroom, 1 bathroom).
  • 62. Sales Value of Residential Property. $\hat{y}_i = b_1 + b_2 X_i + b_3 D2_i + b_4 D3_i + b_5 A2_i$. For a 2-bedroom, 1-bathroom home, we have D2 = 1, D3 = 0, and A2 = 0: $\hat{y}_i = (b_1 + b_3) + b_2 X_i$ (2 bedroom, 1 bathroom).
  • 63. Sales Value of Residential Property. $\hat{y}_i = b_1 + b_2 X_i + b_3 D2_i + b_4 D3_i + b_5 A2_i$. For a 1-bedroom, 2-bathroom home, we have D2 = 0, D3 = 0, and A2 = 1: $\hat{y}_i = (b_1 + b_5) + b_2 X_i$ (1 bedroom, 2 bathroom).
  • 64. Sales Value of Residential Property. $\hat{y}_i = b_1 + b_2 X_i + b_3 D2_i + b_4 D3_i + b_5 A2_i$. For a 2-bedroom, 2-bathroom home, we have D2 = 1, D3 = 0, and A2 = 1: $\hat{y}_i = (b_1 + b_3 + b_5) + b_2 X_i$ (2 bedroom, 2 bathroom). Similarly, $\hat{y}_i = (b_1 + b_4 + b_5) + b_2 X_i$ (3 bedroom, 2 bathroom) and $\hat{y}_i = (b_1 + b_4) + b_2 X_i$ (3 bedroom, 1 bathroom).
  • 65. House Sales Model with Restricted Intercepts (rigid!): $\hat{y}_i = b_1 + b_2 X_i + b_3 D2_i + b_4 D3_i + b_5 A2_i$. [Figure: selling price $y_i$ against square feet of living space $X_i$; six parallel lines with intercepts $b_1$ for D1-A1 (one bed, one bath), $b_1 + b_5$ for D1-A2 (one bed, two bath), $b_1 + b_3$ for D2-A1 (two bed, one bath), $b_1 + b_3 + b_5$ for D2-A2 (two bed, two bath), $b_1 + b_4$ for D3-A1 (three bed, one bath), and $b_1 + b_4 + b_5$ for D3-A2 (three bed, two bath).]
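Plugging the reported coefficient estimates into the restricted-intercept model shows what "rigid" means in numbers. The sketch below is a simple tabulation: the coefficients are copied from the regression output slide, while the function and variable names are made up for this example.

```python
# Coefficients as reported in the regression output (PRICE in thousands)
COEF = {
    "INTERCEPT": -6.482,
    "SQFEET": 0.021,
    "D2BED": 14.662,
    "D3BED": 29.803,
    "A2BATH": 4.883,
}

def intercept(bedrooms, baths):
    """Intercept implied by the characteristic-dummy model for one house type."""
    return (COEF["INTERCEPT"]
            + COEF["D2BED"] * (bedrooms == 2)
            + COEF["D3BED"] * (bedrooms == 3)
            + COEF["A2BATH"] * (baths == 2))

def predicted_price(sqfeet, bedrooms, baths):
    return intercept(bedrooms, baths) + COEF["SQFEET"] * sqfeet

for bedrooms in (1, 2, 3):
    for baths in (1, 2):
        print(f"D{bedrooms}-A{baths}: intercept = {intercept(bedrooms, baths):8.3f}")

# The second-bathroom premium is the same in every bedroom class
premiums = [intercept(k, 2) - intercept(k, 1) for k in (1, 2, 3)]
print("two-bath premium by bedroom class:", [round(p, 3) for p in premiums])
print("1,000 sq ft, one bed, one bath:", round(predicted_price(1000, 1, 1), 3))
```

The premium for a second bathroom is 4.883 regardless of the number of bedrooms, because the model has only one bathroom coefficient. That is the restriction the composite dummy variables relax.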
  • 66. Creating Composite Dummy Variables ( vs. characteristic dummy variables )
  • 67.
  • 68. Composite dummy variables are created for each nonempty cell. Create six composite dummy variables:
        D1A1 = 1 if one bed and one bath, otherwise D1A1 = 0
        D1A2 = 1 if one bed and two bath, otherwise D1A2 = 0
        D2A1 = 1 if two bed and one bath, otherwise D2A1 = 0
        D2A2 = 1 if two bed and two bath, otherwise D2A2 = 0
        D3A1 = 1 if three bed and one bath, otherwise D3A1 = 0
        D3A2 = 1 if three bed and two bath, otherwise D3A2 = 0
  • 69. Sales Value of Residential Property. $y$ = sales value of the property (dollars); $X$ = square feet of living space; D1A1 = interaction one-bed & one-bath; D1A2 = interaction one-bed & two-bath; D2A1 = interaction two-bed & one-bath; D2A2 = interaction two-bed & two-bath; D3A1 = interaction three-bed & one-bath; D3A2 = interaction three-bed & two-bath. $\hat{y}_i = b_1 + b_2 X_i + b_3 D1A2_i + b_4 D2A1_i + b_5 D2A2_i + b_6 D3A1_i + b_7 D3A2_i$
  • 70. This one equation with all these dummy variables is actually representing six equations. You must substitute in for each of the dummy variables to generate the six equations that are implied by this one dummy variable equation: $\hat{y}_i = b_1 + b_2 X_i + b_3 D1A2_i + b_4 D2A1_i + b_5 D2A2_i + b_6 D3A1_i + b_7 D3A2_i$. For a one-bedroom, one-bathroom home, since D1A1 = 1 while the others are zero: $\hat{y}_i = b_1 + b_2 X_i$ (1 bedroom, 1 bathroom).
  • 71. [Figure: House Sales Model with Unrestricted Intercepts. Selling price $y_i$ against square feet of living space $X_i$, with one line for each of the six bed-bath combinations; the D1-A1 (one bed, one bath) intercept is labeled $b_1$.]
  • 72. One-bedroom, two-bathroom: D1A2 = 1, while the others are zero. $\hat{y}_i = b_1 + b_2 X_i + b_3 D1A2_i + b_4 D2A1_i + b_5 D2A2_i + b_6 D3A1_i + b_7 D3A2_i$ becomes $\hat{y}_i = (b_1 + b_3) + b_2 X_i$ (1 bedroom, 2 bathroom). Now graph it!
  • 73. [Figure: House Sales Model with Unrestricted Intercepts. Selling price $y_i$ against square feet of living space $X_i$; intercepts labeled so far: $b_1$ for D1-A1 (one bed, one bath) and $b_1 + b_3$ for D1-A2 (one bed, two bath).]
  • 74. Two-bedroom, one-bathroom: D2A1 = 1, while the others are zero. $\hat{y}_i = b_1 + b_2 X_i + b_3 D1A2_i + b_4 D2A1_i + b_5 D2A2_i + b_6 D3A1_i + b_7 D3A2_i$ becomes $\hat{y}_i = (b_1 + b_4) + b_2 X_i$ (2 bedroom, 1 bathroom). Now graph it!
  • 75. [Figure: House Sales Model with Unrestricted Intercepts. Selling price $y_i$ against square feet of living space $X_i$; intercepts labeled so far: $b_1$ for D1-A1 (one bed, one bath), $b_1 + b_3$ for D1-A2 (one bed, two bath), and $b_1 + b_4$ for D2-A1 (two bed, one bath).]
  • 76. Two-bedroom, two-bathroom: D2A2 = 1, while the others are zero. $\hat{y}_i = b_1 + b_2 X_i + b_3 D1A2_i + b_4 D2A1_i + b_5 D2A2_i + b_6 D3A1_i + b_7 D3A2_i$ becomes $\hat{y}_i = (b_1 + b_5) + b_2 X_i$ (2 bedroom, 2 bathroom). Now graph it!
  • 77. [Figure: House Sales Model with Unrestricted Intercepts. Selling price $y_i$ against square feet of living space $X_i$; intercepts labeled so far: $b_1$ for D1-A1 (one bed, one bath), $b_1 + b_3$ for D1-A2 (one bed, two bath), $b_1 + b_4$ for D2-A1 (two bed, one bath), and $b_1 + b_5$ for D2-A2 (two bed, two bath).]
  • 78. [Figure: House Sales Model with Unrestricted Intercepts. Selling price $y_i$ against square feet of living space $X_i$; six parallel lines with intercepts $b_1$ for D1-A1 (one bed, one bath), $b_1 + b_3$ for D1-A2 (one bed, two bath), $b_1 + b_4$ for D2-A1 (two bed, one bath), $b_1 + b_5$ for D2-A2 (two bed, two bath), $b_1 + b_6$ for D3-A1 (three bed, one bath), and $b_1 + b_7$ for D3-A2 (three bed, two bath).]
  • 79. Creating Composite Dummy Variables ( vs. characteristic dummy variables )
  • 80.
  • 81. Bedrooms vs. Baths vs. Garage:
        Bedrooms:          1          2          3
        Baths:             1    2     1    2     1    2     Total
        Cars in garage 1   2    4     3    5     10   16    40
        Cars in garage 2   1    6     0    7     12   14    40
        Total              3    10    3    12    22   30    80
  • 82. Karnaugh Map for Bedrooms, Baths, Garage, and School (A = Adams, J = Saint Joseph; rows give school and cars in garage):
        Bedrooms:          1          2          3
        Baths:             1    2     1    2     1    2     Total
        A   1              1    3     2    5     6    13    30
        A   2              0    3     0    6     7    8     24
        J   1              1    1     1    0     4    3     10
        J   2              1    3     0    1     5    6     16
        Total              3    10    3    12    22   30    80