2. Leonardo Auslender M008 Ch. 3 – Copyright 2008–2009. 8/24/2018
Introduction and data description
Overall Graphical View
Befuddlers: Analytics of suppression, redundancy and
enhancement
Befuddlers: Graphical presentation
Coefficient interpretation in multivariate setting
Befuddlers and co-linearity
3.
Befuddling issues arise in the linear regression context from misunderstanding 'conditioning'. We will show that:
- A predictor uncorrelated with the dependent variable may increase the significance and fit of other predictors.
- Correlated predictors may enhance model fit.
- Extreme corr(x, z) does not always imply co-linearity.
Coefficient-effect interpretation can be faulty when the distinction between zero-order and partial correlation is disregarded.
5.
Correlations, conditional correlations and Redundancy.
Linear model Y = a0 + aX + bZ + ε, with the usual assumptions; the circles below are unit-variance circles (a + b + d + e = 1).

[Venn diagram: circles Y, X, Z; a = part of Y shared with neither predictor, b = Y∩X only, d = Y∩X∩Z, e = Y∩Z only.]

r²yx = b + d      r²yz = d + e      R² = b + d + e
pr²yx = b / (a + b)      pr²yz = e / (a + e)
sr²yx = b      sr²yz = e

r²: zero-order squared correlation. R² = SSR(X, Z) / SST.
pr²: squared partial correlation, r²yx.z = SSR(X/Z) / [SST - SSR(Z)].
sr²: squared semi-partial correlation, r²y(x.z) = SSR(X/Z) / SST.

SST = 1 = a + b + d + e.
SSR(X) = b + d.
SSR(X/Z) = b.
SSR(X, Z) = b + d + e.
SSR(Z) = d + e.
But SSR(X) + SSR(Z) ≥ SSR(X, Z) is not always true.
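The SSR-based definitions above can be checked numerically. A minimal sketch, assuming Python with NumPy (the simulated data and the `ssr` helper are illustrative, not from the deck):

```python
import numpy as np

def ssr(y, X):
    """Regression sum of squares of y on the columns of X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    yhat = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
    return np.sum((yhat - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)                  # correlated predictors
y = 1.0 + 0.8 * x + 0.4 * z + rng.normal(size=n)

sst = np.sum((y - y.mean()) ** 2)
r2_yx = ssr(y, x) / sst                           # zero-order corr^2
ssr_x_given_z = ssr(y, np.column_stack([x, z])) - ssr(y, z)   # SSR(X/Z)
pr2_yx = ssr_x_given_z / (sst - ssr(y, z))        # partial corr^2
sr2_yx = ssr_x_given_z / sst                      # semi-partial corr^2

# zero-order r^2 equals the squared Pearson correlation
assert np.isclose(r2_yx, np.corrcoef(y, x)[0, 1] ** 2)
# the partial has the smaller denominator, so pr^2 >= sr^2
assert pr2_yx >= sr2_yx
```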
6.
Note that in the previous slide:

R² = b + d + e
R² ≤ r²yx + r²yz = b + 2d + e
r²yx = b + d,  r²yz = d + e

'd' appears once in R², while the sum of the marginal correlations implies 2d. From the previous slide, 'd' cannot be obtained via partial or semi-partial corrs alone. Instead, via marginal correlations:

R² = (r²yx + r²yz - 2 ryx ryz rxz) / (1 - r²xz), and
d = r²yx - sr²yx = r²yz - sr²yz
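Both the R² formula and the identity for d can be verified on simulated data; a sketch (the data and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000
x = rng.normal(size=n)
z = 0.6 * x + rng.normal(size=n)
y = 0.5 * x + 0.3 * z + rng.normal(size=n)

ryx = np.corrcoef(y, x)[0, 1]
ryz = np.corrcoef(y, z)[0, 1]
rxz = np.corrcoef(x, z)[0, 1]

# R^2 from marginal correlations only
R2 = (ryx**2 + ryz**2 - 2 * ryx * ryz * rxz) / (1 - rxz**2)

# semi-partial correlations
sr_yx = (ryx - ryz * rxz) / np.sqrt(1 - rxz**2)
sr_yz = (ryz - ryx * rxz) / np.sqrt(1 - rxz**2)

# 'd' from either marginal-minus-semi-partial difference
d1 = ryx**2 - sr_yx**2
d2 = ryz**2 - sr_yz**2
assert np.isclose(d1, d2)
# R^2 equals the sum of the marginals minus the doubly counted 'd'
assert np.isclose(R2, ryx**2 + ryz**2 - d1)
```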
7.
Correlations of different orders:

Zero order:  rxy = Σᵢ (Xᵢ - X̄)(Yᵢ - Ȳ) / √[ Σᵢ (Xᵢ - X̄)² · Σᵢ (Yᵢ - Ȳ)² ]

Semi-partial:  sryx = ry(x.z) = (rxy - rxz ryz) / √(1 - r²xz)

(1 - r²xz is the determinant of the correlation matrix of X and Z.)
8.
Correlations of different orders (cont.):

Partial:  pryx = ryx.z = (ryx - rxz ryz) / √[(1 - r²xz)(1 - r²yz)]

Partial, 2nd order:  pryx.zw = ryx.zw = (ryx.z - ryw.z rxw.z) / √[(1 - r²yw.z)(1 - r²xw.z)]
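A quick sanity check (illustrative simulation): the first-order partial computed from zero-order correlations equals the correlation of the residuals from regressing Y and X each on Z.

```python
import numpy as np

def resid(a, b):
    """Residual of regressing a on b (with intercept)."""
    B = np.column_stack([np.ones(len(b)), b])
    return a - B @ np.linalg.lstsq(B, a, rcond=None)[0]

rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=n)
x = 0.7 * z + rng.normal(size=n)
y = 0.4 * x + 0.3 * z + rng.normal(size=n)

ryx = np.corrcoef(y, x)[0, 1]
ryz = np.corrcoef(y, z)[0, 1]
rxz = np.corrcoef(x, z)[0, 1]

# first-order partial from zero-order correlations
pr = (ryx - ryz * rxz) / np.sqrt((1 - ryz**2) * (1 - rxz**2))
# same quantity as the correlation of the two residual series
pr_resid = np.corrcoef(resid(y, z), resid(x, z))[0, 1]
assert np.isclose(pr, pr_resid)
```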
9.
Important relationships:

pr²yx.z = SSR(x/z) / [SST - SSR(z)] = [SSR(x, z) - SSR(z)] / [SST - SSR(z)]

pr²yx.zw = SSR(x/z, w) / [SST - SSR(z, w)] = [SSR(x, z, w) - SSR(z, w)] / [SST - SSR(z, w)]
        = (R²y.xwz - R²y.wz) / (1 - R²y.wz)
10.
Regression: R69 = f(R72, R78).  Y = R69, R-Square = 0.0008622394.

Indep. Var | Corr with Y | Partial    | Semi
R72        | -0.01836688 | 0.00034119 | 0.00034102
R78        | -0.02283033 | 0.00052507 | 0.00052490

pr²R69,R72.R78 = (R² - r²R69,R78) / (1 - r²R69,R78), also equal to the partial corr calculated from all zero-order correlations.

sr²R69(R72.R78) = R² - r²R69,R78, equal to the semi-partial calculated from all zero-order correlations. It is the proportion of Var(R69) fitted by R72 over and above what R78 has already fitted.

Let's partition R² ……………… (Note: This is IMPORTANT!!!)
11.
R²0.123…p = r²01 + r²0(2.1) + r²0(3.12) + … + r²0(p.123…p-1)

for p independent vars; that is, a sum of semi-partial correlations: each term adds non-redundant X information.
12.
Similarly for SSR: extra sum-of-squares decomposition for SSR (Type I):

SSR(X1, X2, …, Xp) = SSR(X1) + SSR(X2/X1) + SSR(X3/X2, X1) + … + SSR(Xp/X1, …, Xp-1)

and SSR(X2, X3/X1) = SSE(X1) - SSE(X1, X2, X3)

SSE(X1) - SSE(X1, X2) = SSR(X2/X1)

r²Y2.1 = [SSE(X1) - SSE(X1, X2)] / SSE(X1)
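The Type I telescoping identity holds exactly for any data set; a sketch with three simulated predictors (names and coefficients are illustrative):

```python
import numpy as np

def ssr(y, X):
    """Regression sum of squares of y on the columns of X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    yhat = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
    return np.sum((yhat - y.mean()) ** 2)

rng = np.random.default_rng(3)
n = 1000
X = rng.normal(size=(n, 3))
X[:, 1] += 0.5 * X[:, 0]                 # make predictors correlated
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)

full = ssr(y, X)
seq = (ssr(y, X[:, [0]])                              # SSR(X1)
       + ssr(y, X[:, [0, 1]]) - ssr(y, X[:, [0]])     # SSR(X2/X1)
       + ssr(y, X) - ssr(y, X[:, [0, 1]]))            # SSR(X3/X1,X2)
assert np.isclose(full, seq)   # Type I pieces add up to the full SSR
```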
13.
Correlation and angles.

Xstd = (X - X̄) / sX  (X standardized); remember |Xstd| = √(n-1) ….. (1)

corr(X, Y) = rxy = Cov(X, Y) / √[Var(X) Var(Y)]
           = Σᵢ (Xᵢ - X̄)(Yᵢ - Ȳ) / √[ Σᵢ (Xᵢ - X̄)² · Σᵢ (Yᵢ - Ȳ)² ]

Need to find the length of a standardized variable to finish corr(X, Y), using the concept of inner product.
14.
Length of a standardized variable:

zᵢ = (Xᵢ - X̄) / sX,   s²X = Σᵢ (Xᵢ - X̄)² / (n - 1)

length(z) = √(Σᵢ z²ᵢ) = √[ Σᵢ (Xᵢ - X̄)² / s²X ] = √(n - 1)
15.
From (1) before, and by the inner-product definition:

⟨Xstd, Ystd⟩ = |Xstd| |Ystd| cos θ = √(n-1) · √(n-1) · cos θ = (n - 1) cos θ

⇒ corr(X, Y) = ⟨Xstd, Ystd⟩ / (n - 1) = cos θ

1) Corr(X, Y) = Corr of the standardized (X, Y).
2) All standardized variables have the same length, √(n - 1).
3) Corr is always the inner product of the corresponding standardized variables divided by n - 1: an average weighted sum of standardized X with weights given by standardized Y, and vice versa.
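All three points can be confirmed in a few lines (illustrative simulation):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

zx = (x - x.mean()) / x.std(ddof=1)   # standardized (sample std)
zy = (y - y.mean()) / y.std(ddof=1)

# every standardized variable has length sqrt(n-1)
assert np.isclose(np.linalg.norm(zx), np.sqrt(n - 1))

# corr = inner product of standardized variables / (n-1) = cos(angle)
r = np.corrcoef(x, y)[0, 1]
assert np.isclose(r, zx @ zy / (n - 1))
cos_theta = zx @ zy / (np.linalg.norm(zx) * np.linalg.norm(zy))
assert np.isclose(r, cos_theta)
```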
16.
Standardized and non-standardized regression coefficients.

Regular coefficient bx (X and Z case; lower case: mean removed):

bx = (Σz² Σxy - Σzx Σzy) / (Σx² Σz² - (Σxz)²)

Standardized coefficient: general case b'x = bx (sx / sy); b' can also be obtained from a straight regression on standardized variables.

X and Z case:  b'x = (ryx - rxz ryz) / (1 - r²xz)   (note the similarity with the semi-partial correlation).

X-only case:   b'x = ryx   (equality with the correlation coefficient).
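The two routes to the standardized coefficient agree exactly in-sample; a sketch (simulated data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
z = 0.4 * x + rng.normal(size=n)
y = 2.0 + 0.7 * x - 0.3 * z + rng.normal(size=n)

# raw coefficient of x from the two-predictor regression
X1 = np.column_stack([np.ones(n), x, z])
bx = np.linalg.lstsq(X1, y, rcond=None)[0][1]

# standardized coefficient, two ways
b_std_1 = bx * x.std(ddof=1) / y.std(ddof=1)       # rescale the raw slope
ryx = np.corrcoef(y, x)[0, 1]
ryz = np.corrcoef(y, z)[0, 1]
rxz = np.corrcoef(x, z)[0, 1]
b_std_2 = (ryx - rxz * ryz) / (1 - rxz**2)         # from correlations only
assert np.isclose(b_std_1, b_std_2)
```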
17.
Example: verifying the length of a standardized variable.

Statistic             | Result
Length X ~ N(-3, 1)   | 32.28
Length Y ~ N(5, 4)    | 51.89
Length Z ~ B(.3, 100) | 5.83
Length STD(X)         | 9.95
Length STD(Y)         | 9.95
Length STD(Z)         | 9.95
# Obs                 | 100.00
Sqrt(# Obs - 1)       | 9.95

Simple Statistics
Variable  N    Mean      Std Dev  Sum         Minimum   Maximum
X         100  -3.07297  0.99339  -307.29669  -5.89441  -0.94043
Y         100   4.71923  2.16757   471.92321  -0.52986   9.43811
Z         100   0.34000  0.47610    34.00000   0         1.00000
STDX      100   0        1.00000     0        -2.84023   2.14673
STDY      100   0        1.00000     0        -2.42165   2.17703
STDZ      100   0        1.00000     0        -0.71414   1.38628

Statistic       X        Y        Z        STDX     STDY     STDZ
MEAN            -3.073    4.719    0.340    0.000    0.000    0.000
STD              0.993    2.168    0.476    1.000    1.000    1.000
N              100      100      100      100      100      100
CORR (upper triangle):
X                1.000   -0.009    0.048    1.000   -0.009    0.048
Y                         1.000    0.057   -0.009    1.000    0.057
Z                                  1.000    0.048    0.057    1.000
STDX                                        1.000   -0.009    0.048
STDY                                                 1.000    0.057
STDZ                                                          1.000
19.
Two befuddling issues.

1) Why, or when, is the sign of a standardized coefficient opposite to the sign of the zero-order correlation of the predictor with the dependent variable? (Suppression.)

2) Why, or when, does the addition of a correlated predictor to the set of predictors cause R² to be higher than the sum of the individual squared zero-order correlations? (Enhancement.)
21.
Classical Suppression Example (Horst 1941).

Study of pilot performance (Y) from measures of a mechanical test (X) and a verbal-abilities test (Z). When verbal ability (Z) was added to mechanical ability (X) in the equation, the effect of X increased.

This happened because Z fitted variability in X: the test of mechanical ability also required verbal skills to read the test directions. But Z did not affect Y.

In fact, we have a simultaneous-equation system (SES) with two dependent variables, X and Y:

Y = f(X, Z),  X = g(Z)

But specification of an SES is far more difficult.
22.
Horst 1941.

[Diagram: Z = verbal ability, X = mechanical ability, Y = pilot performance.]
Horst found corr (Y, X) > 0 (pilot performance related to
mechanical ability), corr (Z, X) > 0 (test performance for test
taking), and corr (Z, Y) = 0 (test taking did not assist in airplane
piloting).
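Horst's pattern is easy to reproduce by construction (a hedged simulation, not his data): make X depend on Z, and make Y depend only on the part of X that is free of Z.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients (intercept first)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

rng = np.random.default_rng(6)
n = 20000
z = rng.normal(size=n)                          # verbal ability
x = 0.8 * z + rng.normal(size=n)                # mechanical test needs reading skill
y = 0.5 * (x - 0.8 * z) + rng.normal(size=n)    # performance: mechanical part only

bx_alone  = ols(y, x)[1]
bx_with_z = ols(y, np.column_stack([x, z]))[1]

assert abs(np.corrcoef(y, z)[0, 1]) < 0.05      # corr(Y, Z) ~ 0
assert np.corrcoef(y, x)[0, 1] > 0              # corr(Y, X) > 0
assert abs(bx_with_z) > abs(bx_alone)           # adding Z strengthens X's effect
```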
23.
Correlations and Redundancy in linear models.
R² decomposition. Let Bᵢ be the beta coefficients for the equation with standardized variables:

R² = Σᵢ B²ᵢ + 2 Σᵢ<ⱼ Bᵢ Bⱼ rᵢⱼ

The formula does not decompose Var(Y), because some Bᵢ Bⱼ rᵢⱼ may be negative. However, when all cross-terms are zero, R² = Σᵢ B²ᵢ and also

R² = Σᵢ r²yᵢ ………………………………….(1)

(setting cross-correlations to zero): the case of Independence.

[Diagram: disjoint Y, X, Z circles; Independence ⇒ No Redundancy.]
24.
Correlations and redundancy in linear models.

Stepwise selection picks variables highly correlated with the dependent variable, "hoping" that formula (1) applies (but that seldom happens).

Rather than independence, the more common case is Redundancy. It occurs whenever (in absolute terms)

ryx > ryz rxz and ryz > ryx rxz, with rxz ≠ 0  ⇒  sryx < ryx, and Bx < ryx.

[Diagram: overlapping Y, X, Z circles; shared area d ⇒ Redundancy.]
25.
Pairwise Redundancy: Surendra Data. Redundant info for Y = a + bX + cZ.

Z          | Redundant? | Corr(Y,X) | Std Beta | Corr(Y,Z) | Corr(X,Z) | Corr(Y,Z)*Corr(X,Z) | Corr(Y,X)*Corr(X,Z) | Semi(Y,X/Z)
(R1 R2 R3) | N          | 0.288     | 0.313    | 0.346     | 0.135     | 0.047               | 0.039               | 0.244
R59        | Y          | 0.288     | -0.029   | -0.050    | -0.073    | 0.004               | -0.021              | 0.286
R69        | N          | 0.288     | -0.070   | -0.067    | 0.012     | -0.001              | 0.004               | 0.289
R72        | Y          | 0.288     | 0.024    | 0.025     | 0.006     | 0.000               | 0.002               | 0.288
R78        | Y          | 0.288     | -0.175   | -0.209    | -0.129    | 0.027               | -0.037              | 0.264

Redundancy check: sryx < ryx, and Bx < ryx.
26.
Suppression Effects.

Areas 'a', 'b' and 'e' can be understood as proportions of Y's variance. Area 'd' does not have the same interpretation, because it can take a negative value ⇒ a relationship of suppression or enhancement.

[Venn diagram: circles Y, X, Z with areas a, b, d, e as before.]
27.
Suppression Effects – Classical (some graphics).

Cohen and Cohen (1975)'s notation. Conger (1974) calls it "Traditional". In later parlance, also a case of Enhancement and of Confounding (confounding is used in logistic regression).

In this case, r²yz ≈ 0, and Z does not directly affect Y except insofar as it reduces the unfitted variance of X:

bz = 0, |bx.z| > |bx|, bx.z·bx > 0, i.e. either
a) bx.z > bx > 0, or
b) bx.z < bx < 0.

R² = 1 (two-predictor case) when r²xz = 1 - r²xy, i.e., Z fits the remaining X variance.

It can be verified that R² = pr²yx = sr²yx.

[Diagram: Y overlaps X; Z overlaps only X – Classical Suppression.]
28.
Suppression Effects – Net (graphics).

Cohen and Cohen (1975)'s notation. Conger (1974) calls it "negative". In this case, ryz / ryx < rxz; Z primarily suppresses the unfitted variance of X, and vice versa.

The suppressor variable receives a negative coefficient, and the other coefficient is larger than its correlation with the dependent variable. The coefficient of the suppressor is opposite in sign to its zero-order correlation with the dependent variable.

[Diagram: Y, X, Z circles – Net Suppression.]
29.
Suppression Effects – Cooperative Suppression.

(No graph; Cohen and Cohen (1975). Conger (1974) calls it "reciprocal".) Positive correlations with the dependent variable, but negative correlation between the independent variables. Thus, when one variable is partialled out from the other, all measures of fit are enhanced. In later parlance, a case of Enhancement and Confounding.

In this case, each suppressor coefficient exceeds its correlation with the dependent variable. In terms of correlations and regression coefficients:

ryx > 0, ryz > 0, rxz < 0
|bx.z| > |bx|, bx.z·bx > 0
|bz.x| > |bz|, bz.x·bz > 0
30.
Cooperative Suppression Example (simulated data).

Pearson Correlation Coefficients, N = 10000 (Prob > |r| under H0: Rho = 0):

     Y                 X                 Z
Y    1.00000           0.24484 (<.0001)  0.12213 (<.0001)
X    0.24484 (<.0001)  1.00000          -0.93240 (<.0001)
Z    0.12213 (<.0001) -0.93240 (<.0001)  1.00000

Fit measures:

Model    Root MSE  Dependent Mean  Coeff Var  R-Square  Adj R-Sq  (?)
X_ALONE  1.95      3.03            64.34      0.06      0.06      0.00
X_AND_Z  0.00      3.03            0.00       1.00      1.00      0.00

Parameter estimates:

Model    Variable   Estimate  Pr > |t|  VIF
X_ALONE  Intercept  3.79      0.00      .
         X          0.12      0.00      .
X_AND_Z  Intercept  0.00      1.00      0.00
         X          1.33      0.00      7.66
         Z          0.67      0.00      7.66
32.
Suppression Effects.

Detection:
- Std. coeff (semi-partial correlation) > |rᵢ| ⇒ suppression.
- If rᵢ is zero or close to it ⇒ classical suppression.
- If sg(std coeff) = -sg(correlation) ⇒ net suppression.
- If std coeff > rᵢ and sg(std coeff) = sg(rᵢ) ⇒ cooperative suppression.
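The detection rules can be packaged into a small helper (a hypothetical function and tolerance `eps`, not from the deck):

```python
import numpy as np

def suppression_type(beta_std, r, eps=0.02):
    """Classify per the detection rules above.
    beta_std: standardized coefficient of the predictor;
    r: its zero-order correlation with the dependent variable."""
    if abs(beta_std) <= abs(r):
        return "no suppression (redundancy or independence)"
    if abs(r) < eps:                              # r ~ 0
        return "classical suppression"
    if np.sign(beta_std) == -np.sign(r):          # sign reversal
        return "net suppression"
    return "cooperative suppression"              # same sign, larger magnitude

assert suppression_type(0.45, 0.01) == "classical suppression"
assert suppression_type(-0.30, 0.20) == "net suppression"
assert suppression_type(0.40, 0.25) == "cooperative suppression"
assert suppression_type(0.10, 0.25).startswith("no suppression")
```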
35.
Correlations and Redundancy in linear models.

Misconception (Hamilton, 1987): since R² = Σ r²ᵢ in the orthogonal case, is R² ≤ Σ r²ᵢ in the general case? NO … (I)

Y = a + bX + cZ + ε (with SSR(X, Z)) is equivalent to:
1) Y = d + eX + ε₁, e₁ = Y - est(Y), its SSR called SSR(X)
2) Z = f + gX + ε₂, e₂ = Z - est(Z)
3) e₁ = h·e₂ + ε₃ (no-intercept model), SSR called SSR(Z/X) …… (II)

SSR(X, Z) = SSR(X) + SSR(Z/X) …… (1) (recall earlier slide)

R² = SSR / SST;  R² > Σ r²ᵢ ⇔ SSR(Z/X) > SSR(Z) …… (III)

Deriving working formulae in terms of simple correlations:

pr²yz = r²yz.x = SSR(Z/X) / [SST - SSR(X)], and, with (1),

R² = r²yx + r²yz.x (1 - r²yx) = zero order + semi-partial.
36.
Correlations and Redundancy in linear models.

R² ≤ r²yx + pr²yz.x, but R² > r²yx + r²yz is possible.

Remember that sr²yz.x = pr²yz.x (1 - r²yx).

The necessary and sufficient condition for R² > Σ r²ᵢ ⇔ SSR(Z/X) > SSR(Z) is sr²yz.x - r²yz > 0, which reduces to:

rxz · [ rxz - 2 ryx ryz / (r²yx + r²yz) ] > 0   …… "Enhancement"
Currie and Korabinski (1984) call it 'enhancement'; Hamilton (1987), 'synergism'.
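The enhancement condition can be cross-checked directly against the R² formula; a sketch (the example correlations are illustrative and chosen so the correlation matrix stays positive definite):

```python
def r2_full(ryx, ryz, rxz):
    """R^2 of the two-predictor model from zero-order correlations."""
    return (ryx**2 + ryz**2 - 2 * ryx * ryz * rxz) / (1 - rxz**2)

def enhanced(ryx, ryz, rxz):
    """Does R^2 exceed the sum of the two squared zero-order correlations?"""
    return r2_full(ryx, ryz, rxz) > ryx**2 + ryz**2

# condition: rxz lies outside [0, 2*ryx*ryz/(ryx^2+ryz^2)]
ryx, ryz = 0.5, 0.3
threshold = 2 * ryx * ryz / (ryx**2 + ryz**2)     # ~0.882 here
for rxz in (-0.4, 0.2, 0.95):
    outside = rxz < 0 or rxz > threshold
    assert enhanced(ryx, ryz, rxz) == outside
```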
37.
Correlations and Redundancy in linear models.

Since R² > Σ r²ᵢ is possible:

1) X–Y scatter plots and correlation measures may be inadequate for variable selection with correlated variables: the X–Y correlations can be near 0 while R² is extremely high.
2) Variable removal due to co-linearity suspicions may be counterproductive.
3) Forward stepwise methods suffer most from co-linearity.
4) Corr(Y, Z) ≈ 0 while Z may still be useful ⇒ effects on variable selection? The t-value of Z could be insignificant.
5) Enhancement is counterintuitive: a predictor contributes more to the regression in the presence of other predictors than by itself.
38.
Simulated data, case of Enhancement: Y and Z hardly correlated, X and Z highly correlated.

Pearson Correlation Coefficients, N = 10000 (Prob > |r| under H0: Rho = 0):

     Y                 Z                 X
Y    1.00000           0.00214 (0.8306)  0.24484 (<.0001)
Z    0.00214 (0.8306)  1.00000           0.97008 (<.0001)

Fit measures:

Model    Root MSE  Dependent Mean  Coeff Var  R-Square  Adj R-Sq  (?)
X_ALONE  1.95      3.03            64.34      0.06      0.06      0.00
X_AND_Z  0.00      3.03            0.00       1.00      1.00      0.00

Parameter estimates (Net Suppression):

Model    Variable  Estimate  SE    Pr > |t|  VIF
X_ALONE  Interc.   3.79      0.04  0.00      .
         X         0.12      0.00  0.00      .
X_AND_Z  Interc.   0.00      0.00  1.00      0.00
         X         1.33      0.00  0.00      7.66
         Z         0.67      0.00  0.00      7.66
X on Z   Interc.   1.87      0.04  0.00      0.00
         Z        -0.48      0.00  0.00      1.00
40.
Unifying differing nomenclature and definitions.

Earlier formulation, standardized coefficients:

|β̂x| > |rY,X| and |β̂z| > |rY,Z|   …… (1)

Velicer (1978) changed the focus from standardized coeffs to R², because in the previous formulation |corr| < 1 but the betas are unconstrained. He suggested:

R² > r²Y,X + r²Y,Z   …… (2)

called "enhancement" by Currie and Korabinski (1984). Let's call (1) and (2) together enhancement; otherwise, just suppression.
42.
Comparing Suppression and Enhancement Effects.

Per Friedman & Wall (2005), standardized variables:

Redundancy:   |β̂x| ≤ |rY,X| and R² ≤ r²Y,X + r²Y,Z
Suppression:  |β̂x| > |rY,X| but R² ≤ r²Y,X + r²Y,Z
Enhancement:  |β̂x| > |rY,X| and R² > r²Y,X + r²Y,Z
43.
Betas, suppression and enhancement examples, Y = f(X, Z) (Friedman-Wall 2005):

β̂x = (ryx - ryz rxz) / (1 - r²xz), for standardized X, and

R² = (r²yx + r²yz - 2 ryx ryz rxz) / (1 - r²xz)

Enhancement region (taking ryi ≥ 0 by assumption): rxz < 0, or rxz > 2 ryx ryz / (r²yx + r²yz). In particular, if ryx, ryz > 0 and rxz < 0 ⇒ enhancement.
44.
Regions as rxz varies (standardized X, Z; 0 < ryz ≤ ryx):

Region | rxz range                                | Name                                  | Std betas               | R² vs r²yx + r²yz
I      | (lower limit, 0)                         | Cooperative Suppression + Enhancement | β̂x > ryx, β̂z > ryz     | R² > r²yx + r²yz
II     | (0, ryz/ryx)                             | Redundancy                            | 0 < β̂z < ryz, β̂x < ryx | R² ≤ r²yx + r²yz
III    | (ryz/ryx, 2 ryx ryz / (r²yx + r²yz))     | Net Suppression                       | β̂z < 0, β̂x > ryx       | R² ≤ r²yx + r²yz
IV     | (2 ryx ryz / (r²yx + r²yz), upper limit) | Net Suppression + Enhancement         | β̂z < 0, β̂x > ryx       | R² > r²yx + r²yz
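Under the stated ordering (0 < ryz ≤ ryx), the regions can be encoded as a small classifier (a sketch; boundary handling is a simplification):

```python
def fw_region(ryx, ryz, rxz):
    """Region of rxz per the Friedman & Wall (2005) table above,
    assuming 0 < ryz <= ryx; boundaries fall into the right-hand region."""
    t = 2 * ryx * ryz / (ryx**2 + ryz**2)
    if rxz < 0:
        return "I: cooperative suppression + enhancement"
    if rxz < ryz / ryx:
        return "II: redundancy"
    if rxz < t:
        return "III: net suppression"
    return "IV: net suppression + enhancement"

assert fw_region(0.5, 0.3, -0.2).startswith("I")
assert fw_region(0.5, 0.3, 0.30).startswith("II")   # 0.30 < 0.6
assert fw_region(0.5, 0.3, 0.70).startswith("III")  # 0.6 < 0.70 < ~0.882
assert fw_region(0.5, 0.3, 0.93).startswith("IV")
```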
51.
Suppression and Enhancement Effects – summary.

"Suppressor" variable: enhances the predictive ability of another variable by reducing the irrelevant variance of an otherwise relevant variable. In terms of standardized coefficients, Z is a suppressor variable for X if Bx > rYX. (Note: it is not necessary that rYZ be strictly 0.)

"Redundant" variables decrease the weights of other variables (Conger, 1974).

"Enhancer" variable: increases overall R² beyond the sum of the squared zero-order correlations.
52.
Velicer Suppression, 2-predictor case:

sryx = ry(x.z) = (rxy - rxz ryz) / √(1 - r²xz)

J-predictor case (Smith, 1992), with x̂ₖ the prediction of xₖ from the remaining (J - 1) predictors:

sryxₖ = ry(xₖ.x̂ₖ) = (ryxₖ - ryx̂ₖ rxₖx̂ₖ) / √(1 - r²xₖx̂ₖ)

Velicer's criterion: r²yxₖ < sr²yxₖ, i.e.,

rxₖx̂ₖ < 0 or rxₖx̂ₖ > 2 ryxₖ ryx̂ₖ / (r²yxₖ + r²yx̂ₖ)
54.
Confusion on signs of coefficients and their interpretation, for Y = f(X):

Bivariate:  b = (sY / sX) rxy  ⇒  sg(rxy) = sg(b).

But in the multivariate case, with estimated equation Ŷ = a + bX + cZ (emphasizing "partial"):

bYX.Z = (sY / sX) · (rYX - rYZ rXZ) / (1 - r²XZ)

sg(bYX.Z) = sg(rYX)  ⇔  abs(rYX) > abs(rYZ rXZ), and r²XZ ≠ 1.
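The sign condition is easy to probe with two illustrative sets of correlations (values chosen to keep the correlation matrix valid):

```python
import numpy as np

def sign_flips(ryx, ryz, rxz):
    """Does the partial (standardized) coefficient's sign differ from
    the zero-order correlation's sign?"""
    b_std = (ryx - ryz * rxz) / (1 - rxz**2)
    return np.sign(b_std) != np.sign(ryx)

# sign is preserved iff |ryx| > |ryz * rxz|
assert not sign_flips(0.30, 0.50, 0.40)   # 0.30 > 0.50*0.40 = 0.20: same sign
assert sign_flips(0.10, 0.50, 0.40)       # 0.10 < 0.20: sign reverses
```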
55.
If we recall the partial and semi-partial correlation formulae:

bYX.Z = (sY / sX) · (rYX - rYZ rXZ) / (1 - r²XZ) = (sY / sX) · sryx / √(1 - r²XZ)

⇒ sg(b) = sg(semi-partial).

Coefficient signs in a multivariate setting cannot necessarily connote the expected effects derived from theoretical analysis.
57.
Some Definitions.

Setting: linear models, specifically regression.

Co-linearity: the existence of (almost) perfect linear relationships among predictors, such that estimated coefficients are unstable across repeated samples. Notice that pair-wise (or any other) correlation notion is NOT part of the definition; instead, LINEAR DEPENDENCE or INDEPENDENCE is at its core.
58.
Full (Exact) Co-linearity.

Equivalent conditions:

rank(X) = rank(X′X) < p  ⇔  |X′X| = 0

One or more predictors can be exactly expressed in terms of the others. The sampling variance of some β = ∞; coefficients are non-unique.

R²ᵢ = 1 for some i-th predictor(s).
59.
Linear Regression Near Co-linearity: more likely.

(X′X) "wobbly", "almost singular". Almost?? Detour:

Var(bᵢ) = [σ² / ((n - 1) s²ᵢ)] · [1 / (1 - R²ᵢ)],

s²ᵢ = var(Xᵢ), R²ᵢ: R² of the regression of Xᵢ on the other X's.
60.
Present practice, derived from 'small'-dataset experience.

1 / (1 - R²ᵢ): the Variance Inflation Factor (VIF) of Xᵢ. √VIF_Xᵢ affects the CI of βᵢ multiplicatively.

Rule of thumb: VIF > 10 ⇒ strong possibility of co-linearity.

(1 / VIF) is also called Tolerance.

Var(bᵢ) = [σ² / ((n - 1) s²ᵢ)] · [1 / (1 - R²ᵢ)] = [σ² / ((n - 1) s²ᵢ)] · VIFᵢ,
s²ᵢ = var(Xᵢ), R²ᵢ: R² of the regression of Xᵢ on the other X's.

If X is standardized, the correlation matrix equals the covariance matrix, and the VIFs are the diagonal elements of (X′X)⁻¹.
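VIFs follow directly from the definition: regress each predictor on the others. A sketch (the simulated design, with one nearly co-linear pair and one independent column, is illustrative):

```python
import numpy as np

def vifs(X):
    """VIF of each column: 1/(1 - R^2) from regressing it on the others."""
    out = []
    for i in range(X.shape[1]):
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        fit = A @ np.linalg.lstsq(A, X[:, i], rcond=None)[0]
        ss_res = np.sum((X[:, i] - fit) ** 2)
        ss_tot = np.sum((X[:, i] - X[:, i].mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=n)
z = 0.98 * x + 0.2 * rng.normal(size=n)   # nearly co-linear with x
w = rng.normal(size=n)                     # independent column
v = vifs(np.column_stack([x, z, w]))

assert v[0] > 10 and v[1] > 10             # rule of thumb flags x and z
assert v[2] < 1.1                          # independent column unaffected
```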
61.
Data Mining World.

Var(bᵢ) = [σ² / ((n - 1) s²ᵢ)] · [1 / (1 - R²ᵢ)], for a given 'p'-variable model.

1) In data mining, p → ∞, and R² does not decrease with p:

lim (p→∞) R²ᵢ → 1

Naïve estimation with (almost) all variables for the sake of prediction (data mining disregards interpretation, armed with powerful hard- and software) ⇒ at least co-linearity.
62.
X, Z independent and standardized:  Var(β̂ᵢ) ∝ σ² / (1 - r²xz)

r²xz → 1 ⇒ Var(β̂ᵢ) → ∞?  Not necessarily.

When corr(X, Z) is very large, for a "given" σ², the variance of the beta coefficient grows to infinity. But σ² does not necessarily stay fixed.
63.
Different Formulation.

Var(β̂ᵢ) = [(1 - R²) / (n - 3)] · [1 / (1 - r²xz)]   …… (1), and r²xz → 1 ⇒ R² → 1.

Extreme values: rxz → ryx ryz ± √[(1 - r²yx)(1 - r²yz)];

R²(rxz = extreme values) = 1, and Var(β̂ᵢ) → 0.
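The claim that the variance collapses (rather than explodes) at the feasible extremes of rxz can be checked numerically; a sketch under the formulation above (example correlations are illustrative):

```python
import numpy as np

def var_beta(R2, rxz, n):
    """Var of a standardized beta per formulation (1): (1-R2)/((n-3)(1-rxz^2))."""
    return (1 - R2) / ((n - 3) * (1 - rxz**2))

def R2_two(ryx, ryz, rxz):
    """R^2 of the two-predictor model from zero-order correlations."""
    return (ryx**2 + ryz**2 - 2 * ryx * ryz * rxz) / (1 - rxz**2)

n, ryx, ryz = 100, 0.6, 0.5
# feasible extremes of rxz given ryx and ryz
lo = ryx * ryz - np.sqrt((1 - ryx**2) * (1 - ryz**2))
hi = ryx * ryz + np.sqrt((1 - ryx**2) * (1 - ryz**2))
for rxz in (hi - 1e-6, lo + 1e-6):
    R2 = R2_two(ryx, ryz, rxz)
    assert R2 > 0.999                      # R^2 -> 1 at the extremes
    assert var_beta(R2, rxz, n) < 1e-2     # so Var(beta) -> 0, not infinity
```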
65.
Regions II and III: the SEs increase with increasing co-linearity, but decrease at the extremes. Figures 5 and 6 show that high correlation can coexist with small SEs under Enhancement and even under Suppression.