Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics/
- 1. © Experian Limited 2007. All rights reserved. Experian and the marks used herein are service marks or registered trademarks of Experian Limited.
Other product and company names mentioned herein may be the trademarks of their respective owners. No part of this copyrighted work
may be reproduced, modified, or distributed in any form or manner without the prior written permission of Experian Limited.
Confidential and proprietary.
Stepwise Logistic Regression
Lecture for FMI Students 27.05.2010
Alexander Efremov
- 2.
Agenda
Introduction
Applications of the Logistic Regression
System Identification & Stepwise Regression
Part I. Logistic Regression Model Development
Logistic Model
Maximum Likelihood Estimator
Potential Problems
Model Analysis and Validation
Part II. Stepwise Logistic Regression (SWR)
Basic Idea
SWR Algorithm
Potential Problems
Summary
- 4.
Introduction
Applications of the Logistic Regression
Medicine – diagnostics, modeling of disease growth, treatment effect
Psychology – learning process modeling, evaluation of psychological tests
Economics – risk analysis, country debt analysis, occupational choices
Marketing – product consumption, effect of retailers' actions
Criminology – risk factors for committing a criminal act
Sociology – employment, graduation, vote analysis
Ecology – modeling population growth
Linguistics – language changes
Chemistry – reaction models
Media – news effects, copycat reaction
Finance – credit scoring, fraud detection
Physics, Biology, etc.
The Logistic Model
- 5.
Introduction
System Under Investigation
Individuals /raw data/ => System => Model
- 6.
Introduction
System Identification Stages
- 9.
Part I. Logistic Regression Model Development
Logistic Model
[Figure: linear relation vs. logistic relation]
- 10.
Part I. Logistic Regression Model Development
Logistic Model
$k$ – index of the current individual, $k = 1, \dots, N$
$N$ – number of observations
$y_k$ – dependent variable /prob. of good/
$\hat{y}_k$ – model output /predicted prob. of good/
$\theta_0$ – intercept
$\theta_i$ – the (i+1)-th model parameter, $i = 1, \dots, n$
$x_{i,k}$ – the i-th independent variable
Logistic Relation – General Form
$\hat{y}_k = \frac{e^{M_k}}{1 + e^{M_k}}$
$\hat{y}_k = \frac{1}{1 + e^{-M_k}}$
“Linear” Log. Regression Model
$M_k = \theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k}$
$\hat{y}_k = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k})}}$
$\ln \frac{\hat{y}_k}{1 - \hat{y}_k} = \theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k}$
- 11.
Part I. Logistic Regression Model Development
Logistic Model
Notation
Parameters vector: $\theta = [\theta_0 \ \theta_1 \ \dots \ \theta_n]^T$, $\theta \in R^{n+1}$
Regression vector: $\varphi_k = [1 \ x_{1,k} \ \dots \ x_{n,k}]^T$, $\varphi_k \in R^{n+1}$
Logistic model: $\hat{y}_k = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k})}} = \frac{1}{1 + e^{-\varphi_k^T \theta}}$
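The vector form of the logistic model can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names `predict` and `logit` and the numeric values of `theta` and `x_k` are assumptions chosen for the example.

```python
import math

def predict(theta, x):
    """Return y_hat = 1 / (1 + exp(-phi^T theta)) with phi = [1, x_1, ..., x_n]."""
    phi = [1.0] + list(x)                        # regression vector, leading 1 for the intercept
    m = sum(t * p for t, p in zip(theta, phi))   # linear predictor M_k = phi^T theta
    return 1.0 / (1.0 + math.exp(-m))

def logit(p):
    """Log-odds ln(p / (1 - p)), the inverse of the logistic relation."""
    return math.log(p / (1.0 - p))

theta = [-1.0, 0.5, 2.0]   # illustrative parameters [theta_0, theta_1, theta_2]
x_k = [2.0, 0.25]          # illustrative independent variables of one individual
y_hat = predict(theta, x_k)
```

Applying `logit` to `y_hat` recovers the linear predictor, which is the log-odds form of the model given on the previous slide.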
- 12.
Part I. Logistic Regression Model Development
Residual
The Residual
$y_k = \hat{y}_k + e_k = \frac{1}{1 + e^{-\varphi_k^T \theta}} + e_k$
$e_k = y_k - \hat{y}_k = 1 - \hat{y}_k$ for $y_k = 1$
$e_k = y_k - \hat{y}_k = -\hat{y}_k$ for $y_k = 0$
Sources of Uncertainty
Unavailable significant factors
Simplified relations
Time-varying performance
Database errors
Fraud
- 14.
Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Cost Function
Model output: $\hat{y}_k = P(y_k = 1 \mid \varphi_k)$
Likelihood contribution: $l_{\theta,k} = \hat{y}_k^{y_k} (1 - \hat{y}_k)^{1 - y_k}$
Likelihood function: $L_\theta = \prod_{k=1}^{N} l_{\theta,k}$
Log-likelihood function: $\ln L_\theta = \sum_{k=1}^{N} \left( y_k \ln \hat{y}_k + (1 - y_k) \ln(1 - \hat{y}_k) \right)$
Maximum Likelihood Criterion: $\max_\theta \ln L_\theta \Leftrightarrow \min_\theta -2 \ln L_\theta$
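The cost function above can be evaluated directly from observed outcomes and predicted probabilities. The sketch below is illustrative only: the function name and the toy values are assumptions, and it presumes every predicted probability lies strictly between 0 and 1 so that the logarithms are defined.

```python
import math

def neg2_log_likelihood(y, y_hat):
    """Return -2 ln L for binary outcomes y and predicted probabilities y_hat."""
    log_l = 0.0
    for yk, pk in zip(y, y_hat):
        # ln of the likelihood contribution y_hat^y * (1 - y_hat)^(1 - y)
        log_l += yk * math.log(pk) + (1 - yk) * math.log(1.0 - pk)
    return -2.0 * log_l

y = [1, 0, 1, 1]                # illustrative outcomes
y_hat = [0.8, 0.3, 0.6, 0.9]    # illustrative predicted probabilities
cost = neg2_log_likelihood(y, y_hat)
```

For these values the likelihood is the product 0.8 · 0.7 · 0.6 · 0.9, so `math.exp(-cost / 2)` recovers that product.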
- 15.
Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Cost Function /-2 Log L/ for a Real Life Case
- 16.
Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Estimates Update
$\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} + \Delta\theta^{(i)}$, $\Delta\theta^{(i)} = ?$
Taylor Series Expansion
$f_{\hat{\theta}^{(i)} + \Delta\theta^{(i)}} = f_{\hat{\theta}}^{(i)} + g^{(i)T} \Delta\theta^{(i)} + \frac{1}{2} \Delta\theta^{(i)T} H^{(i)} \Delta\theta^{(i)} + O^3$
Cost function: $f_{\hat{\theta}}^{(i)} = -\ln L_{\hat{\theta}^{(i)}}$
Gradient: $g^{(i)} = \nabla^T f_{\hat{\theta}}^{(i)}$
Hessian: $H^{(i)} = \nabla^2 f_{\hat{\theta}}^{(i)}$
Cost Function Models
Linear model: $M^{(i)} = f_{\hat{\theta}}^{(i)} + g^{(i)T} \Delta\theta^{(i)}$
Quadratic model: $M^{(i)} = f_{\hat{\theta}}^{(i)} + g^{(i)T} \Delta\theta^{(i)} + \frac{1}{2} \Delta\theta^{(i)T} H^{(i)} \Delta\theta^{(i)}$
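For the logistic model the gradient and the Hessian have closed forms. The worked step below is not on the slide; it follows from differentiating $f = -\ln L_\theta$ with $\hat{y}_k = 1 / (1 + e^{-\varphi_k^T \theta})$ and the log-likelihood defined earlier.

```latex
\begin{aligned}
\frac{\partial \hat{y}_k}{\partial \theta}
  &= \hat{y}_k (1 - \hat{y}_k)\, \varphi_k \\
g = \nabla^T f
  &= -\sum_{k=1}^{N} (y_k - \hat{y}_k)\, \varphi_k \\
H = \nabla^2 f
  &= \sum_{k=1}^{N} \hat{y}_k (1 - \hat{y}_k)\, \varphi_k \varphi_k^T
\end{aligned}
```

Each term of $H$ is a non-negative multiple of $\varphi_k \varphi_k^T$, so $H$ is positive semidefinite and this cost function is convex in $\theta$.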
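- 17.
Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Gradient
$g = \left[ \frac{\partial f}{\partial \theta_0} \ \frac{\partial f}{\partial \theta_1} \ \dots \ \frac{\partial f}{\partial \theta_n} \right]^T \in R^{n+1}$
Hessian
$H = \begin{bmatrix} \frac{\partial^2 f}{\partial \theta_0^2} & \frac{\partial^2 f}{\partial \theta_0 \partial \theta_1} & \cdots & \frac{\partial^2 f}{\partial \theta_0 \partial \theta_n} \\ \frac{\partial^2 f}{\partial \theta_1 \partial \theta_0} & \frac{\partial^2 f}{\partial \theta_1^2} & \cdots & \frac{\partial^2 f}{\partial \theta_1 \partial \theta_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial \theta_n \partial \theta_0} & \frac{\partial^2 f}{\partial \theta_n \partial \theta_1} & \cdots & \frac{\partial^2 f}{\partial \theta_n^2} \end{bmatrix} \in R^{(n+1) \times (n+1)}$
First-Order Methods /e.g. Steepest Descent/
$\Delta\theta = -\alpha g$
Second-Order Method /e.g. Newton-Raphson/
$\Delta\theta = -\alpha H^{-1} g$
[Figures: one search-path plot per method, axes labeled 1 and 2, with $\theta^{(0)}$, $\theta^*$ and $\theta_{opt}$ marked]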
- 18.
Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Steepest Descent: $\Delta\theta = -\alpha g$
Newton-Raphson (NR): $\Delta\theta = -\alpha H^{-1} g$
NR with Line Search: $\Delta\theta = -\alpha^* H^{-1} g$
NR with Quadratic Interpolation: $\Delta\theta = -\alpha^* H^{-1} g$
[Figures: one search-path plot per method, axes labeled 1 and 2, with $\theta^{(0)}$, $\theta^*$ and $\theta_{opt}$ marked]
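The Newton-Raphson update $\Delta\theta = -\alpha H^{-1} g$ can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions rather than the lecturer's implementation: it uses a fixed step length `alpha` (no line search or quadratic interpolation), takes $f = -\ln L$ as the cost function, starts from $\theta^{(0)} = 0$, and the function name and toy data are invented for illustration.

```python
import numpy as np

def fit_newton_raphson(Phi, y, alpha=1.0, max_iter=25, tol=1e-8):
    """Minimise f = -ln L for the logistic model with fixed-step Newton-Raphson."""
    theta = np.zeros(Phi.shape[1])                  # theta^(0): start at the origin (assumption)
    for _ in range(max_iter):
        y_hat = 1.0 / (1.0 + np.exp(-Phi @ theta))  # model output for every individual
        g = Phi.T @ (y_hat - y)                     # gradient of -ln L
        w = y_hat * (1.0 - y_hat)
        H = Phi.T @ (Phi * w[:, None])              # Hessian of -ln L
        delta = -alpha * np.linalg.solve(H, g)      # delta_theta = -alpha H^{-1} g
        theta = theta + delta
        if np.max(np.abs(delta)) < tol:             # stop when the update is negligible
            break
    return theta

# Illustrative, non-separable toy data (not from the lecture).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
Phi = np.column_stack([np.ones_like(x), x])         # rows are phi_k^T = [1, x_k]
theta_hat = fit_newton_raphson(Phi, y)
```

Solving the linear system with `np.linalg.solve` avoids forming $H^{-1}$ explicitly. At the returned estimate the gradient is numerically zero, which is the first-order condition for the maximum likelihood estimate.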
- 20.
Part I. Logistic Regression Model Development
Potential problems
Numerical Problems
Matrix inversion in the update, hence the use of SVD, EVD, QR, etc.
$\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - \alpha H^{-1} g$
Local Minima
Model Overfitting
[Figures: $-2 \ln L$ curve; $y_k$, $y_{1,k}$ and $y_{2,k}$ plotted against $k$]
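One common way to sidestep an explicit matrix inverse in the update is an SVD-based least-squares solve. The sketch below assumes NumPy and uses `np.linalg.lstsq`, which returns the minimum-norm solution even when $H$ is singular; the function name `newton_step` and the example matrices are illustrative and not taken from the lecture.

```python
import numpy as np

def newton_step(H, g, alpha=1.0):
    """Return delta_theta = -alpha * H^+ g without forming an explicit inverse."""
    # lstsq is SVD-based; for singular H it yields the minimum-norm solution of H x = g.
    solution, *_ = np.linalg.lstsq(H, g, rcond=None)
    return -alpha * solution

# Well-conditioned case: same result as -H^{-1} g.
step_regular = newton_step(np.array([[2.0, 0.0], [0.0, 4.0]]), np.array([2.0, 4.0]))

# Singular case (two perfectly collinear directions): a plain inverse would fail here.
step_singular = newton_step(np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([2.0, 2.0]))
```

A singular or nearly singular Hessian typically signals collinear independent variables, so in practice the better fix is often to drop or combine such variables rather than rely on the pseudo-inverse.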
- 22.
Part I. Logistic Regression Model Development
Frequently Used Statistics for Model Analysis
Individual Estimate Measures
Standard error: $\sigma_{\hat{\theta}_i} = \sqrt{[\mathrm{diag}(H^{-1})]_i}$
Wald statistic: $W_i = \left( \frac{\hat{\theta}_i}{\sigma_{\hat{\theta}_i}} \right)^2 = \frac{\hat{\theta}_i^2}{\sigma_{\hat{\theta}_i}^2} \sim \chi_1^2$
p-value: $\Pr(\chi_1^2 > W_i)$
Overall Model Measures
Coefficient of determination ($R^2$)
generalized $R^2$: $R^2 = 1 - e^{\frac{2}{N} (\ln L_{\hat{\theta}}^0 - \ln L_{\hat{\theta}})}$
gen. max. resc. $R^2$: $R_m^2 = \frac{R^2}{R_s^2}$, $R_s^2 = 1 - e^{2 N^{-1} \ln L_{\hat{\theta}}^0}$
Cost function: $f_{\hat{\theta}}^{(i)} = -2 \ln L_{\hat{\theta}^{(i)}}$
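The individual estimate measures can be computed from an estimate and its variance. In the sketch below the variance is assumed to be the corresponding diagonal element of the inverse Hessian of $-\ln L$ at the estimate; the function name and the numbers are illustrative. The chi-square tail with one degree of freedom is written with `math.erfc`, using the identity $\Pr(\chi_1^2 > W) = \mathrm{erfc}(\sqrt{W/2})$.

```python
import math

def wald_test(theta_i, var_i):
    """Return (standard error, Wald statistic, p-value) for one estimate."""
    se = math.sqrt(var_i)                # standard error of the estimate
    w = (theta_i / se) ** 2              # Wald statistic, chi-square with 1 d.o.f.
    p = math.erfc(math.sqrt(w / 2.0))    # Pr(chi2_1 > W)
    return se, w, p

# Illustrative values: an estimate 1.96 standard errors away from zero.
se, w, p_value = wald_test(theta_i=1.96, var_i=1.0)
```

With these inputs the p-value comes out very close to 0.05, the familiar two-sided 5% threshold for a standard normal variable.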
- 23.
Part I. Logistic Regression Model Development
Frequently Used Statistics for Model Analysis
Modified criteria
Akaike Information Criterion (AIC): $AIC_{\hat{\theta}} = -2 \ln L_{\hat{\theta}} + 2n$
Schwarz Criterion (SC): $SC_{\hat{\theta}} = -2 \ln L_{\hat{\theta}} + n \ln(N - 1)$
Minimum Description Length (MDL), Final Prediction Error (FPE), etc.
Model Validation
Data split into development and validation samples
[Figure: AIC and $-2 \ln L$]
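The overall model measures and the modified criteria depend only on two log-likelihood values and the sample sizes, so they are easy to compute together. The sketch below follows the formulas as given on the slides, including $n \ln(N - 1)$ for SC; the function name, the dictionary keys and the numbers are assumptions, and whether $n$ should count the intercept is not stated in the lecture.

```python
import math

def overall_measures(ln_l, ln_l0, n, N):
    """Overall fit measures from ln L of the fitted model and of the intercept-only model."""
    r2 = 1.0 - math.exp(2.0 / N * (ln_l0 - ln_l))   # generalized R^2
    r2_max = 1.0 - math.exp(2.0 / N * ln_l0)        # largest value R^2 can reach
    return {
        "R2": r2,
        "R2_rescaled": r2 / r2_max,                 # max-rescaled R^2
        "AIC": -2.0 * ln_l + 2 * n,
        "SC": -2.0 * ln_l + n * math.log(N - 1),
    }

# Illustrative numbers: 100 observations with a 50/50 outcome split, so ln L0 = 100 ln 0.5.
measures = overall_measures(ln_l=-50.0, ln_l0=100 * math.log(0.5), n=3, N=100)
```

Both criteria add a penalty that grows with the number of parameters, which is why they can keep rising while $-2 \ln L$ alone keeps falling as variables are added.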
- 26.
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
xo, xe – sets of all variables outside / entered in the model
xoi, xei – the most significant variable in xo / the least significant variable in xe
SLE – Significance Level to Enter
SLS – Significance Level to Stay
SWR
- 27.
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
Available information
- 28.
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
Initialization
- 29.
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
Forward Selection
- 30.
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
Forward Selection
- 31.
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
Backward Elimination
- 33.
Part II. Stepwise Logistic Regression
Step 0. Initialization
Logistic model: $\hat{y}_k = \frac{1}{1 + e^{-\varphi_k^T \theta}}$
1. Intercept Model: $\theta \in R$, $\varphi_k = 1$
2. Full model: $\theta \in R^{n+1}$, $\varphi_k = [1 \ x_{1,k} \ \dots \ x_{n,k}]^T$
3. One Factor Model
Check for Enter
Score Chi-Sq for all potential models: $S_i = g_i^T H_i^{-1} g_i$
Maximum Score Chi-Square: $\ell_1 = \arg\max_i S_i$
p-value & threshold: $p\text{-value}_{\ell_1} < SLE$
Model Determination (Optimization): $\theta \in R^2$, $\varphi_k = [1 \ x_{\ell_1,k}]^T$
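The intercept model needs no iterative optimization: with $\varphi_k = 1$ the maximum likelihood estimate is the log-odds of the observed rate of goods. A minimal sketch of that closed form follows; the function name and data are illustrative, and the sample is assumed to contain both outcomes so that the logarithm is defined.

```python
import math

def intercept_only_theta(y):
    """ML estimate of theta_0 when phi_k = 1: the log-odds of the mean outcome."""
    y_bar = sum(y) / len(y)                  # observed share of y_k = 1
    return math.log(y_bar / (1.0 - y_bar))

y = [1, 1, 1, 0]                             # illustrative outcomes, 75% goods
theta_0 = intercept_only_theta(y)
y_hat = 1.0 / (1.0 + math.exp(-theta_0))     # predicted probability for every individual
```

Every individual receives the same predicted probability, equal to the sample mean, which makes this model a natural baseline for the score tests in the next step.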
- 34.
Part II. Stepwise Logistic Regression
Step 1. Forward Selection
1. Check for Enter
Score Chi-Square of all potential models: $S_j = g_j^T H_j^{-1} g_j$
Maximum Score Chi-Square: $\ell_i = \arg\max_j S_j$, over the variables $j$ not yet in the model
p-value & threshold: $p\text{-value}_{\ell_i} < SLE$
2. Model Determination (Optimization): $\theta \in R^{i+1}$, $\varphi_k = [1 \ x_{\ell_1,k} \ \dots \ x_{\ell_i,k}]^T$
3. Statistics for Model Analysis
Individual Estimate Measures
standard error
Wald statistic & p-value
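The check for enter can be sketched as a score test of one candidate variable. The assumptions here go beyond the slide: the gradient and Hessian of $-\ln L$ are taken for the model extended by the candidate and evaluated at the current estimates with the new coefficient set to zero, and the statistic is compared with a chi-square distribution with one degree of freedom. Function names and data are illustrative.

```python
import math
import numpy as np

def score_chi_square(Phi_cur, theta_cur, x_new, y):
    """Score statistic S = g^T H^{-1} g for adding x_new to the current model."""
    y_hat = 1.0 / (1.0 + np.exp(-Phi_cur @ theta_cur))   # current model output
    Phi_ext = np.column_stack([Phi_cur, x_new])          # candidate regression vectors
    g = Phi_ext.T @ (y_hat - y)                          # gradient, new coefficient at 0
    w = y_hat * (1.0 - y_hat)
    H = Phi_ext.T @ (Phi_ext * w[:, None])               # Hessian of the extended model
    return float(g @ np.linalg.solve(H, g))

# Illustrative data: intercept-only model with mean outcome 0.5, hence theta_0 = 0.
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
Phi_cur = np.ones((6, 1))
theta_cur = np.array([0.0])

S1 = score_chi_square(Phi_cur, theta_cur, x1, y)
S2 = score_chi_square(Phi_cur, theta_cur, x2, y)
p1 = math.erfc(math.sqrt(S1 / 2.0))                      # Pr(chi2_1 > S1)
```

Here `x1` has the larger score and would be the candidate to enter, but its p-value lies above an SLE of 0.05, so with that threshold the selection would stop at the intercept model.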
- 35.
Part II. Stepwise Logistic Regression
Step 1. Forward Selection
3. Statistics for Model Analysis (part 2)
Overall Model Measures
Coefficients of determination
Cost function
Modified criteria
Akaike Information Criterion (AIC)
Schwarz Criterion (SC)
- 36.
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression
SWR
- 37.
Part II. Stepwise Logistic Regression
Step 2. Backward Elimination
1. Check for Leave
Wald statistic & p-value of all potential models
p-value & threshold: $\max p\text{-value}_{\ell_j} > SLS$
2. Model Determination (Optimization): $\theta \in R^{i}$, $\varphi_k = [1 \ x_{\ell_1,k} \ \dots \ x_{\ell_{j-1},k} \ x_{\ell_{j+1},k} \ \dots \ x_{\ell_i,k}]^T$
3. Statistics for Model Analysis
Individual Estimate Measures
standard error
Wald statistic & p-value
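The check for leave reduces to finding the entered variable with the largest Wald p-value and comparing it with the significance level to stay. The sketch below assumes the p-values have already been computed and are passed in as a dictionary; the function name, the variable names and the threshold of 0.05 are illustrative.

```python
def variable_to_remove(p_values, sls=0.05):
    """Return the entered variable that should leave the model, or None.

    p_values maps each entered variable (intercept excluded) to its Wald p-value.
    """
    if not p_values:
        return None                              # nothing has entered yet
    worst = max(p_values, key=p_values.get)      # least significant entered variable
    return worst if p_values[worst] > sls else None

leaving = variable_to_remove({"x1": 0.01, "x3": 0.20, "x4": 0.04})
staying = variable_to_remove({"x1": 0.01, "x4": 0.04})
```

In the first call `x3` leaves the model; in the second call every variable passes the threshold and nothing is removed.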
- 38.
Part II. Stepwise Logistic Regression
Step 2. Backward Elimination
3. Statistics for Model Analysis (part 2)
Overall Model Measures
Coefficients of determination
Cost function
Modified criteria
Akaike Information Criterion (AIC)
Schwarz Criterion (SC)
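Steps 0 to 2 can be combined into one loop. The sketch below is one possible reading of the procedure, not the lecturer's implementation: it assumes NumPy, fits each model by Newton-Raphson on $-\ln L$, uses a one-degree-of-freedom score test to enter and a Wald test to stay, and stops if the variable that just entered is removed again (a guard against cycling that is an assumption of this sketch). All names and the simulated data are illustrative.

```python
import math
import numpy as np

def sigmoid(m):
    return 1.0 / (1.0 + np.exp(-m))

def fit(Phi, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns the estimates and the Hessian of -ln L."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        y_hat = sigmoid(Phi @ theta)
        g = Phi.T @ (y_hat - y)
        H = Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])
        step = np.linalg.solve(H, g)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    y_hat = sigmoid(Phi @ theta)
    return theta, Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])

def chi2_1_pvalue(stat):
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))    # Pr(chi2_1 > stat)

def stepwise(X, y, sle=0.05, sls=0.05, max_steps=20):
    N, n = X.shape
    entered = []                                         # indices of entered variables
    for _ in range(max_steps):
        Phi = np.column_stack([np.ones(N)] + [X[:, j] for j in entered])
        theta, _ = fit(Phi, y)
        y_hat = sigmoid(Phi @ theta)
        w = y_hat * (1.0 - y_hat)
        # Forward selection: score test for every variable outside the model.
        best_j, best_p = None, 1.0
        for j in range(n):
            if j in entered:
                continue
            Phi_ext = np.column_stack([Phi, X[:, j]])
            g = Phi_ext.T @ (y_hat - y)
            H_ext = Phi_ext.T @ (Phi_ext * w[:, None])
            p = chi2_1_pvalue(float(g @ np.linalg.solve(H_ext, g)))
            if p < best_p:
                best_j, best_p = j, p
        if best_j is None or best_p >= sle:
            break                                        # nothing qualifies to enter
        entered.append(best_j)
        # Backward elimination: Wald test for every entered variable.
        Phi = np.column_stack([np.ones(N)] + [X[:, j] for j in entered])
        theta, H = fit(Phi, y)
        variances = np.diag(np.linalg.inv(H))[1:]        # skip the intercept
        p_wald = [chi2_1_pvalue(t * t / v) for t, v in zip(theta[1:], variances)]
        worst = int(np.argmax(p_wald))
        if p_wald[worst] > sls:
            removed = entered.pop(worst)
            if removed == best_j:
                break                                    # avoid an enter/leave cycle
    return entered

# Simulated data: only the first of three variables drives the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (rng.uniform(size=200) < sigmoid(2.0 * X[:, 0])).astype(float)
selected = stepwise(X, y)
```

On data like this the informative first variable is expected to enter first; whether a noise variable also enters depends on the sample and on the SLE and SLS thresholds.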
- 40.
Part II. Stepwise Logistic Regression
Potential problems in the Stepwise Regression
Local Minima & Initial Conditions
Numerical Problems /SVD, EVD, QR, etc./
Model Overfitting
- 41.
Summary
Introduction
Applications of the Logistic Regression
System Identification & Stepwise Regression
Part I. Logistic Regression Model Development
Logistic Model
Maximum Likelihood Estimator
Potential Problems
Model Analysis and Validation
Part II. Stepwise Logistic Regression (SWR)
Basic Idea
SWR Algorithm
Potential Problems
Summary
- 42.
Thank You!
http://anp.tu-sofia.bg/aefremov/index.htm