Arthur Charpentier, SIDE Summer School, July 2019
# 5 Classification & Boosting
Arthur Charpentier (Université du Québec à Montréal)
Machine Learning & Econometrics
SIDE Summer School - July 2019
@freakonometrics freakonometrics freakonometrics.hypotheses.org 1
Arthur Charpentier, SIDE Summer School, July 2019
Starting Point: Classification Tree
library(rpart)
cart = rpart(PRONO ~ ., data = myocarde)
library(rpart.plot)
prp(cart, type = 2, extra = 1)
A (binary) split is based on one specific variable, say xj, and a cutoff, say s. Then, there are two options:
• either xi,j ≤ s, then observation i goes on the left, in IL
• or xi,j > s, then observation i goes on the right, in IR
Thus, I = IL ∪ IR.
@freakonometrics freakonometrics freakonometrics.hypotheses.org 2
Arthur Charpentier, SIDE Summer School, July 2019
Classification : Classification Trees
Gini for node I is defined as
G(I) = − Σ_{y∈{0,1}} p_y (1 − p_y)
where p_y is the proportion of individuals in the leaf of type y,
G(I) = − Σ_{y∈{0,1}} (n_{y,I} / n_I) (1 − n_{y,I} / n_I)
gini = function(y, classe){
  T = table(y, classe)
  nx = apply(T, 2, sum)
  n = sum(T)
  pxy = T / matrix(rep(nx, each = 2), nrow = 2)
  omega = matrix(rep(nx, each = 2), nrow = 2) / n
  g = -sum(omega * pxy * (1 - pxy))
  return(g)}
@freakonometrics freakonometrics freakonometrics.hypotheses.org 3
Arthur Charpentier, SIDE Summer School, July 2019
Classification : Classification Trees
-2*mean(myocarde$PRONO)*(1-mean(myocarde$PRONO))
[1] -0.4832375
gini(y = myocarde$PRONO, classe = myocarde$PRONO < Inf)
[1] -0.4832375
gini(y = myocarde$PRONO, classe = myocarde[, 1] <= 100)
[1] -0.4640415
@freakonometrics freakonometrics freakonometrics.hypotheses.org 4
Arthur Charpentier, SIDE Summer School, July 2019
Classification : Classification Trees
if we split, define the index
G(IL, IR) = − Σ_{x∈{L,R}} (n_{Ix} / n_I) Σ_{y∈{0,1}} (n_{y,Ix} / n_{Ix}) (1 − n_{y,Ix} / n_{Ix})
the entropic measure is
E(I) = − Σ_{y∈{0,1}} (n_{y,I} / n_I) log(n_{y,I} / n_I)
entropy = function(y, classe){
  T = table(y, classe)
  nx = apply(T, 2, sum)
  pxy = T / matrix(rep(nx, each = 2), nrow = 2)
  omega = matrix(rep(nx, each = 2), nrow = 2) / sum(T)
  g = sum(omega * pxy * log(pxy))
  return(g)}
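Mirroring the Gini computations on the previous slide, a small usage sketch of this entropy function, assuming the same myocarde data and the same calls (output values not reported here):
entropy(y = myocarde$PRONO, classe = myocarde$PRONO < Inf)   # root node, no split
entropy(y = myocarde$PRONO, classe = myocarde[, 1] <= 100)   # split on the first variable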
@freakonometrics freakonometrics freakonometrics.hypotheses.org 5
Arthur Charpentier, SIDE Summer School, July 2019
mat_gini = mat_v = matrix(NA, 7, 101)
for(v in 1:7){
  variable = myocarde[, v]
  v_seuil = seq(quantile(myocarde[, v], 6/length(myocarde[, v])),
                quantile(myocarde[, v], 1 - 6/length(myocarde[, v])), length = 101)
  mat_v[v, ] = v_seuil
  for(i in 1:101){
    CLASSE = variable <= v_seuil[i]
    mat_gini[v, i] = gini(y = myocarde$PRONO, classe = CLASSE)}}
-(gini(y = myocarde$PRONO, classe = (myocarde[, 3] < 19)) -
  gini(y = myocarde$PRONO, classe = (myocarde[, 3] < Inf))) /
  gini(y = myocarde$PRONO, classe = (myocarde[, 3] < Inf))
[1] 0.5862131
@freakonometrics freakonometrics freakonometrics.hypotheses.org 6
Arthur Charpentier, SIDE Summer School, July 2019
idx = which(myocarde$INSYS < 19)
mat_gini = mat_v = matrix(NA, 7, 101)
for(v in 1:7){
  variable = myocarde[idx, v]
  v_seuil = seq(quantile(myocarde[idx, v], 7/length(myocarde[idx, v])),
                quantile(myocarde[idx, v], 1 - 7/length(myocarde[idx, v])), length = 101)
  mat_v[v, ] = v_seuil
  for(i in 1:101){
    CLASSE = variable <= v_seuil[i]
    mat_gini[v, i] = gini(y = myocarde$PRONO[idx], classe = CLASSE)}}
par(mfrow = c(3, 2))
for(v in 2:7){
  plot(mat_v[v, ], mat_gini[v, ])
}
@freakonometrics freakonometrics freakonometrics.hypotheses.org 7
Arthur Charpentier, SIDE Summer School, July 2019
idx = which(myocarde$INSYS >= 19)
mat_gini = mat_v = matrix(NA, 7, 101)
for(v in 1:7){
  variable = myocarde[idx, v]
  v_seuil = seq(quantile(myocarde[idx, v], 6/length(myocarde[idx, v])),
                quantile(myocarde[idx, v], 1 - 6/length(myocarde[idx, v])), length = 101)
  mat_v[v, ] = v_seuil
  for(i in 1:101){
    CLASSE = variable <= v_seuil[i]
    mat_gini[v, i] = gini(y = myocarde$PRONO[idx], classe = CLASSE)}}
par(mfrow = c(3, 2))
for(v in 2:7){
  plot(mat_v[v, ], mat_gini[v, ])
}
@freakonometrics freakonometrics freakonometrics.hypotheses.org 8
Arthur Charpentier, SIDE Summer School, July 2019
Boosting & Adaboost
Classification problem, yi ∈ {•, •}; consider a model at stage k − 1:
if mk−1(xi) ≠ yi, increase the weight given to observation i
Boosting : weak learner
A weak model is a model slightly better than a pure random one (heads/tails)
[Figure: simulated data on [0, 1] × [0, 1], with a weak classifier splitting at the 0.5 threshold]
@freakonometrics freakonometrics freakonometrics.hypotheses.org 9
Arthur Charpentier, SIDE Summer School, July 2019
Boosting & Adaboost
Adaboost Algorithm
1. Set weights ωi = 1/n, i = 1, · · · , n
2. For k = 1, · · ·
(i) fit a model on (yi, xi) with weights ωi, get hk(x)
(ii) compute the error rate εk = Σ_{i=1}^n ω̃i 1_{yi ≠ hk(xi)}
(iii) compute αk = log[(1 − εk) / εk]
(iv) reevaluate the weights, ωi = ωi · exp[αk 1_{yi ≠ hk(xi)}]
3. The final model is hκ(x) = Σ_{k=1}^κ αk hk(x)
The error rate should not be too large (εk < 50%), to ensure that αk > 0
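As an illustration, a minimal from-scratch sketch of this loop, using rpart stumps as weak learners on the myocarde data; the recoding of PRONO into −1/+1, the number of iterations K and the stump settings are assumptions, not part of the original slides:
library(rpart)
y_pm = ifelse(myocarde$PRONO == "SURVIE", +1, -1)    # assumed recoding of the response
df   = data.frame(myocarde[, 1:7], y_pm = y_pm)
n    = nrow(df); K = 100
w    = rep(1/n, n)                                   # 1. uniform weights
alpha = numeric(K); h_list = vector("list", K)
for(k in 1:K){
  h_list[[k]] = rpart(y_pm ~ ., data = df, weights = w,
                      control = rpart.control(maxdepth = 1, cp = 0))  # (i) weak learner
  h   = sign(predict(h_list[[k]]))                   # fitted classifier in {-1, +1}
  eps = sum(w * (h != y_pm)) / sum(w)                # (ii) weighted error rate
  alpha[k] = log((1 - eps) / eps)                    # (iii)
  w   = w * exp(alpha[k] * (h != y_pm))              # (iv) reweight the misclassified points
  w   = w / sum(w)
}
score = rowSums(sapply(1:K, function(k) alpha[k] * sign(predict(h_list[[k]]))))
mean(sign(score) == y_pm)                            # in-sample accuracy of the final model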
@freakonometrics freakonometrics freakonometrics.hypotheses.org 11
Arthur Charpentier, SIDE Summer School, July 2019
Boosting & Adaboost
The general problem in machine learning is to find m⋆(·) = argmin_{m∈M} E[ ℓ(Y, m(X)) ]
Use the loss ℓ(y, m(x)) = 1_{y ≠ m(x)}.
The empirical version is
mn(·) = argmin_{m∈M} (1/n) Σ_{i=1}^n ℓ(yi, m(xi)) = argmin_{m∈M} (1/n) Σ_{i=1}^n 1_{yi ≠ m(xi)}
Complicated problem : use a convex version of the loss function,
ℓ(y, m(x)) = exp[−y · m(x)]
From Hastie et al. (2009), with the adaboost algorithm,
hκ(·) = hκ−1(·) + ακ hκ(·) = hκ−1(·) + 2β⋆ H⋆(·)
where (β⋆, H⋆(·)) = argmin_{(β,H)∈(R,M)} Σ_{i=1}^n exp[ −yi · (hκ−1(xi) + β H(xi)) ]
@freakonometrics freakonometrics freakonometrics.hypotheses.org 12
Arthur Charpentier, SIDE Summer School, July 2019
Boosting & Adaboost
From Freund & Schapire (1999), the empirical error of hκ(·) satisfies
(1/n) Σ_{i=1}^n 1_{yi ≠ hκ(xi)} ≤ exp[ −2 Σ_{k=1}^κ (εk − 0.5)² ]
(when weak learners are better than random classification, the empirical error tends to 0, exponentially fast)
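As a quick numerical illustration of this bound, a hedged sketch: given the weighted error rates εk collected along the boosting iterations (the values below are made up for illustration), the bound is simply
eps_k = c(0.35, 0.40, 0.42, 0.44, 0.45)    # hypothetical weak-learner error rates
exp(-2 * sum((eps_k - 0.5)^2))             # upper bound on the training error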
[Figures: boosted classifier fits on simulated data (x ∈ [−3, 3], fitted values in [0, 1]), and training / validation error as a function of the number of boosting iterations (0 to 8000)]
@freakonometrics freakonometrics freakonometrics.hypotheses.org 13
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting
Newton-Raphson to minimize a strictly convex function g : R → R
At the minimum, g′(x⋆) = 0, so consider the first-order approximation
g′(x + h) ≈ g′(x) + h · g′′(x)
Consider the sequence xk = xk−1 − α g′(xk−1) where α = [g′′(xk−1)]⁻¹
One can consider a functional version of that technique: ∀i = 1, · · · , n,
gk(xi) = gk−1(xi) − α [ ∂ℓ(yi, g(xi)) / ∂g(xi) ] |_{g(xi) = gk−1(xi)}
This provides a sequence of functions gk at the points xi.
To get values at any point x, regress the εi's on the xi's, where
εi = − [ ∂ℓ(yi, g) / ∂g ] |_{g = gk−1(xi)}
If α = 1 and ℓ(y, g) = exp[−yg], we have (almost) adaboost
@freakonometrics freakonometrics freakonometrics.hypotheses.org 21
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting
Gradient Boosting Algorithm
1. Start with a constant model, h0(x) = argmin_{c∈R} (1/n) Σ_{i=1}^n ℓ(yi, c), and a regularization parameter α ∈ (0, 1)
2. For k = 1, · · ·
(i) compute the pseudo-residuals εi = − [ ∂ℓ(yi, g) / ∂g ] |_{g = hk−1(xi)}
(ii) fit the (weak) model on the sample (εi, xi) and let Hk denote that model
(iii) update the model, hk(·) = hk−1(·) + α Hk(·)
3. The final model is hκ(x)
The choice of α is (somehow) not important : use α ∼ 10%
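A minimal sketch of this loop for a regression problem with squared loss (so the pseudo-residuals εi are plain residuals yi − hk−1(xi)), with rpart trees as weak learners; the simulated data and all tuning values below are assumptions:
library(rpart)
set.seed(1)
n = 200
x = runif(n); y = sin(2*pi*x) + rnorm(n, sd = .3)   # simulated data (assumption)
alpha = 0.1; K = 200                                # shrinkage and number of steps
h = rep(mean(y), n)                                 # 1. constant model
trees = vector("list", K)
for(k in 1:K){
  eps = y - h                                       # (i) pseudo-residuals, squared loss
  trees[[k]] = rpart(eps ~ x, data = data.frame(x, eps),
                     control = rpart.control(maxdepth = 2, cp = 0))   # (ii) weak model Hk
  h = h + alpha * predict(trees[[k]])               # (iii) update the model
}
# prediction at new points: the constant plus the sum of the alpha-shrunk trees
pred_boost = function(x_new){
  p = rep(mean(y), length(x_new))
  for(k in 1:K) p = p + alpha * predict(trees[[k]], newdata = data.frame(x = x_new))
  p
}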
@freakonometrics freakonometrics freakonometrics.hypotheses.org 22
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting
The logiboost model is obtained when y ∈ {0, 1} and the loss function is
ℓ(y, m) = log[1 + exp(−2(2y − 1)m)]
Boosting (learning from the mistakes)
Sequential Learning
mk(·) = mk−1(·) + α · argmin_{h∈H} { Σ_{i=1}^n ℓ(εi, h(xi)) }, where εi = yi − mk−1(xi)
Hence, learning is sequential, as opposed to bagging...
@freakonometrics freakonometrics freakonometrics.hypotheses.org 23
Arthur Charpentier, SIDE Summer School, July 2019
Bagging
Bagging Algorithm
1. For k = 1, · · · , κ
(i) draw a bootstrap sample from the (yi, xi)'s
(ii) estimate a model mk on that sample
2. The final model is m⋆(·) = (1/κ) Σ_{k=1}^κ mk(·)
To illustrate, suppose that m is some parametric model mθ, and mk = mθk is obtained on the sample Sk = {(yi, xi), i ∈ Ik}.
Let σ²(x) = Var[mθ(x)] and ρ(x) = Corr[mθ1(x), mθ2(x)], obtained on two random bootstrap samples. Then
Var[m⋆(x)] = ρ(x) σ²(x) + [(1 − ρ(x)) / κ] σ²(x)
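A minimal sketch of this bagging loop with rpart trees on the myocarde data used earlier, assuming PRONO is a factor with levels DECES / SURVIE as in the package examples at the end of the deck:
library(rpart)
kappa = 100
trees = vector("list", kappa)
for(k in 1:kappa){
  idx = sample(1:nrow(myocarde), replace = TRUE)          # (i) bootstrap sample
  trees[[k]] = rpart(PRONO ~ ., data = myocarde[idx, ])   # (ii) model on that sample
}
# final model: average of the kappa predicted probabilities of SURVIE
p_bag = rowMeans(sapply(trees, function(tr) predict(tr, newdata = myocarde)[, "SURVIE"]))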
@freakonometrics freakonometrics freakonometrics.hypotheses.org 24
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
We have used ℓ(y, m(x)) = exp[−y · m(x)] instead of 1_{y ≠ m(x)}.
The misclassification error is (upper) bounded by the exponential loss,
(1/n) Σ_{i=1}^n 1_{yi · m(xi) ≤ 0} ≤ (1/n) Σ_{i=1}^n exp[−yi · m(xi)]
Here m(x) is a linear combination of weak classifiers, m(x) = Σ_{j=1}^κ αj hj(x).
Let M = [Mi,j] where Mi,j = yi · hj(xi) ∈ {−1, +1}, i.e. Mi,j = 1 whenever the (weak) classifier j correctly classifies individual i. Then
yi · m(xi) = Σ_{j=1}^κ αj yi hj(xi) = (Mα)i
thus, R(α) = (1/n) Σ_{i=1}^n exp[−yi · m(xi)] = (1/n) Σ_{i=1}^n exp[−(Mα)i]
@freakonometrics freakonometrics freakonometrics.hypotheses.org 25
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
One can use coordinate descent, in the direction j in which the directional derivative is the steepest,
j⋆ ∈ argmax_j { − ∂R(α + a ej)/∂a |_{a=0} }
where the objective can be written
− ∂/∂a [ (1/n) Σ_{i=1}^n exp( −(Mα)i − a (M ej)i ) ] |_{a=0} = (1/n) Σ_{i=1}^n Mij exp[−(Mα)i]
Then
j⋆ ∈ argmax_j (dᵀM)j where di = exp[−(Mα)i] / Σ_i exp[−(Mα)i]
(equivalently, pick the weak classifier with the smallest weighted error)
@freakonometrics freakonometrics freakonometrics.hypotheses.org 26
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
Then do a line-search to see how far we should go. The derivative is null when
− ∂R(α + a ej⋆)/∂a = 0, i.e. a = (1/2) log(d+ / d−) = (1/2) log[(1 − d−) / d−]
where d− = Σ_{i : Mi,j⋆ = −1} di and d+ = Σ_{i : Mi,j⋆ = +1} di.
Coordinate Descent Algorithm
1. Set di = 1/n for i = 1, · · · , n, and α = 0
2. For k = 1, · · ·
(i) find the optimal direction j⋆ ∈ argmax_j (dᵀM)j
(ii) compute d− = Σ_{i : Mi,j⋆ = −1} di and ak = (1/2) log[(1 − d−) / d−]
(iii) set α = α + ak ej⋆ and update di = exp[−(Mα)i] / Σ_i exp[−(Mα)i]
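A compact sketch of this coordinate descent, assuming the matrix M of the yi · hj(xi) values has already been built for some finite dictionary of weak classifiers (building M, e.g. from single-variable threshold stumps, is left out here):
coord_boost = function(M, K = 100){
  n = nrow(M); p = ncol(M)
  d = rep(1/n, n)                             # 1. uniform weights, alpha = 0
  alpha = rep(0, p)
  for(k in 1:K){
    j  = which.max(as.vector(t(M) %*% d))     # (i) steepest direction
    dm = sum(d[M[, j] == -1])                 # (ii) weighted error d-
    a  = 0.5 * log((1 - dm) / dm)
    alpha[j] = alpha[j] + a                   # (iii) update alpha ...
    w = exp(-as.vector(M %*% alpha))          # ... and the weights d
    d = w / sum(w)
  }
  alpha
}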
@freakonometrics freakonometrics freakonometrics.hypotheses.org 27
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
This is very close to Adaboost : αj is the sum of the ak's for the iterations where direction j was considered,
αj = Σ_{k=1}^κ ak 1_{j⋆(k) = j}
Thus
m⋆(x) = Σ_j αj hj(x) = Σ_{k=1}^κ ak hj⋆(k)(x)
With Adaboost, we go in the same direction, with the same intensity : Adaboost is equivalent to minimizing the exponential loss by coordinate descent.
Thus, we seek m⋆(·) = argmin E_{(Y,X)∼F}[ exp(−Y · m(X)) ]
which is minimized at
m⋆(x) = (1/2) log( P[Y = +1|X = x] / P[Y = −1|X = x] )
(very close to the logistic regression)
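Since m⋆(x) is half the log-odds, a fitted boosted score can be mapped back to a probability; a one-line sketch (m denotes the fitted score, a notational assumption):
p_hat = function(m) 1 / (1 + exp(-2 * m))    # = P[Y = +1 | X = x]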
@freakonometrics freakonometrics freakonometrics.hypotheses.org 28
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
Several packages can be used with R, such as adabag::boosting
library(adabag)
library(caret)
indexes = createDataPartition(myocarde$PRONO, p = .70, list = FALSE)
train = myocarde[indexes, ]
test  = myocarde[-indexes, ]
model = boosting(PRONO ~ ., data = train, boos = TRUE, mfinal = 50)
pred  = predict(model, test)
print(pred$confusion)
                Observed Class
Predicted Class DECES SURVIE
         DECES      5      0
         SURVIE     3     12
or use cross-validation
cvmodel = boosting.cv(PRONO ~ ., data = myocarde, boos = TRUE, mfinal = 10, v = 5)
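The cross-validated object can then be inspected; a short sketch, assuming boosting.cv returns its results in the usual slots:
cvmodel$confusion    # cross-validated confusion matrix
cvmodel$error        # cross-validated misclassification rate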
@freakonometrics freakonometrics freakonometrics.hypotheses.org 29
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
or xgboost::xgboost
library(xgboost)
library(caret)
train_x = data.matrix(train[, -8])
train_y = train[, 8]
test_x  = data.matrix(test[, -8])
test_y  = test[, 8]
xgb_train = xgb.DMatrix(data = train_x, label = train_y)
xgb_test  = xgb.DMatrix(data = test_x, label = test_y)
xgbc = xgboost(data = xgb_train, max.depth = 3, nrounds = 50)
pred = predict(xgbc, xgb_test)
pred_y = as.factor((levels(test_y))[round(pred)])
(cm = caret::confusionMatrix(test_y, pred_y))
          Reference
Prediction DECES SURVIE
    DECES      6      2
    SURVIE     0     12
@freakonometrics freakonometrics freakonometrics.hypotheses.org 30
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
or gbm::gbm
library(gbm)
library(caret)
mod_gbm = gbm(PRONO == "SURVIE" ~ .,
              data = train,
              distribution = "bernoulli",
              cv.folds = 7,
              shrinkage = .01,
              n.minobsinnode = 10,
              n.trees = 200)
pred = predict.gbm(object = mod_gbm,
                   newdata = test,
                   n.trees = 200,
                   type = "response")
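The gbm predictions above are probabilities that PRONO equals "SURVIE"; a short sketch (the 0.5 cutoff is an arbitrary choice) to recover class labels and a confusion table, as on the previous slides:
pred_class = ifelse(pred > .5, "SURVIE", "DECES")
table(Predicted = pred_class, Observed = test$PRONO)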
@freakonometrics freakonometrics freakonometrics.hypotheses.org 31
