Classification and regression based on derivatives: a
consistency result for sampled functions
Nathalie Villa-Vialaneix
http://www.nathalievilla.org
Toulouse School of Economics
TSE Seminar
April 29, 2009
Research interests
Graph mining (& social networks)
Modelling land use evolution
Functional data analysis
Based on a preprint (available on demand) written with Fabrice Rossi. A short version is available on the TSE web site.
Outline
1 Introduction and motivations
2 A general consistency result
3 Examples
4 Conclusion and prospect
Introduction and motivations
A regression or classification problem where the predictor takes values in an infinite dimensional space

Settings
$(X, Y)$ is a random pair of variables where
• $Y \in \{-1, 1\}$ (binary classification problem) or $Y \in \mathbb{R}$
• $X \in (\mathcal{X}, \langle\cdot,\cdot\rangle_{\mathcal{X}})$, an infinite dimensional Hilbert space.
We are given a learning set $S_n = \{(X_i, Y_i)\}_{i=1}^{n}$ of $n$ i.i.d. copies of $(X, Y)$.

Purpose: Find $\phi_n : \mathcal{X} \to \{-1, 1\}$ or $\mathbb{R}$ that is universally consistent:
• Classification case: $\lim_{n\to+\infty} P(\phi_n(X) \neq Y) = L^*$, where $L^* = \inf_{\phi:\mathcal{X}\to\{-1,1\}} P(\phi(X) \neq Y)$ is the Bayes risk.
• Regression case: $\lim_{n\to+\infty} \mathbb{E}[\phi_n(X) - Y]^2 = L^*$, where $L^* = \inf_{\phi:\mathcal{X}\to\mathbb{R}} \mathbb{E}[\phi(X) - Y]^2$ will also be called the Bayes risk.
Introduction and motivations
An example
Predicting the rate of yellow berry in durum wheat from its NIR
spectrum.
Introduction and motivations
Using derivatives
Practically, $X^{(m)}$ is often more relevant than $X$ for the prediction.
But $X \to X^{(m)}$ induces information loss and
$$\inf_{\phi:D^m\mathcal{X}\to\{-1,1\}} P\left(\phi(X^{(m)}) \neq Y\right) \geq \inf_{\phi:\mathcal{X}\to\{-1,1\}} P(\phi(X) \neq Y) = L^*$$
and
$$\inf_{\phi:D^m\mathcal{X}\to\mathbb{R}} \mathbb{E}\left[\phi(X^{(m)}) - Y\right]^2 \geq \inf_{\phi:\mathcal{X}\to\mathbb{R}} \mathbb{E}\left[\phi(X) - Y\right]^2 = L^*.$$
Introduction and motivations
Sampled functions
Practically, $(X_i)_i$ are not perfectly known but we are given a discrete sampling of them: $X_i^{\tau_d} = (X_i(t))_{t\in\tau_d}$ where $\tau_d = \{t_1^{\tau_d}, \ldots, t_{|\tau_d|}^{\tau_d}\}$. The sampling can be noisy.
Then, $X_i^{(m)}$ is estimated from $X_i^{\tau_d}$ and, if the estimation is denoted by $X^{(m)}_{\tau_d}$, this also induces information loss:
$$\inf_{\phi:D^m\mathcal{X}\to\{-1,1\}} P\left(\phi(X^{(m)}_{\tau_d}) \neq Y\right) \geq \inf_{\phi:D^m\mathcal{X}\to\{-1,1\}} P\left(\phi(X^{(m)}) \neq Y\right) \geq L^*$$
and
$$\inf_{\phi:D^m\mathcal{X}\to\mathbb{R}} \mathbb{E}\left[\phi(X^{(m)}_{\tau_d}) - Y\right]^2 \geq \inf_{\phi:D^m\mathcal{X}\to\mathbb{R}} \mathbb{E}\left[\phi(X^{(m)}) - Y\right]^2 \geq L^*.$$
Introduction and motivations
Purpose of the presentation
Find a classifier or a regression function $\phi_{n,\tau_d}$ built from $X^{(m)}_{\tau_d}$ such that the risk of $\phi_{n,\tau_d}$ asymptotically reaches the Bayes risk $L^*$:
$$\lim_{|\tau_d|\to+\infty} \lim_{n\to+\infty} P\left(\phi_{n,\tau_d}(X^{(m)}_{\tau_d}) \neq Y\right) = L^*$$
or
$$\lim_{|\tau_d|\to+\infty} \lim_{n\to+\infty} \mathbb{E}\left[\phi_{n,\tau_d}(X^{(m)}_{\tau_d}) - Y\right]^2 = L^*$$
Main idea: Use a relevant way to estimate $X^{(m)}$ from $X^{\tau_d}$ (by smoothing splines) and combine the consistency of splines with the consistency of an $\mathbb{R}^{|\tau_d|}$-classifier or regression function.
A general consistency result
Basics about smoothing splines I
Suppose that $\mathcal{X}$ is the Sobolev space
$$H^m = \left\{h \in L^2([0,1]) \,\middle|\, \forall\, j = 1, \ldots, m,\ D^j h \text{ exists in the weak sense and } D^m h \in L^2\right\}$$
equipped with the scalar product
$$\langle u, v\rangle_{H^m} = \langle D^m u, D^m v\rangle_{L^2} + \sum_{j=1}^{m} B_j u \, B_j v$$
where $B$ are $m$ boundary conditions such that $\mathrm{Ker}\,B \cap \mathcal{P}^{m-1} = \{0\}$.
$(H^m, \langle\cdot,\cdot\rangle_{H^m})$ is a RKHS: there are $k_0 : \mathcal{P}^{m-1} \times \mathcal{P}^{m-1} \to \mathbb{R}$ and $k_1 : \mathrm{Ker}\,B \times \mathrm{Ker}\,B \to \mathbb{R}$ such that
$$\forall\, u \in \mathcal{P}^{m-1},\ t \in [0, 1], \quad \langle u, k_0(t, \cdot)\rangle_{H^m} = u(t)$$
and
$$\forall\, u \in \mathrm{Ker}\,B,\ t \in [0, 1], \quad \langle u, k_1(t, \cdot)\rangle_{H^m} = u(t).$$
See [Berlinet and Thomas-Agnan, 2004] for further details.
A general consistency result
Basics about smoothing splines II
A simple example of boundary conditions:
$$h(0) = h^{(1)}(0) = \ldots = h^{(m-1)}(0) = 0.$$
Then,
$$k_0(s, t) = \sum_{k=0}^{m-1} \frac{t^k s^k}{(k!)^2}$$
and
$$k_1(s, t) = \int_0^1 \frac{(t - w)_+^{m-1} (s - w)_+^{m-1}}{((m-1)!)^2}\, dw.$$
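These two kernels are easy to evaluate numerically. Below is a minimal Python sketch (not from the original slides; function names are illustrative) that computes $k_0$ in closed form and $k_1$ by quadrature for the boundary conditions above.

```python
import numpy as np
from math import factorial
from scipy.integrate import quad

def k0(s, t, m):
    """Polynomial part of the kernel: sum_{k=0}^{m-1} t^k s^k / (k!)^2."""
    return sum((t ** k) * (s ** k) / factorial(k) ** 2 for k in range(m))

def k1(s, t, m):
    """Kernel of Ker B: integral over [0,1] of the truncated-power product."""
    def integrand(w):
        return (max(t - w, 0.0) ** (m - 1) * max(s - w, 0.0) ** (m - 1)
                / factorial(m - 1) ** 2)
    # The integrand vanishes for w > min(s, t), so we integrate only up to there.
    value, _ = quad(integrand, 0.0, min(s, t))
    return value

# Example: cubic-spline case (m = 2)
print(k0(0.3, 0.7, m=2), k1(0.3, 0.7, m=2))
```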
A general consistency result
Estimating the predictors with smoothing splines I
Assumption (A1)
• $|\tau_d| \geq m - 1$
• sampling points are distinct in $[0, 1]$
• $B_j$ are linearly independent from $h \mapsto h(t)$ for all $t \in \tau_d$
[Kimeldorf and Wahba, 1971]: for $x^{\tau_d}$ in $\mathbb{R}^{|\tau_d|}$, there is a unique $\hat{x}_{\lambda,\tau_d} \in H^m$ solution of
$$\arg\min_{h\in H^m} \frac{1}{|\tau_d|} \sum_{l=1}^{|\tau_d|} \left(h(t_l) - x_l^{\tau_d}\right)^2 + \lambda \int_{[0,1]} \left(h^{(m)}(t)\right)^2 dt,$$
and $\hat{x}_{\lambda,\tau_d} = S_{\lambda,\tau_d} x^{\tau_d}$ where $S_{\lambda,\tau_d}$ is a full rank operator from $\mathbb{R}^{|\tau_d|}$ to $H^m$.
These assumptions are fulfilled by the previous simple example as long as $0 \notin \tau_d$.
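For illustration (not part of the original slides), the same penalized least-squares criterion with $m = 2$ is what cubic smoothing splines solve. A minimal sketch with SciPy, assuming SciPy ≥ 1.10 and accepting SciPy's own scaling of the penalty:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

# Noisy sampling of one curve on a grid tau_d contained in (0, 1]
rng = np.random.default_rng(0)
tau_d = np.linspace(0.01, 1.0, 50)
x_tau = np.sin(4 * np.pi * tau_d) + 0.05 * rng.standard_normal(tau_d.size)

# Cubic smoothing spline (m = 2): minimizes sum (h(t_l) - x_l)^2 + lam * int (h'')^2
x_hat = make_smoothing_spline(tau_d, x_tau, lam=1e-4)

# The estimated curve and its derivatives can then be evaluated anywhere
t = np.linspace(0.01, 1.0, 200)
x_hat_vals = x_hat(t)
x_hat_second_deriv = x_hat.derivative(2)(t)   # estimate of X^(2)
```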
A general consistency result
Estimating the predictors with smoothing splines II
$S_{\lambda,\tau_d}$ is given by:
$$S_{\lambda,\tau_d} = \omega^T \left(U(K_1 + \lambda I_{|\tau_d|})^{-1}U^T\right)^{-1} U(K_1 + \lambda I_{|\tau_d|})^{-1} + \eta^T (K_1 + \lambda I_{|\tau_d|})^{-1}\left(I_{|\tau_d|} - U^T \left(U(K_1 + \lambda I_{|\tau_d|})^{-1}U^T\right)^{-1} U(K_1 + \lambda I_{|\tau_d|})^{-1}\right) = \omega^T M_0 + \eta^T M_1$$
with
• $\{\omega_1, \ldots, \omega_m\}$ a basis of $\mathcal{P}^{m-1}$, $\omega = (\omega_1, \ldots, \omega_m)^T$ and $U = (\omega_i(t))_{i=1,\ldots,m,\ t\in\tau_d}$;
• $\eta = (k_1(t, \cdot))^T_{t\in\tau_d}$ and $K_1 = (k_1(t, t'))_{t,t'\in\tau_d}$.
The observations of the predictor $X$ (NIR spectra) are then estimated from their sampling $X^{\tau_d}$ by $\hat{X}_{\lambda,\tau_d}$.
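A minimal numpy sketch of these matrices (illustrative code, not from the slides): it fixes $m = 2$, uses the monomial basis $\omega = (1, t)$ of $\mathcal{P}^1$ and the closed form of $k_1$ for the boundary conditions of the simple example.

```python
import numpy as np

def k1_m2(s, t):
    """Closed form of k1 for m = 2: integral over [0, min(s,t)] of (s-w)(t-w) dw."""
    a = min(s, t)
    return s * t * a - (s + t) * a ** 2 / 2 + a ** 3 / 3

def spline_matrices(tau, lam, m=2):
    """Build U, K1, M0, M1 so that the smoothing spline reads omega^T M0 x + eta^T M1 x."""
    n = tau.size
    U = np.vstack([tau ** i for i in range(m)])                 # m x n, basis (1, t)
    K1 = np.array([[k1_m2(s, t) for t in tau] for s in tau])    # n x n Gram matrix of k1
    Minv = np.linalg.inv(K1 + lam * np.eye(n))
    A = np.linalg.inv(U @ Minv @ U.T)                           # (U (K1 + lam I)^-1 U^T)^-1
    M0 = A @ U @ Minv                                           # m x n
    M1 = Minv @ (np.eye(n) - U.T @ M0)                          # n x n
    return U, K1, M0, M1
```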
A general consistency result
Two important consequences
1. No information loss:
$$\inf_{\phi:H^m\to\{-1,1\}} P\left(\phi(\hat{X}_{\lambda,\tau_d}) \neq Y\right) = \inf_{\phi:\mathbb{R}^{|\tau_d|}\to\{-1,1\}} P\left(\phi(X^{\tau_d}) \neq Y\right)$$
and
$$\inf_{\phi:H^m\to\mathbb{R}} \mathbb{E}\left[\phi(\hat{X}_{\lambda,\tau_d}) - Y\right]^2 = \inf_{\phi:\mathbb{R}^{|\tau_d|}\to\mathbb{R}} \mathbb{E}\left[\phi(X^{\tau_d}) - Y\right]^2$$
2. Easy way to use derivatives:
$$\langle \hat{u}_{\lambda,\tau_d}, \hat{v}_{\lambda,\tau_d}\rangle_{H^m} = \langle S_{\lambda,\tau_d} u^{\tau_d}, S_{\lambda,\tau_d} v^{\tau_d}\rangle_{H^m} = (u^{\tau_d})^T M_0^T W M_0\, v^{\tau_d} + (u^{\tau_d})^T M_1^T K_1 M_1\, v^{\tau_d} = (u^{\tau_d})^T M_{\lambda,\tau_d}\, v^{\tau_d}$$
where $K_1$, $M_0$ and $M_1$ have been previously defined, $W = (\langle \omega_i, \omega_j\rangle_{H^m})_{i,j=1,\ldots,m}$ and $M_{\lambda,\tau_d}$ is symmetric, positive definite. Hence
$$(Q_{\lambda,\tau_d} u^{\tau_d})^T (Q_{\lambda,\tau_d} v^{\tau_d}) = \langle \hat{u}_{\lambda,\tau_d}, \hat{v}_{\lambda,\tau_d}\rangle_{H^m} \simeq \langle \hat{u}^{(m)}_{\lambda,\tau_d}, \hat{v}^{(m)}_{\lambda,\tau_d}\rangle_{L^2}$$
where $Q_{\lambda,\tau_d}$ is the Cholesky triangle of $M_{\lambda,\tau_d}$: $Q^T_{\lambda,\tau_d} Q_{\lambda,\tau_d} = M_{\lambda,\tau_d}$.
Remark: $Q_{\lambda,\tau_d}$ is calculated only from the RKHS, $\lambda$ and $\tau_d$: it does not depend on the data set.
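A short numpy sketch of this step (illustrative; it builds on the spline_matrices sketch above, so $m = 2$ with basis $(1, t)$ and the boundary conditions $h(0) = h'(0) = 0$ are assumed):

```python
import numpy as np
from scipy.linalg import cholesky

# Assumes k1_m2 and spline_matrices from the previous sketch
tau = np.linspace(0.01, 1.0, 50)
lam = 1e-4
U, K1, M0, M1 = spline_matrices(tau, lam, m=2)

# W = (<omega_i, omega_j>_{H^m}); with these boundary conditions and the basis (1, t),
# the derivative term vanishes and W reduces to the identity matrix.
W = np.eye(2)
M_lam = M0.T @ W @ M0 + M1.T @ K1 @ M1
M_lam = 0.5 * (M_lam + M_lam.T)                      # enforce exact symmetry
# M_lam is positive definite in exact arithmetic; a tiny ridge guards against rounding.
M_lam += 1e-10 * np.linalg.norm(M_lam) * np.eye(M_lam.shape[0])

Q_lam = cholesky(M_lam)                              # upper triangular, Q^T Q = M_lam

# For sampled curves stored as rows of an array X_tau (shape n_curves x |tau_d|),
# Z = X_tau @ Q_lam.T gives rows z_i = Q_lam x_i, and z_i . z_j = <X_i_hat, X_j_hat>_{H^m}.
```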
A general consistency result
Classification and regression based on derivatives
Suppose that we know a consistent classifier or regression function in $\mathbb{R}^{|\tau_d|}$ that is based on the $\mathbb{R}^{|\tau_d|}$ scalar product or norm.
Example: Nonparametric kernel regression
$$\Psi : u \in \mathbb{R}^{|\tau_d|} \mapsto \frac{\sum_{i=1}^{n} T_i K\!\left(\frac{\|u - U_i\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)}{\sum_{i=1}^{n} K\!\left(\frac{\|u - U_i\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)}$$
where $(U_i, T_i)_{i=1,\ldots,n}$ is a learning set in $\mathbb{R}^{|\tau_d|} \times \mathbb{R}$.
The corresponding derivative based classifier or regression function is given by using the norm induced by $Q_{\lambda,\tau_d}$:
$$\phi_{n,d} = \Psi \circ Q_{\lambda,\tau_d} : x \in H^m \mapsto \frac{\sum_{i=1}^{n} Y_i K\!\left(\frac{\|Q_{\lambda,\tau_d} x^{\tau_d} - Q_{\lambda,\tau_d} X_i^{\tau_d}\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)}{\sum_{i=1}^{n} K\!\left(\frac{\|Q_{\lambda,\tau_d} x^{\tau_d} - Q_{\lambda,\tau_d} X_i^{\tau_d}\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)} \longrightarrow \frac{\sum_{i=1}^{n} Y_i K\!\left(\frac{\|x^{(m)} - X_i^{(m)}\|_{L^2}}{h_n}\right)}{\sum_{i=1}^{n} K\!\left(\frac{\|x^{(m)} - X_i^{(m)}\|_{L^2}}{h_n}\right)}$$
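A minimal sketch of this derivative-based Nadaraya–Watson estimator (illustrative code; it assumes Q_lam from the sketch above, training curves X_tau of shape (n, |tau_d|), responses Y, and takes the kernel K Gaussian):

```python
import numpy as np

def nw_predict(x_tau, X_tau, Y, Q_lam, h_n):
    """Nadaraya-Watson estimate with the norm induced by Q_lam (Gaussian kernel K)."""
    z = Q_lam @ x_tau                  # transformed new sampled curve, length |tau_d|
    Z = X_tau @ Q_lam.T                # transformed training curves, n x |tau_d|
    dists = np.linalg.norm(Z - z, axis=1)
    weights = np.exp(-0.5 * (dists / h_n) ** 2)
    return np.sum(weights * Y) / np.sum(weights)
```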
A general consistency result
Remark for consistency
Classification case (approximately the same holds for the regression case):
$$P\left(\phi_{n,\tau_d}(\hat{X}_{\lambda,\tau_d}) \neq Y\right) - L^* = \left[P\left(\phi_{n,\tau_d}(\hat{X}_{\lambda,\tau_d}) \neq Y\right) - L^*_d\right] + \left[L^*_d - L^*\right]$$
where $L^*_d = \inf_{\phi:\mathbb{R}^{|\tau_d|}\to\{-1,1\}} P(\phi(X^{\tau_d}) \neq Y)$.
1. For all fixed $d$,
$$\lim_{n\to+\infty} P\left(\phi_{n,\tau_d}(\hat{X}_{\lambda,\tau_d}) \neq Y\right) = L^*_d$$
as long as the $\mathbb{R}^{|\tau_d|}$-classifier is consistent, because there is a one-to-one mapping between $X^{\tau_d}$ and $\hat{X}_{\lambda,\tau_d}$.
2. $L^*_d - L^* \leq \mathbb{E}\left|\mathbb{E}(Y|\hat{X}_{\lambda,\tau_d}) - \mathbb{E}(Y|X)\right|$: with consistency of the spline estimate $\hat{X}_{\lambda,\tau_d}$ and an assumption on the regularity of $\mathbb{E}(Y|X = \cdot)$, consistency would be proved.
But continuity of $\mathbb{E}(Y|X = \cdot)$ is a strong assumption in the infinite dimensional case, and is not easy to check.
A general consistency result
Spline consistency
Let $\lambda$ depend on $d$ and denote by $(\lambda_d)_d$ the sequence of regularization parameters. Also introduce
$$\overline{\Delta}_{\tau_d} := \max\{t_1, t_2 - t_1, \ldots, 1 - t_{|\tau_d|}\}, \qquad \underline{\Delta}_{\tau_d} := \min_{1\leq i<|\tau_d|}\{t_{i+1} - t_i\}$$
Assumption (A2)
• There is $R$ such that $\overline{\Delta}_{\tau_d}/\underline{\Delta}_{\tau_d} \leq R$ for all $d$;
• $\lim_{d\to+\infty} |\tau_d| = +\infty$;
• $\lim_{d\to+\infty} \lambda_d = 0$.
[Ragozin, 1983]: Under (A1) and (A2), there exist $A_{R,m}$ and $B_{R,m}$ such that for any $x \in H^m$ and any $\lambda_d > 0$,
$$\|\hat{x}_{\lambda_d,\tau_d} - x\|^2_{L^2} \leq \left(A_{R,m}\lambda_d + B_{R,m}\frac{1}{|\tau_d|^{2m}}\right)\|D^m x\|^2_{L^2} \xrightarrow{d\to+\infty} 0$$
A general consistency result
Bayes risk consistency
Assumption (A3a)
$\mathbb{E}\|D^m X\|^2_{L^2}$ is finite and $Y \in \{-1, 1\}$.
or
Assumption (A3b)
$\tau_d \subset \tau_{d+1}$ for all $d$ and $\mathbb{E}(Y^2)$ is finite.
Under (A1)-(A3), $\lim_{d\to+\infty} L^*_d = L^*$.
A general consistency result
Proof for assumption (A3a)
Assumption (A3a)
$\mathbb{E}\|D^m X\|^2_{L^2}$ is finite and $Y \in \{-1, 1\}$.
The proof is based on a result of [Faragó and Györfi, 1975]:
For a pair of random variables $(X, Y)$ taking their values in $\mathcal{X}\times\{-1, 1\}$, where $\mathcal{X}$ is an arbitrary metric space, and for a sequence of functions $T_d : \mathcal{X} \to \mathcal{X}$ such that
$$\mathbb{E}\left(\delta(T_d(X), X)\right) \xrightarrow{d\to+\infty} 0,$$
then $\lim_{d\to+\infty} \inf_{\phi:\mathcal{X}\to\{-1,1\}} P(\phi(T_d(X)) \neq Y) = L^*$.
Using for $T_d$ the spline estimate and the previous inequality of [Ragozin, 1983] about these estimates gives the assumption. Then the result follows.
A general consistency result
Proof for assumption (A3b)
Assumption (A3b)
$\tau_d \subset \tau_{d+1}$ for all $d$ and $\mathbb{E}(Y^2)$ is finite.
Under (A3b), $(\mathbb{E}(Y|\hat{X}_{\lambda_d,\tau_d}))_d$ is a uniformly bounded martingale and thus converges in $L^1$-norm. Using the consistency of $(\hat{X}_{\lambda_d,\tau_d})_d$ to $X$ completes the proof.
A general consistency result
Concluding result
Theorem
Under assumptions (A1)-(A3),
$$\lim_{d\to+\infty} \lim_{n\to+\infty} P\left(\phi_{n,\tau_d}(\hat{X}_{\lambda_d,\tau_d}) \neq Y\right) = L^*$$
and
$$\lim_{d\to+\infty} \lim_{n\to+\infty} \mathbb{E}\left[\phi_{n,\tau_d}(\hat{X}_{\lambda_d,\tau_d}) - Y\right]^2 = L^*$$
Proof: For $\epsilon > 0$, fix $d_0$ such that, for all $d \geq d_0$, $L^*_d - L^* \leq \epsilon/2$. Then, by consistency of the $\mathbb{R}^{|\tau_d|}$-classifier or regression function, conclude.
A general consistency result
A practical application to SVM I
Recall that, for a learning set $(U_i, T_i)_{i=1,\ldots,n}$ in $\mathbb{R}^p \times \{-1, 1\}$, the Gaussian SVM is the classifier
$$u \in \mathbb{R}^p \mapsto \mathrm{Sign}\left(\sum_{i=1}^{n} \alpha_i T_i e^{-\gamma\|u - U_i\|^2_{\mathbb{R}^p}}\right)$$
where $(\alpha_i)_i$ satisfy the following quadratic optimization problem:
$$\arg\min_{w} \sum_{i=1}^{n} \left[1 - T_i w(U_i)\right]_+ + C\|w\|^2_{S}$$
where $w(u) = \sum_{i=1}^{n} \alpha_i e^{-\gamma\|u - U_i\|^2_{\mathbb{R}^p}}$, $S$ is the RKHS associated with the Gaussian kernel and $C$ is a regularization parameter.
Under suitable assumptions, [Steinwart, 2002] proves the consistency of SVM classifiers.
A general consistency result
A practical application to SVM II
Additional assumptions related to SVM: Assumptions (A4)
• For all $d$, the regularization parameter depends on $n$ such that $\lim_{n\to+\infty} nC^d_n = +\infty$ and $C^d_n = O_n\!\left(n^{\beta_d - 1}\right)$ for a $0 < \beta_d < 1/d$.
• For all $d$, there is a bounded subset $B_d$ of $\mathbb{R}^{|\tau_d|}$ such that $X^{\tau_d}$ belongs to $B_d$.
Result: Under assumptions (A1)-(A4), the SVM
$$\phi_{n,d} : x \in H^m \mapsto \mathrm{Sign}\left(\sum_{i=1}^{n} \alpha_i Y_i e^{-\gamma\|Q_{\lambda_d,\tau_d} x^{\tau_d} - Q_{\lambda_d,\tau_d} X_i^{\tau_d}\|^2_{\mathbb{R}^{|\tau_d|}}}\right) \simeq \mathrm{Sign}\left(\sum_{i=1}^{n} \alpha_i Y_i e^{-\gamma\|x^{(m)} - X_i^{(m)}\|^2_{L^2}}\right)$$
is consistent: $\lim_{d\to+\infty} \lim_{n\to+\infty} P\left(\phi_{n,\tau_d}(\hat{X}_{\lambda_d,\tau_d}) \neq Y\right) = L^*$.
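For illustration (not from the slides), such a derivative-based Gaussian SVM can be obtained by feeding the Q-transformed sampled curves to any standard SVM implementation. A minimal scikit-learn sketch, assuming Q_lam, training curves X_tau, labels Y and a new sampled curve x_new_tau are available as numpy arrays:

```python
import numpy as np
from sklearn.svm import SVC

# Derivative-based representation: rows z_i = Q_lam x_i^{tau_d}
Z_train = X_tau @ Q_lam.T

# Gaussian SVM on the transformed data; its RBF kernel then acts (approximately)
# on the H^m / derivative metric between the underlying spline estimates.
svm = SVC(kernel="rbf", gamma=1.0, C=10.0)
svm.fit(Z_train, Y)

# Prediction for the new sampled curve
y_pred = svm.predict((x_new_tau @ Q_lam.T).reshape(1, -1))
```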
Examples
Example 1: Classification with SVM
Predicting the fat content class of pieces of meat from their NIR spectra.
Method:
1. Split the data set into a training sample and a test sample;
2. Estimate $\lambda_d$ by leave-one-out on the $X_i^{\tau_d}$ of the training sample (see the sketch after this list). The Sobolev space is the one with boundary conditions $h(0) = \ldots = h^{(m-1)}(0) = 0$ (for $m = 1$ and $m = 2$).
3. Calculate $Q_{\lambda_d,\tau_d}$.
4. Calculate Gaussian and linear SVM classifiers for the initial data, the first order derivatives and the second order derivatives (with the use of $Q_{\lambda_d,\tau_d}$ for the derivatives).
5. Calculate the percentage of misclassifications made by all these classifiers on the data in the test sample.
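A minimal sketch of step 2, selecting $\lambda_d$ by leave-one-out on the sampled curves of the training sample (illustrative only; SciPy's make_smoothing_spline is used as a stand-in for the exact spline of the slides, and the candidate grid is arbitrary):

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

def loo_score(tau, x, lam):
    """Leave-one-out squared error of the smoothing spline on one sampled curve."""
    err = 0.0
    for j in range(tau.size):
        keep = np.arange(tau.size) != j
        spline = make_smoothing_spline(tau[keep], x[keep], lam=lam)
        err += (spline(tau[j]) - x[j]) ** 2
    return err / tau.size

def select_lambda(tau, X_tau, grid=np.logspace(-8, 0, 17)):
    """Pick the lambda minimizing the average leave-one-out error over the training curves."""
    scores = [np.mean([loo_score(tau, x, lam) for x in X_tau]) for lam in grid]
    return grid[int(np.argmin(scores))]
```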
Examples
Example 1: Classification with SVM
Predicting the fat content class of pieces of meat from their NIR spectra:

Kernel (SVM)                           | Error rate on test (and sd)
Linear (L)                             | 3.78 % (2.52)
Linear on derivatives (L(1))           | 2.97 % (1.93)
Linear on second derivatives (L(2))    | 3.12 % (1.71)
Gaussian (G)                           | 5.97 % (2.76)
Gaussian on derivatives (G(1))         | 3.69 % (2.03)
Gaussian on second derivatives (G(2))  | 2.77 % (2.07)

with significant differences (Wilcoxon paired test at the 1 % level) between G(2) and G(1) and between G(1) and G.
Examples
Example 2: Regression with kernel ridge regression
Predicting yellow berry in durum wheat: recall that kernel ridge regression is given by solving
$$\arg\min_{w} \sum_{i=1}^{n} \left(T_i - w(U_i)\right)^2 + \|w\|^2_{S}$$
where $S$ is the RKHS to which $w$ belongs, defined from a given kernel (such as the Gaussian kernel), and $(U_i, T_i)_i$ is a training sample in $\mathbb{R}^p \times \mathbb{R}$.
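A minimal sketch of the corresponding derivative-based kernel ridge regression with scikit-learn (illustrative; it assumes Q_lam, training curves X_tau, responses Y and a new sampled curve x_new_tau, as in the SVM sketch above):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Same trick as for the SVM: work on the Q-transformed sampled curves
Z_train = X_tau @ Q_lam.T

krr = KernelRidge(kernel="rbf", gamma=1.0, alpha=1.0)   # alpha is the ridge penalty
krr.fit(Z_train, Y)

y_hat = krr.predict((x_new_tau @ Q_lam.T).reshape(1, -1))
```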
Examples
Example 2: Regression with kernel ridge regression
Predicting yellow berry in durum wheat:

Kernel                                 | MSE on test (and sd ×10⁻³)
Linear (L)                             | 0.122 % (8.77)
Linear on derivatives (L(1))           | 0.138 % (9.53)
Linear on second derivatives (L(2))    | 0.122 % (1.71)
Gaussian (G)                           | 0.110 % (20.2)
Gaussian on derivatives (G(1))         | 0.098 % (7.92)
Gaussian on second derivatives (G(2))  | 0.094 % (8.35)

The differences are significant between G(2) / G(1) and between G(1) / G.
Conclusion and prospect
Conclusion
We presented a method that justifies the common use of derivatives as predictors and guarantees no dramatic loss of information.
Prospects
1. Including errors in the sampling of X (related to errors-in-variables models);
2. Adding regularity assumptions on $\mathbb{E}(Y|X = \cdot)$ to obtain a convergence rate and relate $d$ to $n$.
Conclusion and prospect
Some references
Berlinet, A. and Thomas-Agnan, C. (2004).
Reproducing Kernel Hilbert Spaces in Probability and Statistics.
Kluwer Academic Publisher.
Faragó, T. and Györfi, L. (1975).
On the continuity of the error distortion function for multiple-hypothesis decisions.
IEEE Transactions on Information Theory, 21(4):458–460.
Kimeldorf, G. and Wahba, G. (1971).
Some results on Tchebycheffian spline functions.
Journal of Mathematical Analysis and Applications, 33(1):82–95.
Ragozin, D. (1983).
Error bounds for derivative estimation based on spline smoothing of exact or noisy
data.
Journal of Approximation Theory, 37:335–355.
Steinwart, I. (2002).
Support vector machines are universally consistent.
Journal of Complexity, 18:768–791.
Thank you for your attention!
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 

Classification and regression based on derivatives: a consistency result for sampled functions

  • 1. Classification and regression based on derivatives: a consistency result for sampled functions Nathalie Villa-Vialaneix http://www.nathalievilla.org Toulouse School of Economics TSE Seminar April 29, 2009 TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 1 / 25
  • 2. Research interests Graph mining (& social networks) Modelling land use evolution Functional data analysis TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 2 / 25
  • 3. Research interests Graph mining (& social networks) Modelling land use evolution Functional data analysis TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 2 / 25
  • 4. Research interests Graph mining (& social networks) Modelling land use evolution Functional data analysis Based on a preprint (available on demand) written with Fabrice Rossi. A short version is given on TSE web site. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 2 / 25
  • 5. Outline 1 Introduction and motivations 2 A general consistency result 3 Examples 4 Conclusion and prospect TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 3 / 25
  • 6. Introduction and motivations A regression or classification where the predictor takes values in an infinite dimensional space Settings (X, Y) is a random pair of variables where • Y ∈ {−1, 1} (binary classification problem) or Y ∈ R TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 4 / 25
  • 7. Introduction and motivations A regression or classification where the predictor takes values in an infinite dimensional space Settings (X, Y) is a random pair of variables where • Y ∈ {−1, 1} (binary classification problem) or Y ∈ R • X ∈ (X, ., . X), an infinite dimensional Hilbert space. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 4 / 25
  • 8. Introduction and motivations A regression or classification where the predictor takes values in an infinite dimensional space Settings (X, Y) is a random pair of variables where • Y ∈ {−1, 1} (binary classification problem) or Y ∈ R • X ∈ (X, ., . X), an infinite dimensional Hilbert space. We are given a learning set Sn = {(Xi, Yi)}n i=1 of n i.i.d. copies of (X, Y). TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 4 / 25
  • 9. Introduction and motivations A regression or classification where the predictor takes values in an infinite dimensional space Settings (X, Y) is a random pair of variables where • Y ∈ {−1, 1} (binary classification problem) or Y ∈ R • X ∈ (X, ., . X), an infinite dimensional Hilbert space. We are given a learning set Sn = {(Xi, Yi)}n i=1 of n i.i.d. copies of (X, Y). Purpose: Find φn : X → {−1, 1} or R, that is universally consistent: • Classification case: limn→+∞ P (φn(X) Y) = L∗ where L∗ = infφ:X→{−1,1} P (φ(X) Y) is the Bayes risk. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 4 / 25
  • 10. Introduction and motivations A regression or classification where the predictor takes values in an infinite dimensional space Settings (X, Y) is a random pair of variables where • Y ∈ {−1, 1} (binary classification problem) or Y ∈ R • X ∈ (X, ., . X), an infinite dimensional Hilbert space. We are given a learning set Sn = {(Xi, Yi)}n i=1 of n i.i.d. copies of (X, Y). Purpose: Find φn : X → {−1, 1} or R, that is universally consistent: • Classification case: limn→+∞ P (φn(X) Y) = L∗ where L∗ = infφ:X→{−1,1} P (φ(X) Y) is the Bayes risk. • Regression case: limn→+∞ E [φn(X) − Y]2 = L∗ where L∗ = infφ:X→R E [φ(X) − Y]2 will also be called the Bayes risk. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 4 / 25
  • 11. Introduction and motivations An example Predicting the rate of yellow berry in durum wheat from its NIR spectrum. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 5 / 25
  • 12. Introduction and motivations Using derivatives Practically, X(m) is often more relevant than X for the prediction. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 6 / 25
  • 13. Introduction and motivations Using derivatives Practically, X(m) is often more relevant than X for the prediction. But X → X(m) induces information loss: inf φ:DmX→{−1,1} P(φ(X(m)) ≠ Y) ≥ inf φ:X→{−1,1} P(φ(X) ≠ Y) = L∗ and inf φ:DmX→R E[φ(X(m)) − Y]2 ≥ inf φ:X→R E[φ(X) − Y]2 = L∗. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 6 / 25
  • 14. Introduction and motivations Sampled functions Practically, (Xi)i are not perfectly known but we are given a discrete sampling of them: Xτd i = (Xi(t))t∈τd where τd = {tτd 1 , . . . , tτd |τd | }. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 7 / 25
  • 15. Introduction and motivations Sampled functions Practically, (Xi)i are not perfectly known but we are given a discrete sampling of them: Xτd i = (Xi(t))t∈τd where τd = {tτd 1 , . . . , tτd |τd | }. The sampling can be noisy. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 7 / 25
  • 16. Introduction and motivations Sampled functions Practically, (Xi)i are not perfectly known but we are given a discrete sampling of them: Xτd i = (Xi(t))t∈τd where τd = {tτd 1 , . . . , tτd |τd |}. Then, X(m) i is estimated from Xτd i and, if the estimation is denoted by X(m) τd , this also induces information loss: inf φ:DmX→{−1,1} P(φ(X(m) τd ) ≠ Y) ≥ inf φ:DmX→{−1,1} P(φ(X(m)) ≠ Y) ≥ L∗ and inf φ:DmX→R E[φ(X(m) τd ) − Y]2 ≥ inf φ:DmX→R E[φ(X(m)) − Y]2 ≥ L∗. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 7 / 25
  • 17. Introduction and motivations Purpose of the presentation Find a classifier or a regression function φn,τd built from X(m) τd such that the risk of φn,τd asymptotically reaches the Bayes risk L∗: lim |τd |→+∞ lim n→+∞ P(φn,τd (X(m) τd ) ≠ Y) = L∗ or lim |τd |→+∞ lim n→+∞ E[φn,τd (X(m) τd ) − Y]2 = L∗ TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 8 / 25
  • 18. Introduction and motivations Purpose of the presentation Find a classifier or a regression function φn,τd built from X(m) τd such that the risk of φn,τd asymptotically reaches the Bayes risk L∗: lim |τd |→+∞ lim n→+∞ P(φn,τd (X(m) τd ) ≠ Y) = L∗ or lim |τd |→+∞ lim n→+∞ E[φn,τd (X(m) τd ) − Y]2 = L∗ Main idea: Use a relevant way to estimate X(m) from Xτd (by smoothing splines) and combine the consistency of splines with the consistency of a R|τd |-classifier or regression function. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 8 / 25
  • 19. A general consistency result Basics about smoothing splines I Suppose that X is the Sobolev space Hm = {h ∈ L2 [0,1] | ∀ j = 1, . . . , m, Dj h exists in the weak sense and Dm h ∈ L2 } TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 9 / 25
  • 20. A general consistency result Basics about smoothing splines I Suppose that X is the Sobolev space Hm = {h ∈ L2 [0,1] | ∀ j = 1, . . . , m, Dj h exists in the weak sense and Dm h ∈ L2 } equipped with the scalar product ⟨u, v⟩Hm = ⟨Dm u, Dm v⟩L2 + Σj=1..m (Bj u)(Bj v) where B = (B1, . . . , Bm) are m boundary conditions such that Ker B ∩ Pm−1 = {0}. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 9 / 25
  • 21. A general consistency result Basics about smoothing splines I Suppose that X is the Sobolev space Hm = {h ∈ L2 [0,1] | ∀ j = 1, . . . , m, Dj h exists in the weak sense and Dm h ∈ L2 } equipped with the scalar product ⟨u, v⟩Hm = ⟨Dm u, Dm v⟩L2 + Σj=1..m (Bj u)(Bj v) where B = (B1, . . . , Bm) are m boundary conditions such that Ker B ∩ Pm−1 = {0}. (Hm , ⟨., .⟩Hm ) is a RKHS: there is k0 : Pm−1 × Pm−1 → R and k1 : Ker B × Ker B → R such that ∀ u ∈ Pm−1 , t ∈ [0, 1], ⟨u, k0(t, .)⟩Hm = u(t) and ∀ u ∈ Ker B, t ∈ [0, 1], ⟨u, k1(t, .)⟩Hm = u(t). See [Berlinet and Thomas-Agnan, 2004] for further details. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 9 / 25
  • 22. A general consistency result Basics about smoothing splines II A simple example of boundary conditions: h(0) = h(1)(0) = . . . = h(m−1)(0) = 0. Then, k0(s, t) = Σk=0..m−1 t^k s^k / (k!)^2 and k1(s, t) = ∫0^1 (t − w)+^(m−1) (s − w)+^(m−1) / (m − 1)! dw. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 10 / 25
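The two kernels above can be evaluated directly. The following is a minimal Python sketch (illustrative, not from the original slides): the normalisation of k1 follows the formula displayed on the slide (some references use ((m−1)!)^2 instead), the choice m = 2 is arbitrary, and scipy is assumed to be available.

    from math import factorial
    from scipy.integrate import quad

    def k0(s, t, m=2):
        # polynomial part of the reproducing kernel: sum_{k=0}^{m-1} s^k t^k / (k!)^2
        return sum((s ** k) * (t ** k) / factorial(k) ** 2 for k in range(m))

    def k1(s, t, m=2):
        # kernel of Ker B: integral of truncated powers over [0, 1], as on the slide
        integrand = lambda w: (max(t - w, 0.0) ** (m - 1)) * (max(s - w, 0.0) ** (m - 1)) / factorial(m - 1)
        value, _ = quad(integrand, 0.0, 1.0)
        return value

    print(k0(0.3, 0.7), k1(0.3, 0.7))  # kernel values at (s, t) = (0.3, 0.7)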
  • 23. A general consistency result Estimating the predictors with smoothing splines I Assumption (A1) • |τd| ≥ m − 1 • sampling points are distinct in [0, 1] • the Bj are linearly independent of the evaluation functionals h → h(t), t ∈ τd TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 11 / 25
  • 24. A general consistency result Estimating the predictors with smoothing splines I Assumption (A1) • |τd| ≥ m − 1 • sampling points are distinct in [0, 1] • the Bj are linearly independent of the evaluation functionals h → h(t), t ∈ τd [Kimeldorf and Wahba, 1971]: for xτd in R|τd |, there is a unique x̂λ,τd ∈ Hm solution of arg min h∈Hm (1/|τd|) Σl=1..|τd| (h(tl) − xτd l )2 + λ ∫[0,1] (h(m)(t))2 dt, and x̂λ,τd = Sλ,τd xτd where Sλ,τd is a full rank operator from R|τd | to Hm . TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 11 / 25
  • 25. A general consistency result Estimating the predictors with smoothing splines I Assumption (A1) • |τd| ≥ m − 1 • sampling points are distinct in [0, 1] • the Bj are linearly independent of the evaluation functionals h → h(t), t ∈ τd [Kimeldorf and Wahba, 1971]: for xτd in R|τd |, there is a unique x̂λ,τd ∈ Hm solution of arg min h∈Hm (1/|τd|) Σl=1..|τd| (h(tl) − xτd l )2 + λ ∫[0,1] (h(m)(t))2 dt, and x̂λ,τd = Sλ,τd xτd where Sλ,τd is a full rank operator from R|τd | to Hm . These assumptions are fulfilled by the previous simple example as long as 0 ∉ τd. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 11 / 25
  • 26. A general consistency result Estimating the predictors with smoothing splines II Sλ,τd is given by: Sλ,τd = ωT (U(K1 + λI|τd |)−1 UT )−1 U(K1 + λI|τd |)−1 + ηT (K1 + λI|τd |)−1 (I|τd | − UT (U(K1 + λI|τd |)−1 UT )−1 U(K1 + λI|τd |)−1 ) = ωT M0 + ηT M1 with • {ω1, . . . , ωm} a basis of Pm−1 , ω = (ω1, . . . , ωm)T and U = (ωi(t))i=1,...,m; t∈τd ; • η = (k1(t, .))T t∈τd and K1 = (k1(t, t′))t,t′∈τd . TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 12 / 25
  • 27. A general consistency result Estimating the predictors with smoothing splines II Sλ,τd is given by: Sλ,τd = ωT (U(K1 + λI|τd |)−1 UT )−1 U(K1 + λI|τd |)−1 + ηT (K1 + λI|τd |)−1 (I|τd | − UT (U(K1 + λI|τd |)−1 UT )−1 U(K1 + λI|τd |)−1 ) = ωT M0 + ηT M1 with • {ω1, . . . , ωm} a basis of Pm−1 , ω = (ω1, . . . , ωm)T and U = (ωi(t))i=1,...,m; t∈τd ; • η = (k1(t, .))T t∈τd and K1 = (k1(t, t′))t,t′∈τd . The observations of the predictor X (NIR spectra) are then estimated from their sampling Xτd by Xλ,τd . TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 12 / 25
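To make the construction concrete, here is a rough Python sketch (not the authors' implementation) that assembles M0 and M1 from K1 and U and reconstructs a sampled curve. It assumes m = 2 with the closed form of k1 and the basis {1, t} of P1; the sampling points, the noise level and the value of lam are purely illustrative, and lam here stands for the regularisation weight up to the 1/|τd| scaling of the data-fit term.

    import numpy as np

    def k1_m2(s, t):
        # closed form of k1 for m = 2: integral of (t - w)_+ (s - w)_+ over [0, 1]
        u = min(s, t)
        return s * t * u - (s + t) * u ** 2 / 2 + u ** 3 / 3

    def spline_matrices(t_pts, lam):
        # M0, M1 such that the smoothing spline fitted to samples x is
        # h = sum_j (M0 x)_j omega_j + sum_l (M1 x)_l k1(t_l, .)
        d = len(t_pts)
        K1 = np.array([[k1_m2(s, t) for t in t_pts] for s in t_pts])
        U = np.vstack([np.ones(d), np.asarray(t_pts)])   # omega_1 = 1, omega_2 = t
        A = np.linalg.inv(K1 + lam * np.eye(d))           # (K1 + lam I)^{-1}
        M0 = np.linalg.inv(U @ A @ U.T) @ U @ A
        M1 = A @ (np.eye(d) - U.T @ M0)
        return M0, M1

    # fit a spline to noisy samples of a curve and evaluate it on a fine grid
    t_pts = np.linspace(0.05, 1.0, 20)                    # 0 not in tau_d, as required
    x = np.sin(2 * np.pi * t_pts) + 0.05 * np.random.randn(20)
    M0, M1 = spline_matrices(t_pts, lam=1e-3)
    coef_poly, coef_kern = M0 @ x, M1 @ x
    grid = np.linspace(0.0, 1.0, 200)
    fit = coef_poly[0] + coef_poly[1] * grid + np.array(
        [sum(c * k1_m2(tl, s) for c, tl in zip(coef_kern, t_pts)) for s in grid])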
  • 28. A general consistency result Two important consequences 1 No information loss: inf φ:Hm→{−1,1} P(φ(Xλ,τd ) ≠ Y) = inf φ:R|τd |→{−1,1} P(φ(Xτd ) ≠ Y) and inf φ:Hm→R E[φ(Xλ,τd ) − Y]2 = inf φ:R|τd |→R E[φ(Xτd ) − Y]2 TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 13 / 25
  • 29. A general consistency result Two important consequences 1 No information loss: inf φ:Hm→{−1,1} P(φ(Xλ,τd ) ≠ Y) = inf φ:R|τd |→{−1,1} P(φ(Xτd ) ≠ Y) and inf φ:Hm→R E[φ(Xλ,τd ) − Y]2 = inf φ:R|τd |→R E[φ(Xτd ) − Y]2 2 Easy way to use derivatives: ⟨Sλ,τd uτd , Sλ,τd vτd ⟩Hm = ⟨uλ,τd , vλ,τd ⟩Hm TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 13 / 25
  • 30. A general consistency result Two important consequences 1 No information loss: inf φ:Hm→{−1,1} P(φ(Xλ,τd ) ≠ Y) = inf φ:R|τd |→{−1,1} P(φ(Xτd ) ≠ Y) and inf φ:Hm→R E[φ(Xλ,τd ) − Y]2 = inf φ:R|τd |→R E[φ(Xτd ) − Y]2 2 Easy way to use derivatives: (uτd )T M0T W M0 vτd + (uτd )T M1T K1 M1 vτd = ⟨uλ,τd , vλ,τd ⟩Hm where K1, M0 and M1 have been previously defined and W = (⟨ωi, ωj⟩Hm )i,j=1,...,m. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 13 / 25
  • 31. A general consistency result Two important consequences 1 No information loss: inf φ:Hm→{−1,1} P(φ(Xλ,τd ) ≠ Y) = inf φ:R|τd |→{−1,1} P(φ(Xτd ) ≠ Y) and inf φ:Hm→R E[φ(Xλ,τd ) − Y]2 = inf φ:R|τd |→R E[φ(Xτd ) − Y]2 2 Easy way to use derivatives: (uτd )T Mλ,τd vτd = ⟨uλ,τd , vλ,τd ⟩Hm where Mλ,τd is symmetric and positive definite. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 13 / 25
  • 32. A general consistency result Two important consequences 1 No information loss: inf φ:Hm→{−1,1} P(φ(Xλ,τd ) ≠ Y) = inf φ:R|τd |→{−1,1} P(φ(Xτd ) ≠ Y) and inf φ:Hm→R E[φ(Xλ,τd ) − Y]2 = inf φ:R|τd |→R E[φ(Xτd ) − Y]2 2 Easy way to use derivatives: (Qλ,τd uτd )T (Qλ,τd vτd ) = ⟨uλ,τd , vλ,τd ⟩Hm where Qλ,τd is the Cholesky triangle of Mλ,τd : QT λ,τd Qλ,τd = Mλ,τd . Remark: Qλ,τd is calculated only from the RKHS, λ and τd: it does not depend on the data set. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 13 / 25
  • 33. A general consistency result Two important consequences 1 No information loss: inf φ:Hm→{−1,1} P(φ(Xλ,τd ) ≠ Y) = inf φ:R|τd |→{−1,1} P(φ(Xτd ) ≠ Y) and inf φ:Hm→R E[φ(Xλ,τd ) − Y]2 = inf φ:R|τd |→R E[φ(Xτd ) − Y]2 2 Easy way to use derivatives: (Qλ,τd uτd )T (Qλ,τd vτd ) = ⟨uλ,τd , vλ,τd ⟩Hm ≈ ⟨u(m) λ,τd , v(m) λ,τd ⟩L2 where Qλ,τd is the Cholesky triangle of Mλ,τd : QT λ,τd Qλ,τd = Mλ,τd . Remark: Qλ,τd is calculated only from the RKHS, λ and τd: it does not depend on the data set. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 13 / 25
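The only data-independent ingredient needed to feed derivatives into a multivariate method is therefore the matrix Qλ,τd. A minimal sketch of this step in Python, where a synthetic positive definite matrix stands in for Mλ,τd:

    import numpy as np

    def euclidean_embedding(M):
        # np.linalg.cholesky returns lower triangular L with L @ L.T = M,
        # so Q = L.T satisfies Q.T @ Q = M and (Q u) . (Q v) = u^T M v
        return np.linalg.cholesky(M).T

    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 5))
    M = A @ A.T + 5 * np.eye(5)        # synthetic s.p.d. matrix standing in for M_{lambda,tau_d}
    Q = euclidean_embedding(M)
    u, v = rng.normal(size=5), rng.normal(size=5)
    assert np.isclose((Q @ u) @ (Q @ v), u @ M @ v)   # the announced identity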
  • 34. A general consistency result Classification and regression based on derivatives Suppose that we know a consistent classifier or regression function in R|τd | that is based on the R|τd | scalar product or norm. Example: Nonparametric kernel regression Ψ : u ∈ R|τd | → Σi=1..n Ti K(‖u − Ui‖R|τd | / hn) / Σi=1..n K(‖u − Ui‖R|τd | / hn) where (Ui, Ti)i=1,...,n is a learning set in R|τd | × R. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 14 / 25
  • 35. A general consistency result Classification and regression based on derivatives Suppose that we know a consistent classifier or regression function in R|τd | that is based on the R|τd | scalar product or norm. The corresponding derivative based classifier or regression function is given by using the norm induced by Qλ,τd : Example: Nonparametric kernel regression Ψ : u ∈ R|τd | → Σi=1..n Ti K(‖u − Ui‖R|τd | / hn) / Σi=1..n K(‖u − Ui‖R|τd | / hn) where (Ui, Ti)i=1,...,n is a learning set in R|τd | × R. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 14 / 25
  • 36. A general consistency result Classification and regression based on derivatives Suppose that we know a consistent classifier or regression function in R|τd | that is based on the R|τd | scalar product or norm. The corresponding derivative based classifier or regression function is given by using the norm induced by Qλ,τd : Example: Nonparametric kernel regression φn,d = Ψ ◦ Qλ,τd : x ∈ Hm → Σi=1..n Yi K(‖Qλ,τd xτd − Qλ,τd Xτd i ‖R|τd | / hn) / Σi=1..n K(‖Qλ,τd xτd − Qλ,τd Xτd i ‖R|τd | / hn) TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 14 / 25
  • 37. A general consistency result Classification and regression based on derivatives Suppose that we know a consistent classifier or regression function in R|τd | that is based on the R|τd | scalar product or norm. The corresponding derivative based classifier or regression function is given by using the norm induced by Qλ,τd : Example: Nonparametric kernel regression φn,d = Ψ ◦ Qλ,τd : x ∈ Hm → Σi=1..n Yi K(‖Qλ,τd xτd − Qλ,τd Xτd i ‖R|τd | / hn) / Σi=1..n K(‖Qλ,τd xτd − Qλ,τd Xτd i ‖R|τd | / hn) −→ Σi=1..n Yi K(‖x(m) − X(m) i ‖L2 / hn) / Σi=1..n K(‖x(m) − X(m) i ‖L2 / hn) TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 14 / 25
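The derivative-based kernel regression estimate above only needs the transformed samples. A short sketch in Python, with a Gaussian kernel as an example of K; the names (Q for Qλ,τd, X_samp for the matrix of sampled training curves, h for the bandwidth hn) are illustrative:

    import numpy as np

    def nw_kernel_regression(Q, X_samp, Y, x_samp, h):
        # Nadaraya-Watson estimate at a new sampled curve x_samp (length |tau_d|),
        # using the norm induced by Q, i.e. (approximately) the L2 distance
        # between m-th derivatives of the spline reconstructions
        U = X_samp @ Q.T                      # rows are Q X_i^{tau_d}
        u = Q @ x_samp                        # Q x^{tau_d}
        dists = np.linalg.norm(U - u, axis=1)
        w = np.exp(-0.5 * (dists / h) ** 2)   # Gaussian kernel K(d / h)
        return np.sum(w * Y) / np.sum(w) if np.sum(w) > 0 else np.mean(Y)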
  • 38. A general consistency result Remark for consistency Classification case (approximately the same holds for the regression case): P(φn,τd (Xλ,τd ) ≠ Y) − L∗ = [P(φn,τd (Xλ,τd ) ≠ Y) − L∗ d] + [L∗ d − L∗] where L∗ d = inf φ:R|τd |→{−1,1} P(φ(Xτd ) ≠ Y). TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 15 / 25
  • 39. A general consistency result Remark for consistency Classification case (approximately the same holds for the regression case): P(φn,τd (Xλ,τd ) ≠ Y) − L∗ = [P(φn,τd (Xλ,τd ) ≠ Y) − L∗ d] + [L∗ d − L∗] where L∗ d = inf φ:R|τd |→{−1,1} P(φ(Xτd ) ≠ Y). 1 For all fixed d, lim n→+∞ P(φn,τd (Xλ,τd ) ≠ Y) = L∗ d as long as the R|τd |-classifier is consistent, because there is a one-to-one mapping between Xτd and Xλ,τd . TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 15 / 25
  • 40. A general consistency result Remark for consistency Classification case (approximately the same holds for the regression case): P(φn,τd (Xλ,τd ) ≠ Y) − L∗ = [P(φn,τd (Xλ,τd ) ≠ Y) − L∗ d] + [L∗ d − L∗] where L∗ d = inf φ:R|τd |→{−1,1} P(φ(Xτd ) ≠ Y). 1 For all fixed d, lim n→+∞ P(φn,τd (Xλ,τd ) ≠ Y) = L∗ d as long as the R|τd |-classifier is consistent, because there is a one-to-one mapping between Xτd and Xλ,τd . 2 L∗ d − L∗ ≤ E|E(Y|Xλ,τd ) − E(Y|X)|: with the consistency of the spline estimate Xλ,τd and an assumption on the regularity of E(Y|X = .), consistency would be proved. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 15 / 25
  • 41. A general consistency result Remark for consistency Classification case (approximately the same holds for the regression case): P(φn,τd (Xλ,τd ) ≠ Y) − L∗ = [P(φn,τd (Xλ,τd ) ≠ Y) − L∗ d] + [L∗ d − L∗] where L∗ d = inf φ:R|τd |→{−1,1} P(φ(Xτd ) ≠ Y). 1 For all fixed d, lim n→+∞ P(φn,τd (Xλ,τd ) ≠ Y) = L∗ d as long as the R|τd |-classifier is consistent, because there is a one-to-one mapping between Xτd and Xλ,τd . 2 L∗ d − L∗ ≤ E|E(Y|Xλ,τd ) − E(Y|X)|: with the consistency of the spline estimate Xλ,τd and an assumption on the regularity of E(Y|X = .), consistency would be proved. But continuity of E(Y|X = .) is a strong assumption in the infinite dimensional case, and is not easy to check. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 15 / 25
  • 42. A general consistency result Spline consistency Let λ depend on d and denote by (λd)d the sequence of regularization parameters. Also introduce ∆τd := max{t1, t2 − t1, . . . , 1 − t|τd |} (largest sampling gap) and δτd := min 1≤i<|τd | (ti+1 − ti) (smallest sampling gap). Assumption (A2) • There is R such that ∆τd /δτd ≤ R for all d; • limd→+∞ |τd| = +∞; • limd→+∞ λd = 0. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 16 / 25
  • 43. A general consistency result Spline consistency Let λ depend on d and denote by (λd)d the sequence of regularization parameters. Also introduce ∆τd := max{t1, t2 − t1, . . . , 1 − t|τd |} (largest sampling gap) and δτd := min 1≤i<|τd | (ti+1 − ti) (smallest sampling gap). Assumption (A2) • There is R such that ∆τd /δτd ≤ R for all d; • limd→+∞ |τd| = +∞; • limd→+∞ λd = 0. [Ragozin, 1983]: Under (A1) and (A2), ∃ AR,m and BR,m such that for any x ∈ Hm and any λd > 0, ‖x̂λd ,τd − x‖2 L2 ≤ (AR,m λd + BR,m / |τd|2m) ‖Dm x‖2 L2 −→ 0 as d → +∞. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 16 / 25
  • 44. A general consistency result Bayes risk consistency Assumption (A3a) E‖Dm X‖2 L2 is finite and Y ∈ {−1, 1}. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 17 / 25
  • 45. A general consistency result Bayes risk consistency Assumption (A3a) E‖Dm X‖2 L2 is finite and Y ∈ {−1, 1}. or Assumption (A3b) τd ⊂ τd+1 for all d and E(Y2 ) is finite. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 17 / 25
  • 46. A general consistency result Bayes risk consistency Assumption (A3a) E‖Dm X‖2 L2 is finite and Y ∈ {−1, 1}. or Assumption (A3b) τd ⊂ τd+1 for all d and E(Y2 ) is finite. Under (A1)-(A3), limd→+∞ L∗ d = L∗. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 17 / 25
  • 47. A general consistency result Proof for assumption (A3a) Assumption (A3a) E‖Dm X‖2 L2 is finite and Y ∈ {−1, 1}. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 18 / 25
  • 48. A general consistency result Proof for assumption (A3a) Assumption (A3a) E‖Dm X‖2 L2 is finite and Y ∈ {−1, 1}. The proof is based on a result of [Faragó and Györfi, 1975]: For a pair of random variables (X, Y) taking their values in X × {−1, 1}, where X is an arbitrary metric space, and for a sequence of functions Td : X → X such that E(δ(Td(X), X)) −→ 0 as d → +∞, then limd→+∞ inf φ:X→{−1,1} P(φ(Td(X)) ≠ Y) = L∗. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 18 / 25
  • 49. A general consistency result Proof for assumption (A3a) Assumption (A3a) E‖Dm X‖2 L2 is finite and Y ∈ {−1, 1}. The proof is based on a result of [Faragó and Györfi, 1975]: For a pair of random variables (X, Y) taking their values in X × {−1, 1}, where X is an arbitrary metric space, and for a sequence of functions Td : X → X such that E(δ(Td(X), X)) −→ 0 as d → +∞, then limd→+∞ inf φ:X→{−1,1} P(φ(Td(X)) ≠ Y) = L∗. Using the spline estimate for Td together with the previous inequality of [Ragozin, 1983] about these estimates gives the required convergence, and the result follows. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 18 / 25
  • 50. A general consistency result Proof for assumption (A3b) Assumption (A3b) τd ⊂ τd+1 for all d and E(Y2 ) is finite. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 19 / 25
  • 51. A general consistency result Proof for assumption (A3b) Assumption (A3b) τd ⊂ τd+1 for all d and E(Y2 ) is finite. Under (A3b), (E(Y|Xλd ,τd ))d is a uniformly bounded martingale and thus converges in L1 -norm. Using the convergence of (Xλd ,τd )d to X completes the proof. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 19 / 25
  • 52. A general consistency result Concluding result Theorem Under assumptions (A1)-(A3), lim d→+∞ lim n→+∞ P(φn,τd (Xλd ,τd ) ≠ Y) = L∗ and lim |τd |→+∞ lim n→+∞ E[φn,τd (Xλd ,τd ) − Y]2 = L∗ Proof: For ε > 0, fix d0 such that, for all d ≥ d0, L∗ d − L∗ ≤ ε/2. Then, by consistency of the R|τd |-classifier or regression function, conclude. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 20 / 25
  • 53. A general consistency result A practical application to SVM I Recall that, for a learning set (Ui, Ti)i=1,...,n in Rp × {−1, 1}, the Gaussian SVM is the classifier u ∈ Rp → Sign( Σi=1..n αi Ti e−γ‖u−Ui‖2 Rp ) where the (αi)i satisfy the following quadratic optimization problem: arg min w Σi=1..n [1 − Ti w(Ui)]+ + C‖w‖2 S where w(u) = Σi=1..n αi e−γ‖u−Ui‖2 Rp , S is the RKHS associated with the Gaussian kernel and C is a regularization parameter. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 21 / 25
  • 54. A general consistency result A practical application to SVM I Recall that, for a learning set (Ui, Ti)i=1,...,n in Rp × {−1, 1}, the Gaussian SVM is the classifier u ∈ Rp → Sign( Σi=1..n αi Ti e−γ‖u−Ui‖2 Rp ) where the (αi)i satisfy the following quadratic optimization problem: arg min w Σi=1..n [1 − Ti w(Ui)]+ + C‖w‖2 S where w(u) = Σi=1..n αi e−γ‖u−Ui‖2 Rp , S is the RKHS associated with the Gaussian kernel and C is a regularization parameter. Under suitable assumptions, [Steinwart, 2002] proves the consistency of SVM classifiers. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 21 / 25
  • 55. A general consistency result A practical application to SVM II Additional assumptions related to SVM: Assumptions (A4) • For all d, the regularization parameter depends on n in such a way that limn→+∞ n Cd n = +∞ and Cd n = O(nβd−1) for some 0 < βd < 1/d. • For all d, there is a bounded subset Bd of R|τd | such that Xτd belongs to Bd. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 22 / 25
  • 56. A general consistency result A practical application to SVM II Additional assumptions related to SVM: Assumptions (A4) • For all d, the regularization parameter depends on n in such a way that limn→+∞ n Cd n = +∞ and Cd n = O(nβd−1) for some 0 < βd < 1/d. • For all d, there is a bounded subset Bd of R|τd | such that Xτd belongs to Bd. Result: Under assumptions (A1)-(A4), the SVM φn,d : x ∈ Hm → Sign( Σi=1..n αi Yi e−γ‖Qλd ,τd xτd − Qλd ,τd Xτd i ‖2 R|τd | ) ≈ Sign( Σi=1..n αi Yi e−γ‖x(m) − X(m) i ‖2 L2 ) is consistent: limd→+∞ limn→+∞ P(φn,τd (Xλd ,τd ) ≠ Y) = L∗. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 22 / 25
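A possible implementation of this derivative-based SVM, sketched with scikit-learn (the library choice, the helper name and the hyperparameter values are assumptions, not part of the slides): the only functional ingredient is the pre-multiplication of the sampled curves by Qλd,τd.

    import numpy as np
    from sklearn.svm import SVC

    def svm_on_derivatives(Q, X_samp, y, C=1.0, gamma=1.0):
        # Gaussian SVM on Q-transformed sampled curves, i.e. (approximately) on
        # the L2 distance between m-th derivatives of the spline reconstructions
        Z = X_samp @ Q.T                      # rows are Q_{lambda_d,tau_d} X_i^{tau_d}
        clf = SVC(kernel="rbf", C=C, gamma=gamma)
        clf.fit(Z, y)
        return clf

    # prediction for a new sampled curve x_samp:
    # clf.predict((Q @ x_samp).reshape(1, -1))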
  • 57. Examples Example 1: Classification with SVM Predicting the fat content class of pieces of meat from NIR spectra: TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 23 / 25
  • 58. Examples Example 1: Classification with SVM Predicting the fat content class of pieces of meat from NIR spectra: Method: 1 Split the data set into a training sample and a test sample; TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 23 / 25
  • 59. Examples Example 1: Classification with SVM Predicting the fat content class of pieces of meat from NIR spectra: Method: 1 Split the data set into a training sample and a test sample; 2 Estimate λd by leave-one-out on the Xτd i of the training sample. The Sobolev space is the one with boundary conditions h(0) = . . . = h(m−1)(0) = 0 (for m = 1 and m = 2). TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 23 / 25
  • 60. Examples Example 1: Classification with SVM Predicting the fat content class of pieces of meat from NIR spectra: Method: 1 Split the data set into a training sample and a test sample; 2 Estimate λd by leave-one-out on the Xτd i of the training sample. The Sobolev space is the one with boundary conditions h(0) = . . . = h(m−1)(0) = 0 (for m = 1 and m = 2). 3 Calculate Qλd ,τd . TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 23 / 25
  • 61. Examples Example 1: Classification with SVM Predicting the fat content class of pieces of meat from NIR spectra: Method: 1 Split the data set into a training sample and a test sample; 2 Estimate λd by leave-one-out on the Xτd i of the training sample. The Sobolev space is the one with boundary conditions h(0) = . . . = h(m−1)(0) = 0 (for m = 1 and m = 2). 3 Calculate Qλd ,τd . 4 Calculate Gaussian and linear SVM classifiers for the initial data, first order derivatives and second order derivatives (using Qλd ,τd for the derivatives). TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 23 / 25
  • 62. Examples Example 1: Classification with SVM Predicting the fat content class of pieces of meat from NIR spectra: Method: 1 Split the data set into a training sample and a test sample; 2 Estimate λd by leave-one-out on the Xτd i of the training sample. The Sobolev space is the one with boundary conditions h(0) = . . . = h(m−1)(0) = 0 (for m = 1 and m = 2). 3 Calculate Qλd ,τd . 4 Calculate Gaussian and linear SVM classifiers for the initial data, first order derivatives and second order derivatives (using Qλd ,τd for the derivatives). 5 Calculate the percentage of misclassifications made by all these classifiers on the test sample. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 23 / 25
  • 63. Examples Example 1: Classification with SVM Predicting the fat content class of pieces of meat from NIR spectra: Kernel (SVM) Error rate on test (and sd) Linear (L) 3.78 % (2.52) Linear on derivatives (L(1)) 2.97 % (1.93) Linear on second derivatives (L(2)) 3.12 % (1.71) Gaussian (G) 5.97 % (2.76) Gaussian on derivatives (G(1)) 3.69 % (2.03) Gaussian on second derivatives (G(2)) 2.77 % (2.07) with significant differences (paired Wilcoxon test at the 1% level) between G(2) and G(1) and between G(1) and G. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 23 / 25
  • 64. Examples Example 2: Regression with kernel ridge regression Predicting yellow berry on durum wheat: Recall that kernel ridge regression is given by solving arg min w Σi=1..n (Ti − w(Ui))2 + ‖w‖2 S where S is a RKHS in which w takes its values, defined from a given kernel (such as the Gaussian kernel), and (Ui, Ti)i is a training sample in Rp × R. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 24 / 25
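A minimal closed-form sketch of this estimator on the Qλd,τd-transformed samples, with a Gaussian kernel; the names are illustrative and lam stands for the regularisation weight of the penalty term in the criterion above.

    import numpy as np

    def kernel_ridge_fit(Z, y, gamma, lam=1.0):
        # closed-form coefficients alpha = (K + lam I)^{-1} y for the Gaussian Gram matrix K,
        # where the rows of Z are, e.g., Q_{lambda_d,tau_d} X_i^{tau_d}
        sq = np.sum(Z ** 2, axis=1)
        K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * Z @ Z.T))
        return np.linalg.solve(K + lam * np.eye(len(y)), y)

    def kernel_ridge_predict(Z, alpha, gamma, z_new):
        # prediction at a new transformed curve z_new = Q x^{tau_d}
        k = np.exp(-gamma * np.sum((Z - z_new) ** 2, axis=1))
        return k @ alpha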
  • 65. Examples Example 2: Regression with kernel ridge regression Predicting yellow berry on durum wheat: Kernel (ridge regression) MSE on test (and sd ×10−3 ) Linear (L) 0.122 % (8.77) Linear on derivatives (L(1) ) 0.138 % (9.53) Linear on second derivatives (L(2) ) 0.122 % (1.71) Gaussian (G) 0.110 % (20.2) Gaussian on derivatives (G(1) ) 0.098 % (7.92) Gaussian on second derivatives (G(2) ) 0.094 % (8.35) The differences are significant between G(2) / G(1) and between G(1) / G. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 24 / 25
  • 66. Conclusion and prospect Conclusion We presented a method that justifies the common use of derivatives as predictors and guarantees no dramatic loss of information. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 25 / 25
  • 67. Conclusion and prospect Conclusion We presented a method that justifies the common use of derivatives as predictors and guarantees no dramatic loss of information. Prospects 1 Including errors in the sampling of X (related to errors-in-variables models); TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 25 / 25
  • 68. Conclusion and prospect Conclusion We presented a method that justifies the common use of derivatives as predictors and guarantees no dramatic loss of information. Prospects 1 Including errors in the sampling of X (related to errors-in-variables models); 2 Adding regularity assumptions on E(Y|X = .) to obtain a convergence rate and relate d to n. TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 25 / 25
  • 69. Conclusion and prospect Some references Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publisher. Faragó, T. and Györfi, L. (1975). On the continuity of the error distortion function for multiple-hypothesis decisions. IEEE Transactions on Information Theory, 21(4):458–460. Kimeldorf, G. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1):82–95. Ragozin, D. (1983). Error bounds for derivative estimation based on spline smoothing of exact or noisy data. Journal of Approximation Theory, 37:335–355. Steinwart, I. (2002). Support vector machines are universally consistent. Journal of Complexity, 18:768–791. Thank you for your attention ! TSE Seminar (29/04/09) Nathalie Villa Derivatives based regression 25 / 25