Arthur Charpentier, SIDE Summer School, July 2019
# 7 Classification & Goodness of Fit (Theoretical)
Arthur Charpentier (Université du Québec à Montréal)
Machine Learning & Econometrics
SIDE Summer School - July 2019
@freakonometrics freakonometrics freakonometrics.hypotheses.org
Machine Learning, in a Probabilistic Framework
Assume that training and validation data are drawn i.i.d. from $P$, i.e. $(Y, X) \sim F$.

Consider $y \in \{-1, +1\}$. The true risk of a classifier $m$ is
$$R(m) = \mathbb{P}_{(Y,X)\sim F}\big(m(X) \neq Y\big) = \mathbb{E}_{(Y,X)\sim F}\big[\ell(m(X), Y)\big]$$
The Bayes classifier is
$$b(x) = \operatorname{sign}\Big(\mathbb{E}_{(Y,X)\sim F}\big[Y \mid X = x\big]\Big)$$
which satisfies $R(b) = \inf_{m \in \mathcal{H}} \{R(m)\}$ (in the class $\mathcal{H}$ of all measurable functions), called the Bayes risk.

The empirical risk is
$$R_n(m) = \frac{1}{n}\sum_{i=1}^n \ell\big(y_i, m(x_i)\big)$$
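As a toy illustration of these definitions, here is a minimal Python sketch; the data-generating process ($X$ uniform on $[0,1]$ with $\mathbb{P}(Y=+1\mid X=x)=x$) and the resulting threshold form of the Bayes classifier are assumptions of the example, not from the slides.

```python
import random

random.seed(0)

# Assumed toy DGP (not from the slides): X ~ U[0, 1] and
# P(Y = +1 | X = x) = x, so E[Y | X = x] = 2x - 1 and the Bayes
# classifier is b(x) = sign(2x - 1), i.e. +1 iff x > 1/2.
def draw_sample(n):
    sample = []
    for _ in range(n):
        x = random.random()
        y = +1 if random.random() < x else -1
        sample.append((x, y))
    return sample

def empirical_risk(m, sample):
    # R_n(m) = (1/n) sum_i 1{m(x_i) != y_i}  (0-1 loss)
    return sum(m(x) != y for x, y in sample) / len(sample)

bayes = lambda x: +1 if x > 0.5 else -1
sample = draw_sample(10_000)
# under this DGP the Bayes risk is E[min(X, 1 - X)] = 1/4,
# so the empirical risk of b should be close to 0.25
print(round(empirical_risk(bayes, sample), 3))
```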
One might think of using regularized empirical risk minimization,
$$m_n \in \underset{m \in \mathcal{M}}{\operatorname{argmin}} \big\{ R_n(m) + \lambda \|m\| \big\}$$
in a class of models $\mathcal{M}$, where the regularization term controls the complexity of the model, to prevent overfitting.

Let $m^\star$ denote the best model in $\mathcal{M}$, $m^\star = \operatorname{argmin}_{m \in \mathcal{M}} \{R(m)\}$. Then
$$R(m_n) - R(b) = \underbrace{R(m_n) - R(m^\star)}_{\text{estimation error}} + \underbrace{R(m^\star) - R(b)}_{\text{approximation error}}$$
Since $R(m_n) = R_n(m_n) + [R(m_n) - R_n(m_n)]$, we can write
$$R(m_n) \leq R_n(m_n) + \text{something}(m, F)$$
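A sketch of (here unregularized) empirical risk minimization over a small finite class of threshold classifiers; the toy DGP and the closed-form true risk $R(m_t) = t^2 - t + 1/2$ are assumptions of the example, which lets us check that the estimation error $R(m_n) - R(m^\star)$ is small.

```python
import random

random.seed(0)

# Assumed toy DGP (not from the slides): X ~ U[0,1], P(Y=+1|X=x) = x,
# and M is the finite class of thresholds m_t(x) = sign(x - t).
# Under this DGP, R(m_t) = t^2 - t + 1/2, minimized at t = 1/2
# (the Bayes classifier), so the approximation error is zero here.
def true_risk(t):
    return t * t - t + 0.5

sample = [(x, +1 if random.random() < x else -1)
          for x in (random.random() for _ in range(2000))]

def empirical_risk(t):
    return sum((1 if x > t else -1) != y for x, y in sample) / len(sample)

grid = [j / 100 for j in range(101)]   # finite class, nu = 101 models
t_hat = min(grid, key=empirical_risk)  # empirical risk minimizer m_n

# estimation error R(m_n) - R(m*) should be small for n = 2000
print(round(t_hat, 2), round(true_risk(t_hat) - true_risk(0.5), 4))
```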
To quantify this something$(m, F)$, we need Hoeffding's inequality, see Hoeffding (1963, Probability Inequalities for Sums of Bounded Random Variables).

Let $g(x, y) = \ell(m(x), y)$, for some model $m$, and let
$$\mathcal{G} = \big\{ g : (x, y) \mapsto \ell(m(x), y),\ m \in \mathcal{M} \big\}$$
If $Z = (Y, X)$, set $R(g) = \mathbb{E}_{Z \sim F}[g(Z)]$ and $R_n(g) = \dfrac{1}{n}\displaystyle\sum_{i=1}^n g(z_i)$.

Hoeffding's inequality: if $Z_1, \cdots, Z_n$ are i.i.d. and if $h$ is a bounded function (with values in $[a, b]$), then, for all $\varepsilon > 0$,
$$\mathbb{P}\left[\left|\frac{1}{n}\sum_{i=1}^n h(Z_i) - \mathbb{E}_F[h(Z)]\right| \geq \varepsilon\right] \leq 2\exp\left(\frac{-2n\varepsilon^2}{(b-a)^2}\right)$$
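The inequality can be checked by simulation; a sketch with Bernoulli variables ($h$ the identity, so $a = 0$, $b = 1$; all numbers below are arbitrary choices for the example).

```python
import math
import random

random.seed(1)

# Arbitrary example numbers: n i.i.d. Bernoulli(p) draws per trial
# (values in [0, 1], so b - a = 1), deviation eps, Monte Carlo trials.
n, eps, p, trials = 200, 0.1, 0.3, 5000

exceed = 0
for _ in range(trials):
    mean = sum(1.0 if random.random() < p else 0.0 for _ in range(n)) / n
    if abs(mean - p) >= eps:
        exceed += 1

# Hoeffding bound: 2 exp(-2 n eps^2) since (b - a)^2 = 1
bound = 2 * math.exp(-2 * n * eps ** 2)
print(exceed / trials, round(bound, 4))
```

The empirical exceedance frequency should sit well below the bound: Hoeffding is distribution-free, hence typically loose for any particular distribution.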
Or equivalently (let $\delta$ denote the upper bound),
$$\mathbb{P}\left[\left|\frac{1}{n}\sum_{i=1}^n h(Z_i) - \mathbb{E}_F[h(Z)]\right| \geq (b-a)\sqrt{\frac{-1}{2n}\log\frac{\delta}{2}}\right] \leq \delta$$
We can actually derive a one-sided bound: with probability (at least) $1 - \delta$,
$$R(g) \leq R_n(g) + \sqrt{\frac{-1}{2n}\log\delta}$$
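Solving the one-sided bound for the radius gives $\sqrt{\log(1/\delta)/(2n)}$ when $b - a = 1$; a quick sketch of the $1/\sqrt{n}$ decay (quadrupling $n$ halves the radius; the values of $n$ and $\delta$ are arbitrary).

```python
import math

# One-sided Hoeffding radius for a [0, 1]-valued loss: with
# probability >= 1 - delta, R(g) <= R_n(g) + sqrt(log(1/delta) / (2n))
def hoeffding_radius(n, delta):
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))

for n in (100, 400, 1600):
    print(n, round(hoeffding_radius(n, 0.05), 4))
```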
$R(m) - R_n(m)$: for a fixed $m \in \mathcal{M}$,
$$R(m) - R_n(m) \sim \frac{1}{\sqrt{n}}$$
But this does not help much: we need uniform deviations (or the worst deviation).
Consider a finite set of models. Define the set of bad samples
$$\mathcal{Z}_j = \big\{ (z_1, \cdots, z_n) : R(g_j) - R_n(g_j) \geq \varepsilon \big\}$$
so that $\mathbb{P}[(Z_1, \cdots, Z_n) \in \mathcal{Z}_j] \leq \delta$, and then
$$\mathbb{P}[(Z_1, \cdots, Z_n) \in \mathcal{Z}_1 \cup \mathcal{Z}_2] \leq \mathbb{P}[(Z_1, \cdots, Z_n) \in \mathcal{Z}_1] + \mathbb{P}[(Z_1, \cdots, Z_n) \in \mathcal{Z}_2] \leq 2\delta$$
so that
$$\mathbb{P}\left[(Z_1, \cdots, Z_n) \in \bigcup_{j=1}^{\nu} \mathcal{Z}_j\right] \leq \sum_{j=1}^{\nu} \mathbb{P}[(Z_1, \cdots, Z_n) \in \mathcal{Z}_j] \leq \nu\delta$$
Hence,
$$\mathbb{P}\big[\exists g \in \{g_1, \cdots, g_\nu\} : R(g) - R_n(g) \geq \varepsilon\big] \leq \nu \cdot \max_j \mathbb{P}\big[R(g_j) - R_n(g_j) \geq \varepsilon\big] \leq \nu \cdot \exp(-2n\varepsilon^2)$$
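The union-bound argument can be illustrated numerically; a sketch (the toy DGP with $X \sim U[0,1]$, $\mathbb{P}(Y=+1\mid X=x) = x$ and the grid of threshold classifiers are assumptions of the example) comparing the frequency of a large worst-case deviation with $\nu \cdot \exp(-2n\varepsilon^2)$.

```python
import math
import random

random.seed(2)

# Assumed toy DGP (not from the slides): X ~ U[0, 1], P(Y=+1|X=x) = x;
# the threshold classifier m_t(x) = sign(x - t) then has true risk
# R(m_t) = t^2 - t + 1/2.
def true_risk(t):
    return t * t - t + 0.5

thresholds = [j / 10 for j in range(11)]   # finite class, nu = 11
n, eps, trials = 500, 0.05, 1000

exceed = 0
for _ in range(trials):
    sample = [(x, +1 if random.random() < x else -1)
              for x in (random.random() for _ in range(n))]
    # worst one-sided deviation R(g) - R_n(g) over the finite class
    worst = max(
        true_risk(t) - sum((1 if x > t else -1) != y for x, y in sample) / n
        for t in thresholds)
    if worst >= eps:
        exceed += 1

union_bound = len(thresholds) * math.exp(-2 * n * eps * eps)
print(exceed / trials, round(union_bound, 3))
```

The union bound ignores that nearby thresholds give highly correlated deviations, so the observed frequency is much smaller than the bound.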
If $\delta = \nu\exp(-2n\varepsilon^2)$, then with probability (at least) $1 - \delta$,
$$\forall g \in \{g_1, \cdots, g_\nu\},\quad R(g) - R_n(g) \leq \sqrt{\frac{1}{2n}\big(\log\nu - \log\delta\big)}$$
Thus, for a finite set of models $\mathcal{M} = \{m_1, \cdots, m_\nu\}$,
$$\forall m \in \{m_1, \cdots, m_\nu\},\quad R(m) \leq R_n(m) + \sqrt{\frac{1}{2n}\big(\log\nu - \log\delta\big)}$$
$R(m) - R_n(m)$, $\mathcal{M}$ finite with $\nu = |\mathcal{M}|$: for the worst-case scenario,
$$\sup_{m \in \mathcal{M}} \big\{R(m) - R_n(m)\big\} \sim \frac{\sqrt{\log\nu}}{\sqrt{n}}$$
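This radius can be wrapped in a one-line helper to see how weakly it depends on the class size $\nu$ (a sketch; the numbers are arbitrary example choices).

```python
import math

# Radius of the finite-class bound with nu models and confidence
# 1 - delta: sqrt((log(nu) - log(delta)) / (2n)); the class size only
# enters logarithmically, so doubling nu barely widens the bound.
def finite_class_radius(n, nu, delta):
    return math.sqrt((math.log(nu) - math.log(delta)) / (2 * n))

r1 = finite_class_radius(1000, 100, 0.05)
r2 = finite_class_radius(1000, 200, 0.05)
print(round(r1, 4), round(r2, 4))
```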
Now, what if $\mathcal{M}$ is infinite?
Write Hoeffding's inequality as
$$\mathbb{P}\left[R(g) - R_n(g) \geq \sqrt{\frac{-1}{2n}\log\delta_g}\right] \leq \delta_g$$
so that, for a countable set $\mathcal{G}$,
$$\mathbb{P}\left[\exists g \in \mathcal{G} : R(g) - R_n(g) \geq \sqrt{\frac{-1}{2n}\log\delta_g}\right] \leq \sum_{g \in \mathcal{G}} \delta_g$$
If $\delta_g = \delta \cdot \mu(g)$, where $\mu$ is some probability measure on $\mathcal{G}$, then with probability (at least) $1 - \delta$,
$$\forall g \in \mathcal{G},\quad R(g) \leq R_n(g) + \sqrt{\frac{-1}{2n}\big[\log\delta + \log\mu(g)\big]}$$
(see the previous computations, with $\mu(g) = \nu^{-1}$).
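A sketch of this weighted union bound, with a hypothetical countable class indexed by $k = 1, 2, \ldots$ and weights $\mu(k) = 2^{-k}$ (both are assumptions of the example).

```python
import math

# Per-model radius sqrt((log(1/delta) + log(1/mu_g)) / (2n)):
# low-weight ("complex") models pay a larger log(1/mu) penalty.
def radius(n, delta, mu_g):
    return math.sqrt((math.log(1 / delta) + math.log(1 / mu_g)) / (2 * n))

n, delta = 1000, 0.05
for k in (1, 5, 20):
    print(k, round(radius(n, delta, 2.0 ** -k), 4))

# the individual failure probabilities delta * mu(g) sum to at most delta
assert sum(delta * 2.0 ** -k for k in range(1, 60)) <= delta
```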
More generally, given a sample $z = \{z_1, \cdots, z_n\}$, let $\mathcal{M}_z$ denote the set of classifications that can be obtained,
$$\mathcal{M}_z = \big\{ (m(z_1), \cdots, m(z_n)) : m \in \mathcal{M} \big\}$$
The growth function is the maximum number of ways in which $n$ points can be classified by the function class $\mathcal{M}$,
$$G_{\mathcal{M}}(n) = \sup_z \big|\mathcal{M}_z\big|$$
Vapnik–Chervonenkis: with probability (at least) $1 - \delta$,
$$\forall m \in \mathcal{M},\quad R(m) \leq R_n(m) + 2\sqrt{\frac{2}{n}\Big[\log G_{\mathcal{M}}(2n) - \log\frac{\delta}{4}\Big]}$$
The VC (Vapnik–Chervonenkis) dimension is the largest $n$ such that $G_{\mathcal{M}}(n) = 2^n$. It will be denoted $\mathrm{VC}(\mathcal{M})$. Observe that $G_{\mathcal{M}}(n) \leq 2^n$.
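The growth function can be computed by brute force for a simple class; a sketch using threshold classifiers on the line (an assumed example class, for which $G_{\mathcal{M}}(n) = n + 1$ and hence the VC dimension is 1).

```python
# Toy class (an assumption for the example): threshold classifiers
# m_t(x) = +1 if x > t else -1, with t on a fine grid.
def labelings(points, t_grid):
    # distinct labelings (m_t(z_1), ..., m_t(z_n)) reachable on `points`
    return {tuple(+1 if x > t else -1 for x in points) for t in t_grid}

t_grid = [j / 1000 for j in range(1001)]

# n distinct points can only be labeled in n + 1 ways: G_M(n) = n + 1
for n in (1, 2, 3, 5):
    points = [(j + 0.5) / n for j in range(n)]
    print(n, len(labelings(points, t_grid)))

# G_M(1) = 2 = 2^1 while G_M(2) = 3 < 2^2, so VC = 1 for this class
```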
For $n \leq \mathrm{VC}(\mathcal{M})$, $n \mapsto G_{\mathcal{M}}(n)$ increases exponentially: $G_{\mathcal{M}}(n) = 2^n$.
For $n \geq \mathrm{VC}(\mathcal{M})$, $n \mapsto G_{\mathcal{M}}(n)$ increases at a polynomial rate (Sauer's lemma):
$$G_{\mathcal{M}}(n) \leq \left(\frac{en}{\mathrm{VC}(\mathcal{M})}\right)^{\mathrm{VC}(\mathcal{M})}$$
Vapnik–Chervonenkis: with probability (at least) $1 - \delta$,
$$\forall m \in \mathcal{M},\quad R(m) \leq R_n(m) + 2\sqrt{\frac{2}{n}\Big[\mathrm{VC}(\mathcal{M})\log\frac{en}{\mathrm{VC}(\mathcal{M})} - \log\frac{\delta}{4}\Big]}$$
$R(m) - R_n(m)$, $\mathcal{M}$ infinite: for the worst-case scenario,
$$\sup_{m \in \mathcal{M}} \big\{R(m) - R_n(m)\big\} \sim \sqrt{\frac{\mathrm{VC}(\mathcal{M}) \cdot \log n}{n}}$$
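To get a sense of the magnitude of the VC bound, a quick numeric sketch ($\mathrm{VC}(\mathcal{M}) = 1$, as for threshold classifiers on the line, and $\delta = 0.05$ are arbitrary example choices).

```python
import math

# Radius of the VC bound: 2*sqrt((2/n)*(d*log(e*n/d) - log(delta/4))),
# with d = VC(M); it decays only at the slow sqrt(log(n)/n) rate.
def vc_radius(n, d, delta):
    return 2 * math.sqrt(
        (2 / n) * (d * math.log(math.e * n / d) - math.log(delta / 4)))

for n in (100, 1000, 10000):
    print(n, round(vc_radius(n, 1, 0.05), 3))
```

Even with the smallest possible VC dimension the radius is large for moderate $n$: worst-case, distribution-free guarantees are loose in practice.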
To go further, see Bousquet, Boucheron & Lugosi (2005, Introduction to Statistical Learning Theory).