Big Data for Economics # 4*
A. Charpentier (Université de Rennes 1)
https://github.com/freakonometrics/ub
UB School of Economics Summer School, 2018.
#4 Nonlinearities and Discontinuities*
References

Motivation
Kopczuk, W. (2005) Tax bases, tax rates and the elasticity of reported income. JPE.

References
Eubank, R.L. (1999) Nonparametric Regression and Spline Smoothing. CRC Press.
Fan, J. & Gijbels, I. (1996) Local Polynomial Modelling and Its Applications. CRC Press.
Hastie, T.J. & Tibshirani, R.J. (1990) Generalized Additive Models. CRC Press.
Wand, M.P. & Jones, M.C. (1994) Kernel Smoothing. CRC Press.
Deterministic or Parametric Transformations
Consider child mortality rate (y) as a function of GDP per capita (x).
[Figure: scatter plot of the infant mortality rate against GDP per capita, with country labels]
Deterministic or Parametric Transformations
Logarithmic transformation, log(y) as a function of log(x)
[Figure: the same scatter on log-log scales, log mortality rate against log GDP per capita, with country labels]
Deterministic or Parametric Transformations
Reverse transformation
[Figure: the fitted log-log regression mapped back to the original scale, mortality rate against GDP per capita, with country labels]
Box-Cox transformation
See Box & Cox (1964) An Analysis of Transformations,
h(y, λ) = (y^λ − 1)/λ if λ ≠ 0, and h(y, λ) = log(y) if λ = 0
or, with a shift parameter µ,
h(y, λ, µ) = ([y + µ]^λ − 1)/λ if λ ≠ 0, and h(y, λ, µ) = log(y + µ) if λ = 0
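A minimal R sketch of the transformation itself (the function name and the plotted λ value are illustrative, not from the slides):

boxcox_h <- function(y, lambda, mu = 0) {
  # Box-Cox transform, with optional shift mu
  if (lambda == 0) log(y + mu) else ((y + mu)^lambda - 1) / lambda
}
curve(boxcox_h(x, lambda = 0.5), from = 0.01, to = 4)  # one member of the family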
[Figure: Box-Cox transforms h(·, λ) for λ ∈ {−1, −0.5, 0, 0.5, 1, 1.5, 2}]
Profile Likelihood
In a statistical context, suppose that the unknown parameter can be partitioned as θ = (λ, β), where λ is the parameter of interest and β is a nuisance parameter.
Consider a sample {y₁, · · · , yₙ} from distribution F_θ, so that the log-likelihood is
log L(θ) = Σ_{i=1}^n log f_θ(yᵢ)
The maximum likelihood estimator is θ̂^MLE = argmax{log L(θ)}.
Rewrite the log-likelihood as log L(θ) = log L_λ(β), and define
β̂^pMLE_λ = argmax_β {log L_λ(β)}
and then λ̂^pMLE = argmax_λ {log L_λ(β̂^pMLE_λ)}. Observe that
√n (λ̂^pMLE − λ) →^L N(0, [I_{λ,λ} − I_{λ,β} I_{β,β}^{−1} I_{β,λ}]^{−1})
Profile Likelihood and Likelihood Ratio Test
The (profile) likelihood ratio test is based on
2( max{L(λ, β)} − max{L(λ₀, β)} )
If (λ₀, β₀) is the true value, this difference can be written
2( max{L(λ, β)} − max{L(λ₀, β₀)} ) − 2( max{L(λ₀, β)} − max{L(λ₀, β₀)} )
Using a Taylor expansion,
∂L(λ, β)/∂λ |_(λ₀, β̂_{λ₀}) ∼ ∂L(λ, β)/∂λ |_(λ₀, β₀) − I_{β₀λ₀} I_{β₀β₀}^{−1} ∂L(λ₀, β)/∂β |_(λ₀, β₀)
Thus,
(1/√n) ∂L(λ, β)/∂λ |_(λ₀, β̂_{λ₀}) →^L N(0, I_{λ₀λ₀} − I_{λ₀β₀} I_{β₀β₀}^{−1} I_{β₀λ₀})
and 2( L(λ̂, β̂) − L(λ₀, β̂_{λ₀}) ) →^L χ²(dim(λ)).
Profile Likelihood and Likelihood Ratio Test
Consider some lognormal sample, and fit a Gamma distribution,
f(x; α, β) = β^α x^{α−1} e^{−βx} / Γ(α), x > 0, with θ = (α, β).

> x = exp(rnorm(100))

Maximum likelihood: θ̂ = argmax{log L(θ)}.

> library(MASS)
> (F = fitdistr(x, "gamma"))
      shape        rate
  1.4214497   0.8619969
 (0.1822570) (0.1320717)
> F$estimate[1] + c(-1, 1) * 1.96 * F$sd[1]
[1] 1.064226 1.778673
Profile Likelihood and Likelihood Ratio Test
See also
> log_lik = function(theta){
+   a = theta[1]
+   b = theta[2]
+   logL = sum(log(dgamma(x, a, b)))
+   return(-logL)
+ }
> optim(c(1, 1), log_lik)
$par
[1] 1.4214116 0.8620311
We can also use the profile likelihood,
α̂ = argmax_α { max_β log L(α, β) } = argmax_α { log L(α, β̂_α) }
Profile Likelihood and Likelihood Ratio Test
> prof_log_lik = function(a){
+   b = (optim(1, function(z) -sum(log(dgamma(x, a, z)))))$par
+   return(-sum(log(dgamma(x, a, b))))
+ }
> vx = seq(0.5, 3, length = 101)
> vl = -Vectorize(prof_log_lik)(vx)
> plot(vx, vl, type = "l")
> optim(1, prof_log_lik)
$par
[1] 1.421094
We can use the likelihood ratio test: under H₀: α = α₀,
2[ log Lₚ(α̂) − log Lₚ(α₀) ] ∼ χ²(1)
Profile Likelihood and Likelihood Ratio Test
The implied 95% confidence interval is
> (b1 = uniroot(function(z) Vectorize(prof_log_lik)(z) + borne, c(0.5, 1.5))$root)
[1] 1.095726
> (b2 = uniroot(function(z) Vectorize(prof_log_lik)(z) + borne, c(1.25, 2.5))$root)
[1] 1.811809
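The cut-off "borne" is not defined on the slide; a plausible reconstruction (an assumption, not necessarily the author's code) places it χ²(1)/2 below the maximized profile log-likelihood, since prof_log_lik returns a negative log-likelihood:

a_hat <- optim(1, prof_log_lik)$par                  # profile MLE found above
borne <- -(prof_log_lik(a_hat) + qchisq(0.95, 1)/2)  # hypothetical definition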
Box-Cox
> boxcox(lm(dist ~ speed, data = cars))
Here λ⋆ ∼ 0.5. Heuristically, yᵢ^{1/2} ∼ β₀ + β₁xᵢ + εᵢ, so why not consider a quadratic regression...?
[Figure: Box-Cox profile log-likelihood for λ, with 95% confidence cut-off]
[Figure: cars data, braking distance against speed]
Uncertainty: Parameters vs. Prediction
Uncertainty on the regression parameters (β₀, β₁): from the output of the regression we can derive confidence intervals for β₀ and β₁, usually
β_k ∈ [ β̂_k ± u_{1−α/2} se[β̂_k] ]
[Figure: cars data (vehicle speed vs braking distance), with regression lines drawn from the parameter confidence region]
Uncertainty: Parameters vs. Prediction
Uncertainty on a prediction, ŷ = m̂(x). Usually
m(x) ∈ [ m̂(x) ± u_{1−α/2} se[m̂(x)] ]
hence, for a linear model,
xᵀβ̂ ± u_{1−α/2} σ̂ √( xᵀ[XᵀX]^{−1}x )
i.e. (with one covariate)
se²[m̂(x)] = Var[β̂₀ + β̂₁x] = se²[β̂₀] + 2 cov[β̂₀, β̂₁] x + se²[β̂₁] x²
[Figure: cars data with a pointwise confidence interval for the prediction]
> predict(lm(dist ~ speed, data = cars), newdata = data.frame(speed = x), interval = "confidence")
Least Squares and Expected Value (Orthogonal Projection Theorem)
Let y ∈ ℝⁿ. Then
ȳ = argmin_{m∈ℝ} { Σ_{i=1}^n (1/n)(yᵢ − m)² }
(the terms yᵢ − m play the role of residuals εᵢ). It is the empirical version of
E[Y] = argmin_{m∈ℝ} { ∫ (y − m)² dF(y) } = argmin_{m∈ℝ} { E[(Y − m)²] }
where Y is a real-valued random variable.
Thus, the minimizer over functions m(·) : ℝᵏ → ℝ of
Σ_{i=1}^n (1/n)( yᵢ − m(xᵢ) )²
is the empirical version of E[Y | X = x].
The Histogram and the Regressogram
There are connections between the estimation of f(y) and of E[Y | X = x]. Assume that yᵢ ∈ [a₁, a_{k+1}), divided into k classes [aⱼ, a_{j+1}). The histogram is
f̂ₐ(y) = Σ_{j=1}^k [ 1(y ∈ [aⱼ, a_{j+1})) / (a_{j+1} − aⱼ) ] · (1/n) Σ_{i=1}^n 1(yᵢ ∈ [aⱼ, a_{j+1}))
Assume that a_{j+1} − aⱼ = hₙ, with hₙ → 0 and nhₙ → ∞ as n → ∞; then
E[ ( f̂ₐ(y) − f(y) )² ] ∼ O(n^{−2/3})
(for an optimal choice of hₙ).
> hist(height)
[Figure: histogram of height]
The Histogram and the Regressogram
Then a moving histogram was considered,
f̂(y) = (1/(2nhₙ)) Σ_{i=1}^n 1(yᵢ ∈ [y ± hₙ)) = (1/(nhₙ)) Σ_{i=1}^n k((yᵢ − y)/hₙ)
with k(x) = ½ · 1(x ∈ [−1, 1)), which is a (flat) kernel estimator.
1 > density(height , kernel = " rectangular ")
[Figure: rectangular-kernel density estimates of height]
The Histogram and the Regressogram
From Tukey (1961) Curves as parameters, and touch estimation, the regressogram is defined as
m̂ₐ(x) = Σ_{i=1}^n 1(xᵢ ∈ [aⱼ, a_{j+1})) yᵢ / Σ_{i=1}^n 1(xᵢ ∈ [aⱼ, a_{j+1}))
and the moving regressogram is
m̂(x) = Σ_{i=1}^n 1(xᵢ ∈ [x ± hₙ]) yᵢ / Σ_{i=1}^n 1(xᵢ ∈ [x ± hₙ])
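A minimal sketch of both estimators on the cars data (the breakpoints and the bandwidth are arbitrary choices here):

# regressogram: mean of dist within fixed bins of speed
brk <- seq(0, 30, by = 5)
m_a <- tapply(cars$dist, cut(cars$speed, breaks = brk), mean)
# moving regressogram on a grid of points, bandwidth h = 2.5
h  <- 2.5
xs <- seq(5, 25, by = 0.5)
m_x <- sapply(xs, function(x) mean(cars$dist[abs(cars$speed - x) <= h]))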
[Figure: regressogram and moving regressogram on the cars data (speed vs dist)]
Nadaraya-Watson and Kernels
Background: Kernel Density Estimator
Consider a sample {y₁, · · · , yₙ}, with Fₙ the empirical cumulative distribution function,
Fₙ(y) = (1/n) Σ_{i=1}^n 1(yᵢ ≤ y)
The empirical measure Pₙ puts weight 1/n on each observation.
Idea: add a (little) continuous noise to smooth Fₙ.
Let Yₙ denote a random variable with distribution Fₙ, and define
Ỹ = Yₙ + hU, where U ⊥⊥ Yₙ has cdf K
The cumulative distribution function of Ỹ is
F̃(y) = P[Ỹ ≤ y] = E[1(Ỹ ≤ y)] = E[ E( 1(Ỹ ≤ y) | Yₙ ) ]
     = E[ P( U ≤ (y − Yₙ)/h | Yₙ ) ] = Σ_{i=1}^n (1/n) K((y − yᵢ)/h)
Nadaraya-Watson and Kernels
If we differentiate,
f̃(y) = (1/(nh)) Σ_{i=1}^n k((y − yᵢ)/h) = (1/n) Σ_{i=1}^n k_h(y − yᵢ), with k_h(u) = (1/h) k(u/h)
f̃ is the kernel density estimator of f, with kernel k and bandwidth h. Standard kernels:
• Rectangular, k(u) = ½ · 1(|u| ≤ 1)
• Epanechnikov, k(u) = ¾ (1 − u²) · 1(|u| ≤ 1)
• Gaussian, k(u) = (1/√(2π)) e^{−u²/2}
1 > density(height , kernel = " epanechnikov ")
[Figure: the three kernel shapes on [−2, 2], and an Epanechnikov density estimate of height]
Kernels and Statistical Properties
Consider here an i.i.d. sample {Y₁, · · · , Yₙ} with density f.
Given y, observe that E[f̃(y)] = ∫ (1/h) k((y − t)/h) f(t) dt = ∫ k(u) f(y − hu) du. Use a Taylor expansion around h = 0, f(y − hu) ∼ f(y) − f′(y)hu + ½ f″(y)h²u²:
E[f̃(y)] = f(y) ∫ k(u) du − f′(y) h ∫ u k(u) du + ½ h² ∫ f″(y + hu) u² k(u) du
        = f(y) + 0 + h² (f″(y)/2) ∫ u² k(u) du + o(h²)
Thus, if f is twice continuously differentiable with bounded second derivative, and if
∫ k(u) du = 1, ∫ u k(u) du = 0 and ∫ u² k(u) du < ∞,
then E[f̃(y)] = f(y) + h² (f″(y)/2) ∫ u² k(u) du + o(h²).
Kernels and Statistical Properties
For the heuristics on that bias, consider a flat kernel, and set
f_h(y) = [ F(y + h) − F(y − h) ] / (2h)
then the natural estimate is
f̂_h(y) = [ F̂(y + h) − F̂(y − h) ] / (2h) = (1/(2nh)) Σ_{i=1}^n Zᵢ, with Zᵢ = 1(yᵢ ∈ [y ± h])
where the Zᵢ's are i.i.d. Bernoulli B(p_y) variables with p_y = P[Yᵢ ∈ [y ± h]] = 2h · f_h(y). Thus E[f̂_h(y)] = f_h(y), while
f_h(y) ∼ f(y) + (h²/6) f″(y) as h ∼ 0.
Kernels and Statistical Properties
Similarly, as h → 0 and nh → ∞,
Var[f̃(y)] = (1/n) { E[k_h(y − Y)²] − ( E[k_h(y − Y)] )² } = (f(y)/(nh)) ∫ k(u)² du + o(1/(nh))
Hence
• if h → 0, the bias goes to 0
• if nh → ∞, the variance goes to 0
Kernels and Statistical Properties
Extension in higher dimension:
f̃(y) = (1/(n |H|^{1/2})) Σ_{i=1}^n k( H^{−1/2}(y − yᵢ) )
or, with H = h² Σ,
f̃(y) = (1/(n hᵈ |Σ|^{1/2})) Σ_{i=1}^n k( Σ^{−1/2}(y − yᵢ)/h )
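In practice, a bivariate Gaussian-kernel estimate (diagonal bandwidth matrix) can be obtained with MASS::kde2d; here height and weight are assumed to be the vectors shown on the slide:

library(MASS)
fit2d <- kde2d(height, weight, n = 50)   # Gaussian kernel, diagonal H
contour(fit2d, xlab = "height", ylab = "weight")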
[Figure: bivariate density contours of (height, weight)]
Kernels and Convolution
Given f and g, set
(f ⋆ g)(x) = ∫_ℝ f(x − y) g(y) dy
Then f̃_h = f̂ ⋆ k_h, where
f̂(y) = dF̂(y)/dy = (1/n) Σ_{i=1}^n δ_{yᵢ}(y)
Hence, f̃ is the distribution of Y + ε, where Y is uniform over {y₁, · · · , yₙ} and ε ∼ k_h are independent.
[Figure: convolution of the empirical measure with a kernel]
Nadaraya-Watson and Kernels
Here E[Y | X = x] = m(x). Write m as a ratio of densities,
m(x) = ∫ y f(y|x) dy = ∫ y f(y, x) dy / ∫ f(y, x) dy
Consider some bivariate kernel k such that ∫ t k(t, u) dt = 0, and set κ(u) = ∫ k(t, u) dt.
The numerator can be estimated using
∫ y f̃(y, x) dy = (1/(nh²)) Σ_{i=1}^n ∫ y k( (y − yᵢ)/h, (x − xᵢ)/h ) dy
(substituting t = (y − yᵢ)/h and using ∫ t k(t, u) dt = 0)
            = (1/(nh)) Σ_{i=1}^n ∫ yᵢ k( t, (x − xᵢ)/h ) dt = (1/(nh)) Σ_{i=1}^n yᵢ κ( (x − xᵢ)/h )
Nadaraya-Watson and Kernels
and for the denominator,
∫ f̃(y, x) dy = (1/(nh²)) Σ_{i=1}^n ∫ k( (y − yᵢ)/h, (x − xᵢ)/h ) dy = (1/(nh)) Σ_{i=1}^n κ( (x − xᵢ)/h )
Therefore, plugging both into the expression for m(x) yields the Nadaraya-Watson estimator
m̃(x) = Σ_{i=1}^n yᵢ κ_h(x − xᵢ) / Σ_{i=1}^n κ_h(x − xᵢ)
Observe that this regression estimator is a weighted average (see the linear predictor section),
m̃(x) = Σ_{i=1}^n ωᵢ(x) yᵢ, with ωᵢ(x) = κ_h(x − xᵢ) / Σ_{j=1}^n κ_h(x − xⱼ)
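Base R's ksmooth computes exactly this weighted average; a quick sketch on the cars data (the bandwidth is an arbitrary choice):

nw <- ksmooth(cars$speed, cars$dist, kernel = "normal", bandwidth = 5)
plot(cars$speed, cars$dist)
lines(nw, col = "red")   # Nadaraya-Watson estimate of E[dist | speed]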
[Figure: Nadaraya-Watson estimator on the cars data, with the kernel weights at one point]
Nadaraya-Watson and Kernels
One can prove that the kernel regression bias is given by
E[m̃(x)] ∼ m(x) + h² [ (C₁/2) m″(x) + C₂ m′(x) f′(x)/f(x) ]
while Var[m̃(x)] ∼ (C₃/(nh)) σ²(x)/f(x). In this univariate case, one can easily get the kernel estimator of the derivatives.
Note that m̃ is a function of the bandwidth h, and this can be extended to multivariate x.
[Figure: kernel regression fit on the cars data]
Nadaraya-Watson and Kernels in Higher Dimension
Here
m̂_H(x) = Σ_{i=1}^n yᵢ k_H(xᵢ − x) / Σ_{i=1}^n k_H(xᵢ − x)
for some symmetric positive definite bandwidth matrix H, with k_H(x) = det[H]^{−1} k(H^{−1}x). Then
E[m̂_H(x)] ∼ m(x) + (C₁/2) trace( Hᵀ m″(x) H ) + C₂ m′(x)ᵀ H Hᵀ f′(x) / f(x)
while
Var[m̂_H(x)] ∼ (C₃/(n det(H))) σ²(x)/f(x)
Hence, if H = hI, the optimal rate is h ∼ C n^{−1/(4+dim(x))}.
From kernels to k-nearest neighbours
An alternative is to consider
m̃ₖ(x) = (1/n) Σ_{i=1}^n ω_{i,k}(x) yᵢ
where ω_{i,k}(x) = n/k if i ∈ I_x^k (and 0 otherwise), with
I_x^k = {i : xᵢ is one of the k nearest observations to x}
From Lai (1977) Large sample properties of K-nearest neighbor procedures: if k → ∞ and k/n → 0 as n → ∞, then
E[m̃ₖ(x)] ∼ m(x) + ( (m″f + 2m′f′)(x) / (24 f(x)³) ) (k/n)²
while Var[m̃ₖ(x)] ∼ σ²(x)/k.
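A minimal k-NN regression sketch matching these weights (k is an arbitrary choice):

knn_reg <- function(x, k = 7) {
  idx <- order(abs(cars$speed - x))[1:k]   # k nearest observations to x
  mean(cars$dist[idx])
}
xs <- seq(5, 25, by = 0.5)
plot(cars$speed, cars$dist)
lines(xs, sapply(xs, knn_reg), col = "red")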
[Figure: k-NN regression on the cars data]
From kernels to k-nearest neighbours
Remark: Brent & John (1985) Finding the median requires 2n comparisons considered a median-smoothing algorithm, where the median is taken over the k nearest neighbours (see section #4).
k-Nearest Neighbors and Curse of Dimensionality
The higher the dimension, the larger the distance to the closest neighbour,
min_{i∈{1,···,n}} { d(a, xᵢ) }, where xᵢ ∈ ℝᵈ.
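A quick simulation sketch of this effect (the sample sizes and dimensions are illustrative):

set.seed(1)
nn_dist <- function(n, d) {          # distance from a random point a to its
  X <- matrix(runif(n * d), n, d)    # nearest neighbour among n uniform points
  a <- runif(d)
  min(sqrt(colSums((t(X) - a)^2)))
}
sapply(1:5, function(d) mean(replicate(200, nn_dist(100, d))))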
[Figure: boxplots of nearest-neighbour distances in dimensions 1 to 5, for n = 10 and n = 100]
k-Nearest Neighbors and Imputation
“The best thing to do with missing values is not to have any” (Gertrude Mary Cox)
Consider some bivariate observations xᵢ, where some of the x₂,ᵢ's can be (randomly) missing.
One can consider simple imputation, x̂₂,ᵢ = x̄₂ (computed on the non-missing values).
One can consider a regression of x₂ on x₁ (but that's arbitrary).
One can also draw values randomly, x̂₂,ᵢ = E[X₂ | X₁ = x₁,ᵢ] + εᵢ, where ε has a Gaussian distribution.
One can consider a PCA (on non-missing values)
or one can consider k nearest neighbours.
There are packages dedicated to imputation (see the VIM package).
> library(VIM)
Missing humidity given the temperature:
> y = tao[, c("Air.Temp", "Humidity")]
> histMiss(y)
[Figure: histogram of Air.Temp, split by missing/observed Humidity]
> y = tao[, c("Humidity", "Air.Temp")]
> histMiss(y)
[Figure: histogram of Humidity, split by missing/observed Air.Temp]
k-Nearest Neighbors and Imputation
This package contains a k-Nearest Neighbors algorithm for imputation,
> tao_kNN <- kNN(tao, k = 5)
Imputation can be visualized using
> vars <- c("Air.Temp", "Humidity", "Air.Temp_imp", "Humidity_imp")
> marginplot(tao_kNN[, vars], delimiter = "imp", alpha = 0.6)
[Figure: marginplot of the imputed (Air.Temp, Humidity) values]
Bandwidth selection : MISE for Density
MSE[f̃(y)] = bias[f̃(y)]² + Var[f̃(y)]
MSE[f̃(y)] = (f(y)/(nh)) ∫ k(u)² du + h⁴ ( (f″(y)/2) ∫ k(u) u² du )² + o( h⁴ + 1/(nh) )
Bandwidth choice is based on minimizing the asymptotic integrated MSE (over y),
MISE(f̃) = ∫ MSE[f̃(y)] dy ∼ (1/(nh)) ∫ k(u)² du + (h⁴/4) ( ∫ k(u) u² du )² ∫ f″(y)² dy
Bandwidth selection : MISE for Density
Thus, the first-order condition yields
−C₁/(nh²) + h³ C₂ ∫ f″(y)² dy = 0
with C₁ = ∫ k(u)² du and C₂ = ( ∫ k(u) u² du )², and
h⋆ = n^{−1/5} ( C₁ / ( C₂ ∫ f″(y)² dy ) )^{1/5}
h⋆ = 1.06 n^{−1/5} √(Var[Y]), from Silverman (1986) Density Estimation.
> bw.nrd0(cars$speed)
[1] 2.150016
> bw.nrd(cars$speed)
[1] 2.532241
with the Scott correction, see Scott (1992) Multivariate Density Estimation.
Bandwidth selection : MISE for Regression Model
One can prove that
MISE[m̂_h] ∼ (h⁴/4) ( ∫ u² k(u) du )² ∫ [ m″(x) + 2 m′(x) f′(x)/f(x) ]² dx   (squared bias)
          + (σ²/(nh)) ∫ k²(u) du · ∫ dx/f(x)   (variance)
as n → ∞ and nh → ∞.
The bias is sensitive to the position of the xᵢ's.
h⋆ = n^{−1/5} ( C₁ ∫ dx/f(x) / ( C₂ ∫ [ m″(x) + 2 m′(x) f′(x)/f(x) ]² dx ) )^{1/5}
Problem: this depends on the unknown f(x) and m(x).
Bandwidth Selection : Cross Validation
Consider some risk as a function of a parameter h, e.g. MISE[m̂_h].
A first idea is to consider a validation set approach:
• Split the data in two parts
• Train the method on the first part
• Compute the error on the second part
Problem: every split yields a different estimate of the error.
[Figure: a validation-set split of the data]
Bandwidth Selection : Cross Validation
Consider leave-one-out cross validation. For every i ∈ {1, 2, · · · , n}:
• Train the model on every point but the ith
• Compute the test error on the held-out point
CV_LOO = (1/n) Σ_{i=1}^n ( yᵢ − ŷᵢ^{(−i)} )²
where the prediction ŷᵢ^{(−i)} is obtained from the model fitted on the data with observation i removed.
This can be computationally expensive, but for a linear model,
CV_LOO = (1/n) Σ_{i=1}^n ( (yᵢ − ŷᵢ) / (1 − h_{i,i}) )²
where h_{i,i} is the leverage statistic.
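For ordinary least squares, the shortcut avoids refitting entirely; a sketch on the cars data:

fit <- lm(dist ~ speed, data = cars)
h   <- hatvalues(fit)                                  # diagonal of the hat matrix
cv  <- mean(((cars$dist - fitted(fit)) / (1 - h))^2)   # leave-one-out CV error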
[Figure: the n leave-one-out splits]
Bandwidth Selection : Cross Validation
Consider k-fold cross validation. For every j ∈ {1, 2, · · · , k}:
• Train the model on every fold but the jth
• Compute the test error on the jth fold
As we increase k in k-fold cross-validation, we decrease the bias but increase the variance. One can use the bootstrap to estimate measures of uncertainty.
[Figure: the k-fold splits of the data]
Bandwidth Selection : Cross Validation
Let R(h) = E[ (Y − m̂_h(X))² ].
A natural idea is R̂(h) = (1/n) Σ_{i=1}^n ( yᵢ − m̂_h(xᵢ) )².
Instead, use leave-one-out cross validation,
R̂(h) = (1/n) Σ_{i=1}^n ( yᵢ − m̂_h^{(i)}(xᵢ) )²
where m̂_h^{(i)} is the estimator obtained by omitting the ith pair (yᵢ, xᵢ), or k-fold cross validation,
R̂(h) = (1/n) Σ_{j=1}^k Σ_{i∈Iⱼ} ( yᵢ − m̂_h^{(j)}(xᵢ) )²
where m̂_h^{(j)} is the estimator obtained by omitting the pairs (yᵢ, xᵢ) with i ∈ Iⱼ.
[Figure: kernel regression fits on the cars data for several bandwidths]
Bandwidth Selection : Cross Validation
Then find (numerically)
h⋆ = argmin{ R̂(h) }
In the context of density estimation, see Chiu (1991) Bandwidth Selection for Kernel Density Estimation.
[Figure: cross-validated risk as a function of the bandwidth]
Usual bias-variance tradeoff, or Goldilocks principle: h should be neither too small, nor too large;
• undersmoothed: variance too large, bias too small
• oversmoothed: bias too large, variance too small
A brute-force grid search is sketched below.
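A grid-search sketch for the Nadaraya-Watson bandwidth on the cars data (purely illustrative; the grid is an arbitrary choice):

cv_risk <- function(h) {
  err <- sapply(1:nrow(cars), function(i) {
    fit <- ksmooth(cars$speed[-i], cars$dist[-i], kernel = "normal",
                   bandwidth = h, x.points = cars$speed[i])
    cars$dist[i] - fit$y
  })
  mean(err^2, na.rm = TRUE)   # NA when no observation falls in the window
}
hs <- seq(1, 10, by = 0.5)
hs[which.min(sapply(hs, cv_risk))]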
Local Linear Regression
Consider m̂(x) defined as m̂(x) = β̂₀, where (β̂₀, β̂) is the solution of
min_{(β₀,β)} Σ_{i=1}^n ωᵢ^{(x)} ( yᵢ − [β₀ + (x − xᵢ)ᵀβ] )²
where, e.g., ωᵢ^{(x)} = k_h(x − xᵢ); i.e. we seek the constant term in a weighted least squares regression of the yᵢ's on the (x − xᵢ)'s. If Xₓ is the matrix [1 (x − X)ᵀ] and Wₓ = diag[k_h(x − x₁), · · · , k_h(x − xₙ)], then
m̂(x) = 1ᵀ ( XₓᵀWₓXₓ )^{−1} XₓᵀWₓ y
This estimator is also a linear predictor:
m̂(x) = Σ_{i=1}^n aᵢ(x) yᵢ
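Equivalently, at a single point x this is just a weighted least squares fit; a sketch with Gaussian weights (the bandwidth is arbitrary):

loclin <- function(x0, h = 2) {
  w <- dnorm((cars$speed - x0) / h)   # kernel weights around x0
  coef(lm(dist ~ I(speed - x0), data = cars, weights = w))[1]  # intercept = m(x0)
}
loclin(15)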
where
aᵢ(x) = (1/n) k_h(x − xᵢ) [ 1 − s₁(x)ᵀ s₂(x)^{−1} (x − xᵢ)/h ]
with
s₁(x) = (1/n) Σ_{i=1}^n k_h(x − xᵢ) (x − xᵢ)/h
and
s₂(x) = (1/n) Σ_{i=1}^n k_h(x − xᵢ) ( (x − xᵢ)/h ) ( (x − xᵢ)/h )ᵀ
Note that the Nadaraya-Watson estimator was simply the solution of
min_{β₀} Σ_{i=1}^n ωᵢ^{(x)} (yᵢ − β₀)², where ωᵢ^{(x)} = k_h(x − xᵢ)
For the local linear estimator,
E[m̂(x)] ∼ m(x) + (h²/2) m″(x) µ₂, where µ₂ = ∫ k(u) u² du,
Var[m̂(x)] ∼ (1/(nh)) ν σₓ² / f(x), where ν = ∫ k(u)² du
Thus, the kernel (local constant) regression MSE is
(h⁴/4) [ m″(x) + 2 m′(x) f′(x)/f(x) ]² µ₂² + (1/(nh)) ν σₓ² / f(x)
[Figure: local regression (loess) fits on the cars data, for two spans]
> loess(dist ~ speed, cars, span = 0.75, degree = 1)
> predict(REG, data.frame(speed = seq(5, 25, 0.25)), se = TRUE)
[Figure: loess predictions with standard-error bands on the cars data]
Local polynomials
One might assume that, locally, m(u) ∼ µₓ(u) for u close to x, with
µₓ(u) = β₀^{(x)} + β₁^{(x)}[u − x] + β₂^{(x)}[u − x]²/2 + β₃^{(x)}[u − x]³/6 + · · ·
and we estimate β^{(x)} by minimizing
Σ_{i=1}^n ωᵢ^{(x)} ( yᵢ − µₓ(xᵢ) )²
If Xₓ is the design matrix with rows ( 1, xᵢ − x, [xᵢ − x]²/2, [xᵢ − x]³/6, · · · ), then
β̂^{(x)} = ( XₓᵀWₓXₓ )^{−1} XₓᵀWₓ y
(the weighted least squares estimator).
> library(locfit)
> locfit(dist ~ speed, data = cars)
Series Regression
Recall that E[Y | X = x] = m(x).
Why not approximate m by a linear combination of approximating functions h₁(x), · · · , hₖ(x)?
Set h(x) = (h₁(x), · · · , hₖ(x)), and consider the regression of the yᵢ's on the h(xᵢ)'s,
yᵢ = h(xᵢ)ᵀβ + εᵢ
Then β̂ = (HᵀH)^{−1}Hᵀy, where H is the n × k matrix with rows h(xᵢ)ᵀ.
[Figure: series regression fits on the cars data]
Series Regression : polynomials
Even if m(x) = E[Y | X = x] is not a polynomial function, a polynomial can still be a good approximation.
By the Stone-Weierstrass theorem, if m(·) is continuous on some interval, then there is a uniform approximation of m(·) by polynomial functions.
> reg <- lm(y ~ x, data = db)
[Figure: simulated data with a linear fit]
Series Regression : polynomials
Assume that m(x) = E[Y | X = x] = Σ_{i=0}^k αᵢ xⁱ, where the parameters α₀, · · · , αₖ will be estimated (but not k).
> reg <- lm(y ~ poly(x, 5), data = db)
> reg <- lm(y ~ poly(x, 25), data = db)
[Figure: polynomial fits of degree 5 and degree 25 on the simulated data]
Series Regression : (Linear) Splines
Consider m + 1 knots on X, min{xᵢ} ≤ t₀ ≤ t₁ ≤ · · · ≤ tₘ ≤ max{xᵢ}; then define the (degree 1) positive-part spline functions
b_{j,1}(x) = (x − tⱼ)₊ = x − tⱼ if x > tⱼ, 0 otherwise
For linear splines, consider
Yᵢ = β₀ + β₁Xᵢ + β₂(Xᵢ − s)₊ + εᵢ
> positive_part <- function(x) ifelse(x > 0, x, 0)
> reg <- lm(Y ~ X + positive_part(X - s), data = db)
[Figure: linear spline fit with one knot, on the simulated data]
Series Regression : (Linear) Splines
For linear splines with two knots, consider
Yᵢ = β₀ + β₁Xᵢ + β₂(Xᵢ − s₁)₊ + β₃(Xᵢ − s₂)₊ + εᵢ
> reg <- lm(Y ~ X + positive_part(X - s1) + positive_part(X - s2), data = db)
> library(splines)
A spline is a function defined by piecewise polynomials; b-splines are defined recursively.
[Figure: linear spline fit with two knots, and a spline fit on the cars data]
b-Splines (in Practice)
> reg1 <- lm(dist ~ speed + positive_part(speed - 15), data = cars)
> reg2 <- lm(dist ~ bs(speed, df = 2, degree = 1), data = cars)
Consider m + 1 knots on [0, 1], 0 ≤ t₀ ≤ t₁ ≤ · · · ≤ tₘ ≤ 1; then define b-splines recursively as
b_{j,0}(t) = 1 if tⱼ ≤ t < t_{j+1}, 0 otherwise, and
b_{j,n}(t) = ( (t − tⱼ)/(t_{j+n} − tⱼ) ) b_{j,n−1}(t) + ( (t_{j+n+1} − t)/(t_{j+n+1} − t_{j+1}) ) b_{j+1,n−1}(t)
[Figure: the two spline fits on the cars data, with the knot positions marked]
b-Splines (in Practice)
> summary(reg1)

Coefficients:
             Estimate Std Error t value Pr(>|t|)
(Intercept)   -7.6519   10.6254  -0.720    0.475
speed          3.0186    0.8627   3.499    0.001 **
(speed - 15)   1.7562    1.4551   1.207    0.233

> summary(reg2)

Coefficients:
            Estimate Std Error t value Pr(>|t|)
(Intercept)    4.423     7.343   0.602   0.5493
bs(speed)1    33.205     9.489   3.499   0.0012 **
bs(speed)2    80.954     8.788   9.211  4.2e-12 ***
[Figure: the two spline fits, with the basis functions shown on [0, 1]]
b and p-Splines
Note that those spline functions define an orthonormal basis.
O'Sullivan (1986) A statistical perspective on ill-posed inverse problems suggested a penalty on the second derivative of the fitted curve (see #3):
m̂(x) = argmin_β { Σ_{i=1}^n ( yᵢ − b(xᵢ)ᵀβ )² + λ ∫_ℝ [ b″(t)ᵀβ ]² dt }
[Figure: penalized spline fits on the cars data for several values of λ]
Adding Constraints: Convex Regression
Assume that yᵢ = m(xᵢ) + εᵢ, where m : ℝᵈ → ℝ is some convex function.
m is convex if and only if, for all x₁, x₂ ∈ ℝᵈ and t ∈ [0, 1],
m(t x₁ + [1 − t] x₂) ≤ t m(x₁) + [1 − t] m(x₂)
Proposition (Hildreth (1954) Point Estimates of Ordinates of Concave Functions):
m̂ = argmin_{m convex} Σ_{i=1}^n ( yᵢ − m(xᵢ) )²
Then θ̂ = (m̂(x₁), · · · , m̂(xₙ)) is unique.
Let y = θ + ε; then
θ̂ = argmin_{θ∈K} Σ_{i=1}^n ( yᵢ − θᵢ )²
where K = {θ ∈ ℝⁿ : ∃m convex, m(xᵢ) = θᵢ}. I.e. θ̂ is the projection of y onto the (closed) convex cone K. The projection theorem gives existence and uniqueness.
Adding Constraints: Convex Regression
In dimension 1: yᵢ = m(xᵢ) + εᵢ. Assume the observations are ordered, x₁ < x₂ < · · · < xₙ. Here
K = { θ ∈ ℝⁿ : (θ₂ − θ₁)/(x₂ − x₁) ≤ (θ₃ − θ₂)/(x₃ − x₂) ≤ · · · ≤ (θₙ − θ_{n−1})/(xₙ − x_{n−1}) }
Hence, this is a quadratic program with n − 2 linear constraints (a sketch follows below).
m̂ is a piecewise linear function (interpolating the consecutive pairs (xᵢ, θ̂ᵢ)).
If m is differentiable, m is convex if
m(x) + ∇m(x) · [y − x] ≤ m(y)
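A sketch of that quadratic program using the quadprog package (synthetic data with distinct x's, since ties would break the slope constraints; the data-generating process is arbitrary):

library(quadprog)
set.seed(1)
x <- sort(runif(50)); y <- (x - 0.5)^2 + rnorm(50, sd = 0.02)
n <- length(x); dx <- diff(x)
A <- matrix(0, n, n - 2)    # each column encodes one increasing-slope constraint
for (j in 1:(n - 2)) {
  A[j,     j] <-  1 / dx[j]
  A[j + 1, j] <- -1 / dx[j] - 1 / dx[j + 1]
  A[j + 2, j] <-  1 / dx[j + 1]
}
# minimize ||y - theta||^2 subject to A' theta >= 0
theta <- solve.QP(Dmat = diag(n), dvec = y, Amat = A, bvec = rep(0, n - 2))$solution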
[Figure: convex regression fit on the cars data]
Adding Constraints: Convex Regression
More generally: if m is convex, then there exists ξₓ ∈ ℝᵈ such that
m(x) + ξₓ · [y − x] ≤ m(y), for all y
ξₓ is a subgradient of m at x, and the subdifferential is
∂m(x) = { ξ : m(x) + ξ · [y − x] ≤ m(y), ∀y ∈ ℝᵈ }
Hence, θ̂ is the solution of
argmin{ ‖y − θ‖² } subject to θᵢ + ξᵢ · [xⱼ − xᵢ] ≤ θⱼ, ∀i, j
for some ξ₁, · · · , ξₙ ∈ ℝᵈ.
[Figure: convex regression fit with its supporting lines]
Spatial Smoothing
One can also consider some spatial smoothing, if we want to predict E[Y | X = x] for some coordinates x.
> library(rgeos)
> library(rgdal)
> library(maptools)
> library(cartography)
> download.file("http://bit.ly/2G3KIUG", "zonier.RData")
> load("zonier.RData")
> cols = rev(carto.pal(pal1 = "red.pal", n1 = 10, pal2 = "green.pal", n2 = 10))
> download.file("http://bit.ly/2GSvzGW", "FRA_adm0.rds")
> download.file("http://bit.ly/2FUZ0Lz", "FRA_adm2.rds")
> FR = readRDS("FRA_adm2.rds")
> donnees_carte = data.frame(FR@data)
Spatial Smoothing
> FR0 = readRDS("FRA_adm0.rds")
> plot(FR0)
> bk = seq(-5, 4.5, length = 21)
> cuty = cut(simbase$Y, breaks = bk, labels = 1:20)
> points(simbase$long, simbase$lat, col = cols[cuty], pch = 19, cex = .5)
One can consider a choropleth map (the spatial version of the histogram).
> A = aggregate(x = simbase$Y, by = list(simbase$dpt), mean)
> names(A) = c("dpt", "y")
> d = donnees_carte$CCA_2
> d[d == "2A"] = "201"
> d[d == "2B"] = "202"
> donnees_carte$dpt = as.numeric(as.character(d))
> donnees_carte = merge(donnees_carte, A, all.x = TRUE)
> donnees_carte = donnees_carte[order(donnees_carte$OBJECTID), ]
> bk = seq(-2.75, 2.75, length = 21)
> donnees_carte$cuty = cut(donnees_carte$y, breaks = bk, labels = 1:20)
> plot(FR, col = cols[donnees_carte$cuty], xlim = c(-5.2, 12))
Spatial Smoothing
Instead of a "continuous" gradient of colors, one can consider only 4 colors (4 levels) for the prediction.
> bk = seq(-2.75, 2.75, length = 5)
> donnees_carte$cuty = cut(donnees_carte$y, breaks = bk, labels = 1:4)
> plot(FR, col = cols[c(3, 8, 12, 17)][donnees_carte$cuty], xlim = c(-5.2, 12))
Spatial Smoothing
> P1 = FR0@polygons[[1]]@Polygons[[355]]@coords
> P2 = FR0@polygons[[1]]@Polygons[[27]]@coords
> plot(FR0, border = NA)
> polygon(P1)
> polygon(P2)
> grille <- expand.grid(seq(min(simbase$long), max(simbase$long), length = 101),
+                       seq(min(simbase$lat), max(simbase$lat), length = 101))
> paslong = (max(simbase$long) - min(simbase$long))/100
> paslat = (max(simbase$lat) - min(simbase$lat))/100
Spatial Smoothing
We need to create a grid of x values on which to approximate E[Y | X = x].
> f = function(i){ (point.in.polygon(grille[i, 1] + paslong/2, grille[i, 2] + paslat/2,
+     P1[, 1], P1[, 2]) > 0) + (point.in.polygon(grille[i, 1] + paslong/2,
+     grille[i, 2] + paslat/2, P2[, 1], P2[, 2]) > 0) }
> indic = unlist(lapply(1:nrow(grille), f))
> grille = grille[which(indic == 1), ]
> points(grille[, 1] + paslong/2, grille[, 2] + paslat/2)
Spatial Smoothing
Consider here some k-NN, with k = 20
> library(geosphere)
> knn = function(i, k = 20){
+   d = distHaversine(grille[i, 1:2], simbase[, c("long", "lat")], r = 6378.137)
+   r = rank(d)
+   ind = which(r <= k)
+   mean(simbase[ind, "Y"])
+ }
> grille$y = Vectorize(knn)(1:nrow(grille))
> bk = seq(-2.75, 2.75, length = 21)
> grille$cuty = cut(grille$y, breaks = bk, labels = 1:20)
> points(grille[, 1] + paslong/2, grille[, 2] + paslat/2, col = cols[grille$cuty], pch = 19)
Spatial Smoothing
Again, instead of a "continuous" gradient, we can use 4 levels,
> bk = seq(-2.75, 2.75, length = 5)
> grille$cuty = cut(grille$y, breaks = bk, labels = 1:4)
> plot(FR0, border = NA)
> polygon(P1)
> polygon(P2)
> points(grille[, 1] + paslong/2, grille[, 2] + paslat/2,
+        col = cols[c(3, 8, 12, 17)][grille$cuty], pch = 19)
Testing (Non-)Linearities
In the linear model,
ŷ = Xβ̂ = X[XᵀX]^{−1}Xᵀ y = Hy
where H = X[XᵀX]^{−1}Xᵀ is the hat matrix, and H_{i,i} is the leverage of the ith observation.
Write
ŷᵢ = Σ_{j=1}^n [ xᵢᵀ[XᵀX]^{−1}Xᵀ ]ⱼ yⱼ = Σ_{j=1}^n [H(xᵢ)]ⱼ yⱼ
where
H(x) = xᵀ[XᵀX]^{−1}Xᵀ
The prediction is
m̂(x) = Ê[Y | X = x] = Σ_{j=1}^n [H(x)]ⱼ yⱼ
Testing (Non-)Linearities
More generally, a predictor m̂ is said to be linear if, for all x, there is S(·) : ℝⁿ → ℝⁿ such that
m̂(x) = Σ_{j=1}^n S(x)ⱼ yⱼ
Conversely, given y₁, · · · , yₙ, there is an n × n matrix S such that ŷ = Sy.
For the linear model, S = H, and trace(H) = dim(β) gives the degrees of freedom.
H_{i,i}/(1 − H_{i,i}) is related to Cook's distance, from Cook (1977) Detection of Influential Observations in Linear Regression.
Testing (Non-)Linearities
For a kernel regression model, with kernel k and bandwidth h,
S^{(k,h)}_{i,j} = k_h(xᵢ − xⱼ) / Σ_{l=1}^n k_h(xᵢ − xₗ), where k_h(·) = k(·/h),
while S^{(k,h)}(x)ⱼ = k_h(x − xⱼ) / Σ_{l=1}^n k_h(x − xₗ).
For a k-nearest neighbor model, S^{(k)}_{i,j} = (1/k) 1(j ∈ I_{xᵢ}), where I_{xᵢ} indexes the k nearest observations to xᵢ, while S^{(k)}(x)ⱼ = (1/k) 1(j ∈ Iₓ).
Testing (Non-)Linearities
Observe that trace(S) is usually seen as a degree of smoothness.
Do we have to smooth? Isn't the linear model sufficient? Define
T = ‖Sy − Hy‖ / trace( [S − H]ᵀ[S − H] )
If the model is linear, then T has a Fisher distribution.
Remark: in the case of a linear predictor, with smoothing matrix S_h,
R̂(h) = (1/n) Σ_{i=1}^n ( yᵢ − m̂_h^{(−i)}(xᵢ) )² = (1/n) Σ_{i=1}^n ( (yᵢ − m̂_h(xᵢ)) / (1 − [S_h]_{i,i}) )²
so we do not need to estimate n models. One can also minimize the generalized cross-validation criterion
GCV(h) = n² / (n − trace(S))² · (1/n) Σ_{i=1}^n ( yᵢ − m̂_h(xᵢ) )² ∼ Mallows' C_p
(a sketch follows below).
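A sketch of the GCV computation for one linear smoother, smooth.spline, which reports trace(S) as its equivalent degrees of freedom:

fit <- smooth.spline(cars$speed, cars$dist)
n   <- nrow(cars)
rss <- mean((cars$dist - predict(fit, cars$speed)$y)^2)
gcv <- rss / (1 - fit$df / n)^2    # = n^2/(n - trace(S))^2 * RSS/n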
Confidence Intervals
If ŷ = m̂_h(x) = S_h(x)y, let
σ̂² = (1/n) Σ_{i=1}^n ( yᵢ − m̂_h(xᵢ) )²
and a pointwise confidence interval at x is
m̂_h(x) ± t_{1−α/2} σ̂ √( S_h(x) S_h(x)ᵀ )
[Figure: kernel regression on the cars data with pointwise confidence intervals]
Confidence Bands
[Figure: simulated confidence bands for the regression curve on the cars data]
To go further, see functional confidence regions.*
Boosting to Capture NonLinear Effects
We want to solve
m⋆ = argmin{ E[ (Y − m(X))² ] }
The heuristics is simple: we consider an iterative process where we keep modeling the errors.
Fit a model for y, h₁(·), from y and X, and compute the error, ε₁ = y − h₁(X).
Fit a model for ε₁, h₂(·), from ε₁ and X, and compute the error, ε₂ = ε₁ − h₂(X), etc. Then set
mₖ(·) = h₁(·) + h₂(·) + h₃(·) + · · · + hₖ(·)
(where h₁ fits y, h₂ fits ε₁, h₃ fits ε₂, ..., hₖ fits εₖ₋₁).
Hence, we consider an iterative procedure, mₖ(·) = mₖ₋₁(·) + hₖ(·).
Boosting
h(x) = y − mₖ(x) can be interpreted as a residual. Note that this residual is the gradient of
½ [ y − mₖ(x) ]²
A gradient descent is based on the Taylor expansion
f(xₖ) ∼ f(xₖ₋₁) + (xₖ − xₖ₋₁) ∇f(xₖ₋₁)
But here, it is different. We claim we can write
fₖ(x) ∼ fₖ₋₁(x) + (fₖ − fₖ₋₁) ⋆
where ⋆ is interpreted as a 'gradient'.
Boosting
Construct iteratively
mₖ(·) = mₖ₋₁(·) + argmin_{h∈H} Σ_{i=1}^n ( yᵢ − [mₖ₋₁(xᵢ) + h(xᵢ)] )²
mₖ(·) = mₖ₋₁(·) + argmin_{h∈H} Σ_{i=1}^n ( [yᵢ − mₖ₋₁(xᵢ)] − h(xᵢ) )²
where h ∈ H means that we seek within a class of weak learners.
If the learners are too strong, the first loop already reaches a fixed point and there is no learning procedure: see the linear regression y = xᵀβ + ε. Since ε̂ ⊥ x, we cannot learn from the residuals.
In order to make sure that we learn weakly, we can use some shrinkage parameter ν (or a collection of parameters νⱼ).
Boosting with Piecewise Linear Spline & Stump Functions
Instead of εₖ = εₖ₋₁ − hₖ(x), set εₖ = εₖ₋₁ − ν·hₖ(x)
> library(rpart)             # trees as weak learners
> df$yr = df$y; v = 0.05     # initial residuals and shrinkage (this v is an assumed value)
> for(t in 1:100){ fit = rpart(yr ~ x, data = df)
+   df$yr = df$yr - v * predict(fit, newdata = df) }
Remark: bumps are related to regression trees.
[Figure: boosted stump fits on simulated data, at successive iterations]
Ruptures
One can use the Chow test to test for a rupture. Note that it is simply a Fisher test
with two parts,

β = β1 for i = 1, · · · , i0, and β = β2 for i = i0 + 1, · · · , n,

and test

H0 : β1 = β2 against H1 : β1 ≠ β2.

Here i0 is a point between k and n − k (we need enough observations in each regime).
Chow (1960) Tests of Equality Between Sets of Coefficients in Two Linear Regressions
suggested

F_{i0} = (ε̂ᵀε̂ − η̂ᵀη̂) / (η̂ᵀη̂ / (n − 2k)),

where ε̂i = yi − xiᵀβ̂ are the residuals of the single-regime model, and
η̂i = yi − xiᵀβ̂1 for i = 1, · · · , i0, η̂i = yi − xiᵀβ̂2 for i = i0 + 1, · · · , n,
are the residuals of the two-regime model.
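A minimal sketch (not from the slides) computing F_{i0} by hand on the cars data, with an arbitrary break index i0 = 25; this is essentially what strucchange::Fstats computes at each admissible i0:

chow_F <- function(i0, data = cars){
  n   <- nrow(data); k <- 2                                    # k = 2 parameters per regime
  e   <- residuals(lm(dist ~ speed, data = data))              # single-regime residuals
  eta <- c(residuals(lm(dist ~ speed, data = data[1:i0, ])),   # two-regime residuals
           residuals(lm(dist ~ speed, data = data[(i0 + 1):n, ])))
  (sum(e^2) - sum(eta^2)) / (sum(eta^2) / (n - 2 * k))
}
chow_F(25)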
@freakonometrics freakonometrics freakonometrics.hypotheses.org 82
Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics
Ruptures
library(strucchange)
Fstats(dist ~ speed, data = cars, from = 7/50)
[Figure: braking distance against vehicle speed (cars data), and the corresponding sequence of F statistics over candidate break indices]
@freakonometrics freakonometrics freakonometrics.hypotheses.org 83
Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics
Ruptures
Fstats(dist ~ speed, data = cars, from = 2/50)
[Figure: the same two panels with trimming from = 2/50, cars scatter and F statistics sequence]
@freakonometrics freakonometrics freakonometrics.hypotheses.org 84
Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics
Ruptures
If i0 is unknown, use CUSUM-type tests, see Ploberger & Krämer (1992) The
Cusum Test with OLS Residuals. For all t ∈ [0, 1], set

Wt = (1 / (σ̂ √n)) Σ_{i=1}^{⌊nt⌋} ε̂i.

If α is the confidence level, the bounds used are generally ±α, even if the theoretical
bounds should be ±α √(t(1 − t)).
cusum <- efp(dist ~ speed, type = "OLS-CUSUM", data = cars)
plot(cusum, ylim = c(-2, 2))
plot(cusum, alpha = 0.05, alt.boundary = TRUE, ylim = c(-2, 2))
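For intuition, Wt can also be computed directly from the OLS residuals (a sketch, using sd(e) as a simple estimate of σ; efp uses a slightly different variance estimator):

e <- residuals(lm(dist ~ speed, data = cars))
n <- length(e)
W <- cumsum(e) / (sd(e) * sqrt(n))          # Wt evaluated at t = i/n
plot((1:n)/n, W, type = "l", xlab = "t", ylab = "W_t")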
@freakonometrics freakonometrics freakonometrics.hypotheses.org 85
Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics
Ruptures
[Figure: OLS-based CUSUM test, empirical fluctuation process over t ∈ [0, 1], with constant boundaries (left) and alternative boundaries (right)]
@freakonometrics freakonometrics freakonometrics.hypotheses.org 86
Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics
From a Rupture to a Discontinuity
See Imbens & Lemieux (2008) Regression Discontinuity Designs.
@freakonometrics freakonometrics freakonometrics.hypotheses.org 87
Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics
From a Rupture to a Discontinuity
Consider the dataset from Lee (2008) Randomized experiments from non-random
selection in U.S. House elections.
library(RDDtools)
data(Lee2008)
We want to test whether there is a discontinuity at 0:
• with parametric tools
• with nonparametric tools
@freakonometrics freakonometrics freakonometrics.hypotheses.org 88
[Figure: Lee (2008) data, outcome y ∈ [0, 1] against the running variable x ∈ [−1, 1]]
Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics
Testing for a rupture
Use a 4th-order polynomial on each side:
idx1 <- (Lee2008$x > 0)
reg1 <- lm(y ~ poly(x, 4), data = Lee2008[idx1, ])
idx2 <- (Lee2008$x < 0)
reg2 <- lm(y ~ poly(x, 4), data = Lee2008[idx2, ])
s1 <- predict(reg1, newdata = data.frame(x = 0))
s2 <- predict(reg2, newdata = data.frame(x = 0))
abs(s1 - s2)
#          1
# 0.07659014
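The same gap can be estimated nonparametrically, e.g. with a local linear fit on each side of the cutoff. A sketch with a triangular kernel and an arbitrary bandwidth h = 0.1 (both are illustrative assumptions, not the slides' choices):

h   <- 0.1                                   # arbitrary bandwidth
tri <- function(u) pmax(0, 1 - abs(u) / h)   # triangular kernel weights
d1  <- subset(Lee2008, x > 0); d2 <- subset(Lee2008, x < 0)
r1  <- lm(y ~ x, data = d1, weights = tri(d1$x))
r2  <- lm(y ~ x, data = d2, weights = tri(d2$x))
abs(predict(r1, data.frame(x = 0)) - predict(r2, data.frame(x = 0)))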
@freakonometrics freakonometrics freakonometrics.hypotheses.org 89
[Figure: Lee (2008) data with the two 4th-order polynomial fits, one on each side of x = 0]
Slides ub-4
Slides ub-4
Slides ub-4

Contenu connexe

Plus de Arthur Charpentier

Plus de Arthur Charpentier (20)

Reinforcement Learning in Economics and Finance
Reinforcement Learning in Economics and FinanceReinforcement Learning in Economics and Finance
Reinforcement Learning in Economics and Finance
 
Optimal Control and COVID-19
Optimal Control and COVID-19Optimal Control and COVID-19
Optimal Control and COVID-19
 
Slides OICA 2020
Slides OICA 2020Slides OICA 2020
Slides OICA 2020
 
Lausanne 2019 #3
Lausanne 2019 #3Lausanne 2019 #3
Lausanne 2019 #3
 
Lausanne 2019 #4
Lausanne 2019 #4Lausanne 2019 #4
Lausanne 2019 #4
 
Lausanne 2019 #2
Lausanne 2019 #2Lausanne 2019 #2
Lausanne 2019 #2
 
Lausanne 2019 #1
Lausanne 2019 #1Lausanne 2019 #1
Lausanne 2019 #1
 
Side 2019 #10
Side 2019 #10Side 2019 #10
Side 2019 #10
 
Side 2019 #11
Side 2019 #11Side 2019 #11
Side 2019 #11
 
Side 2019 #12
Side 2019 #12Side 2019 #12
Side 2019 #12
 
Side 2019 #9
Side 2019 #9Side 2019 #9
Side 2019 #9
 
Side 2019 #8
Side 2019 #8Side 2019 #8
Side 2019 #8
 
Side 2019 #7
Side 2019 #7Side 2019 #7
Side 2019 #7
 
Side 2019 #6
Side 2019 #6Side 2019 #6
Side 2019 #6
 
Side 2019 #5
Side 2019 #5Side 2019 #5
Side 2019 #5
 
Side 2019 #4
Side 2019 #4Side 2019 #4
Side 2019 #4
 
Side 2019 #3
Side 2019 #3Side 2019 #3
Side 2019 #3
 
Side 2019, part 2
Side 2019, part 2Side 2019, part 2
Side 2019, part 2
 
Side 2019, part 1
Side 2019, part 1Side 2019, part 1
Side 2019, part 1
 
Pareto Models, Slides EQUINEQ
Pareto Models, Slides EQUINEQPareto Models, Slides EQUINEQ
Pareto Models, Slides EQUINEQ
 

Dernier

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 

Dernier (20)

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 

Slides ub-4

  • 1. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Big Data for Economics # 4* A. Charpentier (Universit´e de Rennes 1) https://github.com/freakonometrics/ub UB School of Economics Summer School, 2018. @freakonometrics freakonometrics freakonometrics.hypotheses.org 1
  • 2. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics #4 Nonlinearities and Discontinuities* @freakonometrics freakonometrics freakonometrics.hypotheses.org 2
  • 3. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics References Motivation Kopczuk, W. Tax bases, tax rates and the elasticity of reported income. JPE. References Eubank, R.L. (1999) Nonparametric Regression and Spline Smoothing, CRC Press. Fan, J. & Gijbels, I. (1996) Local Polynomial Modelling and Its Applications CRC Press. Hastie, T.J. & Tibshirani, R.J. (1990) Generalized Additive Models. CRC Press Wand, M.P & Jones, M.C. (1994) Kernel Smoothing. CRC Press @freakonometrics freakonometrics freakonometrics.hypotheses.org 3
  • 4. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Deterministic or Parametric Transformations Consider child mortality rate (y) as a function of GDP per capita (x). q q q q q q q q qq q q q q qq q q q qq q q q q q q q q q q q q q q q q q q q q qq q qq qq q q q q q q q q q q q q q qq q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q qq q q q q q q q qq q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0e+00 2e+04 4e+04 6e+04 8e+04 1e+05 050100150 PIB par tête Tauxdemortalitéinfantile Afghanistan Albania Algeria American.Samoa Angola Argentina Armenia Austria Azerbaijan Bahamas Bangladesh Belarus Belgium Belize Benin BhutanBolivia Bosnia.and.Herzegovina Brunei.Darussalam Bulgaria Burkina.Faso Cambodia Canada Cape.Verde Central.African.Republic Chad Channel.Islands Chile China Comoros Congo Cook.Islands Côte.dIvoire Cuba CyprusCzech.Republic Korea Democratic.Republic.of.the.Congo Denmark Djibouti Egypt El.Salvador Equatorial.Guinea Estonia Fiji FinlandFrance French.Guiana French.Polynesia Gabon Gambia Ghana Gibraltar Greece Grenada Guam Guatemala Guinea Guinea−Bissau Guyana Haiti Honduras India Indonesia Iran IrelandIsrael Italy Jamaica Japan Jordan Kazakhstan Kenya Kyrgyzstan Laos Latvia Lesotho Libyan.Arab.Jamahiriya Liechtenstein Lithuania Luxembourg Madagascar Malawi Malaysia Malta Marshall.Islands Martinique Mauritius Micronesia.(Federated.States.of) Mongolia Montenegro Morocco Mozambique Myanmar Namibia Netherlands Netherlands.Antilles New.Caledonia Nicaragua Nigeria Niue Norway Occupied.Palestinian.Territory Oman Pakistan Papua.New.Guinea Paraguay Peru Poland Puerto.Rico Qatar Republic.of.Korea Republic.of.Moldova RéunionRomaniaRussian.Federation Saint.Vincent.and.the.GrenadinesSamoa San.Marino Saudi.Arabia Serbia Sierra.Leone Singapore Slovakia Slovenia Solomon.Islands Somalia Sri.Lanka Sudan Suriname Sweden Syrian.Arab.Republic Tajikistan Thailand Macedonia Timor−Leste Togo Tunisia Turkey Turkmenistan Tuvalu Ukraine United.Arab.Emirates United.Kingdom United.Republic.of.Tanzania United.States.of.America United.States.Virgin.Islands Uruguay Venezuela Viet.Nam YemenZimbabwe @freakonometrics freakonometrics freakonometrics.hypotheses.org 4
  • 5. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Deterministic or Parametric Transformations Logarithmic transformation, log(y) as a function of log(x) q q q q q q q q qq q q q q q q q q q qq q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 1e+02 5e+02 1e+03 5e+03 1e+04 5e+04 1e+05 25102050100 PIB par tête (log) Tauxdemortalité(log) Afghanistan Albania Algeria American.Samoa Angola Argentina Armenia Aruba AustraliaAustria Azerbaijan Bahamas Bahrain Bangladesh Barbados Belarus Belgium Belize Benin BhutanBolivia Bosnia.and.Herzegovina Botswana Brazil Brunei.Darussalam Bulgaria Burkina.FasoBurundi Cambodia Cameroon Canada Cape.Verde Central.African.Republic Chad Channel.Islands Chile China Hong.Kong Colombia Comoros Congo Cook.Islands Costa.Rica Côte.dIvoire Croatia Cuba Cyprus Czech.Republic Korea Democratic.Republic.of.the.Congo Denmark Djibouti Dominican.Republic Ecuador Egypt El.Salvador Equatorial.Guinea Eritrea Estonia Ethiopia Fiji FinlandFrance French.Guiana French.Polynesia Gabon Gambia Georgia Germany Ghana Gibraltar Greece Greenland Grenada Guadeloupe Guam Guatemala Guinea Guinea−Bissau Guyana Haiti Honduras Hungary Iceland India Indonesia Iran Iraq Ireland Isle.of.Man Israel Italy Jamaica Japan Jordan Kazakhstan Kenya Kiribati Kuwait KyrgyzstanLaos Latvia Lebanon Lesotho Liberia Libyan.Arab.Jamahiriya Liechtenstein Lithuania Luxembourg Madagascar Malawi Malaysia Maldives Mali Malta Marshall.Islands Martinique Mauritania Mauritius Mexico Micronesia.(Federated.States.of) Mongolia Montenegro Morocco Mozambique Myanmar Namibia Nepal Netherlands Netherlands.Antilles New.Caledonia New.Zealand Nicaragua Niger Nigeria Niue Norway Occupied.Palestinian.Territory Oman Pakistan Palau Panama Papua.New.Guinea Paraguay Peru Philippines Poland Portugal Puerto.Rico Qatar Republic.of.Korea Republic.of.Moldova Réunion Romania Russian.Federation Rwanda Saint.Lucia Saint.Vincent.and.the.GrenadinesSamoa San.Marino Sao.Tome.and.Principe Saudi.Arabia Senegal Serbia Sierra.Leone Singapore Slovakia Slovenia Solomon.Islands Somalia South.Africa Spain Sri.Lanka Sudan Suriname Swaziland Sweden Switzerland Syrian.Arab.Republic Tajikistan Thailand Macedonia Timor−Leste Togo Tonga Trinidad.and.Tobago Tunisia Turkey Turkmenistan Turks.and.Caicos.Islands Tuvalu Uganda Ukraine United.Arab.Emirates United.Kingdom United.Republic.of.Tanzania United.States.of.America United.States.Virgin.Islands Uruguay Uzbekistan Vanuatu Venezuela Viet.Nam Western.Sahara Yemen Zambia Zimbabwe @freakonometrics freakonometrics freakonometrics.hypotheses.org 5
  • 6. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Deterministic or Parametric Transformations Reverse transformation q q q q q q q q qq q q q q qq q q q qq q q q q q q q q q q q q q q q q q q q q qq q qq qq q q q q q q q q q q q q q qq q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q qq q q q q q q q qq q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0e+00 2e+04 4e+04 6e+04 8e+04 1e+05 050100150 PIB par tête Tauxdemortalité Afghanistan Albania Algeria American.Samoa Angola Argentina Armenia Aruba AustraliaAustria Azerbaijan Bahamas Bahrain Bangladesh BarbadosBelarus Belgium Belize Benin BhutanBolivia Bosnia.and.Herzegovina Botswana Brazil Brunei.Darussalam Bulgaria Burkina.Faso Burundi Cambodia Cameroon Canada Cape.Verde Central.African.Republic Chad Channel.Islands Chile China Hong.Kong Colombia Comoros Congo Cook.IslandsCosta.Rica Côte.dIvoire CroatiaCuba CyprusCzech.Republic Korea Democratic.Republic.of.the.Congo Denmark Djibouti Dominican.Republic Ecuador Egypt El.Salvador Equatorial.Guinea Eritrea Estonia Ethiopia Fiji FinlandFrance French.Guiana French.Polynesia Gabon Gambia Georgia Germany Ghana Gibraltar GreeceGreenland Grenada GuadeloupeGuam Guatemala Guinea Guinea−Bissau Guyana Haiti Honduras Hungary Iceland India Indonesia Iran Iraq Ireland Isle.of.Man Israel Italy Jamaica Japan Jordan Kazakhstan Kenya Kiribati Kuwait Kyrgyzstan Laos Latvia Lebanon Lesotho Liberia Libyan.Arab.Jamahiriya Liechtenstein Lithuania Luxembourg Madagascar Malawi Malaysia Maldives Mali Malta Marshall.Islands Martinique Mauritania Mauritius Mexico Micronesia.(Federated.States.of) Mongolia Montenegro Morocco Mozambique Myanmar Namibia Nepal Netherlands Netherlands.Antilles New.CaledoniaNew.Zealand Nicaragua NigerNigeria Niue Norway Occupied.Palestinian.Territory Oman Pakistan Palau Panama Papua.New.Guinea Paraguay Peru Philippines Poland PortugalPuerto.Rico Qatar Republic.of.Korea Republic.of.Moldova RéunionRomaniaRussian.Federation Rwanda Saint.Lucia Saint.Vincent.and.the.GrenadinesSamoa San.Marino Sao.Tome.and.Principe Saudi.Arabia Senegal Serbia Sierra.Leone Singapore Slovakia Slovenia Solomon.Islands Somalia South.Africa Spain Sri.Lanka Sudan Suriname Swaziland Sweden Switzerland Syrian.Arab.Republic Tajikistan Thailand Macedonia Timor−Leste Togo Tonga Trinidad.and.Tobago Tunisia Turkey Turkmenistan Turks.and.Caicos.Islands Tuvalu Uganda Ukraine United.Arab.Emirates United.Kingdom United.Republic.of.Tanzania United.States.of.America United.States.Virgin.Islands Uruguay Uzbekistan Vanuatu Venezuela Viet.Nam Western.Sahara Yemen Zambia Zimbabwe @freakonometrics freakonometrics freakonometrics.hypotheses.org 6
  • 7. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Box-Cox transformation See Box & Cox (1964) An Analysis of Transformations , h(y, λ) =    yλ − 1 λ if λ = 0 log(y) if λ = 0 or h(y, λ, µ) =    [y + µ]λ − 1 λ if λ = 0 log([y + µ]) if λ = 0 @freakonometrics freakonometrics freakonometrics.hypotheses.org 7 0 1 2 3 4 −4−3−2−1012 −1 −0.5 0 0.5 1 1.5 2
  • 8. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Profile Likelihood In a statistical context, suppose that unknown parameter can be partitioned θ = (λ, β) where λ is the parameter of interest, and β is a nuisance parameter. Consider {y1, · · · , yn}, a sample from distribution Fθ, so that the log-likelihood is log L(θ) = n i=1 log fθ(yi) θ MLE is defined as θ MLE = argmax {log L(θ)} Rewrite the log-likelihood as log L(θ) = log Lλ(β). Define β pMLE λ = argmax β {log Lλ(β)} and then λpMLE = argmax λ log Lλ(β pMLE λ ) . Observe that √ n(λpMLE − λ) L −→ N(0, [Iλ,λ − Iλ,βI−1 β,βIβ,λ]−1 ) @freakonometrics freakonometrics freakonometrics.hypotheses.org 8
  • 9. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Profile Likelihood and Likelihood Ratio Test The (profile) likelihood ratio test is based on 2 max L(λ, β) − max L(λ0, β) If (λ0, β0) are the true value, this difference can be written 2 max L(λ, β) − max L(λ0, β0) − 2 max L(λ0, β) − max L(λ0, β0) Using Taylor’s expension ∂L(λ, β) ∂λ (λ0,βλ0 ) ∼ ∂L(λ, β) ∂λ (λ0,β0 ) − Iβ0λ0 I−1 β0β0 ∂L(λ0, β) ∂β (λ0,β0 ) Thus, 1 √ n ∂L(λ, β) ∂λ (λ0,βλ0 ) L → N(0, Iλ0λ0 ) − Iλ0β0 I−1 β0β0 Iβ0λ0 and 2 L(λ, β) − L(λ0, βλ0 ) L → χ2 (dim(λ)). @freakonometrics freakonometrics freakonometrics.hypotheses.org 9
  • 10. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Profile Likelihood and Likelihood Ratio Test Consider some lognormal sample, and fit a Gamma distribution, f(x; α, β) = xα−1 βα e−βx Γ(α) with x > 0 and θ = (α, β). 1 > x=exp(rnorm (100)) Maximum-likelihood, θ = argmax{log L(θ)}. 1 > library(MASS) 2 > (F=fitdistr(x,"gamma")) 3 shape rate 4 1.4214497 0.8619969 5 (0.1822570) (0.1320717) 6 > F$estimate [1]+c(-1,1)*1.96*F$sd [1] 7 [1] 1.064226 1.778673 @freakonometrics freakonometrics freakonometrics.hypotheses.org 10
  • 11. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Profile Likelihood and Likelihood Ratio Test See also 1 > log_lik=function(theta){ 2 + a=theta [1] 3 + b=theta [2] 4 + logL=sum(log(dgamma(x,a,b))) 5 + return(-logL) 6 + } 7 > optim(c(1 ,1),log_lik) 8 $par 9 [1] 1.4214116 0.8620311 We can also use profile likelihood, α = argmax max β log L(α, β) = argmax log L(α, βα) @freakonometrics freakonometrics freakonometrics.hypotheses.org 11
  • 12. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Profile Likelihood and Likelihood Ratio Test 1 > prof_log_lik=function(a){ 2 + b=( optim (1, function(z) -sum(log(dgamma(x,a,z)))))$par 3 + return(-sum(log(dgamma(x,a,b)))) 4 + } 5 6 > vx=seq (.5,3, length =101) 7 > vl=-Vectorize(prof_log_lik)(vx) 8 > plot(vx ,vl ,type="l") 9 > optim (1,prof_log_lik) 10 $par 11 [1] 1.421094 We can use the likelihood ratio test 2 log Lp(α) − log Lp(α) ∼ χ2 (1) @freakonometrics freakonometrics freakonometrics.hypotheses.org 12
  • 13. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Profile Likelihood and Likelihood Ratio Test The implied 95% confidence interval is 1 > (b1=uniroot(function(z) Vectorize(prof_log_lik)(z)+borne ,c(.5 ,1.5)) $root) 2 [1] 1.095726 3 > (b2=uniroot(function(z) Vectorize(prof_log_lik)(z)+borne ,c (1.25 ,2.5))$root) 4 [1] 1.811809 @freakonometrics freakonometrics freakonometrics.hypotheses.org 13
  • 14. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Box-Cox 1 > boxcox(lm(dist˜speed ,data=cars)) Here h∗ ∼ 0.5 Heuristally, y 1/2 i ∼ β0 + β1xi + εi why not consider a quadratic regression...? −0.5 0.0 0.5 1.0 1.5 2.0 −90−80−70−60−50 λ log−Likelihood 95% @freakonometrics freakonometrics freakonometrics.hypotheses.org 14 q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q 5 10 15 20 25 020406080100120 speed dist
  • 15. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Uncertainty: Parameters vs. Prediction Uncertainty on regression parameters (β0, β1) From the output of the regression we can derive confidence intervals for β0 and β1, usually βk ∈ βk ± u1−α/2se[βk] q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q 5 10 15 20 25 020406080100120 Vitesse du véhicule Distancedefreinage q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q 5 10 15 20 25 020406080100120 Vitesse du véhicule Distancedefreinage @freakonometrics freakonometrics freakonometrics.hypotheses.org 15
  • 16. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Uncertainty: Parameters vs. Prediction Uncertainty on a prediction, y = m(x). Usually m(x) ∈ m(x) ± u1−α/2se[m(x)] hence, for a linear model xT β ± u1−α/2σ xT[XT X]−1x i.e. (with one covariate) se2 [m(x)]2 = Var[β0 + β1x] se2 [β0] + cov[β0, β1]x + se2 [β1]x2 q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q 5 10 15 20 25 020406080100120 Vitesse du véhicule Distancedefreinage 1 > predict(lm(dist˜speed ,data=cars),newdata=data.frame(speed=x), interval="confidence") @freakonometrics freakonometrics freakonometrics.hypotheses.org 16
  • 17. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Least Squares and Expected Value (Orthogonal Projection Theorem) Let y ∈ Rd , y = argmin m∈R    n i=1 1 n yi − m εi 2    . It is the empirical version of E[Y ] = argmin m∈R    y − m ε 2 dF(y)    = argmin m∈R    E (Y − m ε )2    where Y is a 1 random variable. Thus, argmin m(·):Rk→R    n i=1 1 n yi − m(xi) εi 2    is the empirical version of E[Y |X = x]. @freakonometrics freakonometrics freakonometrics.hypotheses.org 17
  • 18. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics The Histogram and the Regressogram Connections between the estimation of f(y) and E[Y |X = x]. Assume that yi ∈ [a1, ak+1), divided in k classes [aj, aj+1). The histogram is ˆfa(y) = k j=1 1(t ∈ [aj, aj+1)) aj+1 − aj 1 n n i=1 1(yi ∈ [aj, aj+1)) Assume that aj+1 −aj = hn and hn → 0 as n → ∞ with nhn → ∞ then E ( ˆfa(y) − f(y))2 ∼ O(n−2/3 ) (for an optimal choice of hn). 1 > hist(height) @freakonometrics freakonometrics freakonometrics.hypotheses.org 18 150 160 170 180 190 0.000.010.020.030.040.050.06
  • 19. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics The Histogram and the Regressogram Then a moving histogram was considered, ˆf(y) = 1 2nhn n i=1 1(yi ∈ [y ± hn)) = 1 nhn n i=1 k yi − y hn with k(x) = 1 2 1(x ∈ [−1, 1)), which a (flat) kernel estimator. 1 > density(height , kernel = " rectangular ") 150 160 170 180 190 200 0.000.010.020.030.04 150 160 170 180 190 200 0.000.010.020.030.04 @freakonometrics freakonometrics freakonometrics.hypotheses.org 19
  • 20. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics The Histogram and the Regressogram From Tukey (1961) Curves as parameters, and touch estimation, the regressogram is defined as ˆma(x) = n i=1 1(xi ∈ [aj, aj+1))yi n i=1 1(xi ∈ [aj, aj+1)) and the moving regressogram is ˆm(x) = n i=1 1(xi ∈ [x ± hn])yi n i=1 1(xi ∈ [x ± hn]) @freakonometrics freakonometrics freakonometrics.hypotheses.org 20 q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q 5 10 15 20 25 020406080100120 speed dist q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q 5 10 15 20 25 020406080100120 speed dist
  • 21. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Nadaraya-Watson and Kernels Background: Kernel Density Estimator Consider sample {y1, · · · , yn}, Fn empirical cumulative distribution function Fn(y) = 1 n n i=1 1(yi ≤ y) The empirical measure Pn consists in weights 1/n on each observation. Idea: add (little) continuous noise to smooth Fn. Let Yn denote a random variable with distribution Fn and define ˜Y = Yn + hU where U ⊥⊥ Yn, with cdf K The cumulative distribution function of ˜Y is ˜F ˜F(y) = P[ ˜Y ≤ y] = E 1( ˜Y ≤ y) = E E 1( ˜Y ≤ y) Yn ˜F(y) = E 1 U ≤ y − Yn h Yn = n i=1 1 n K y − yi h @freakonometrics freakonometrics freakonometrics.hypotheses.org 21
  • 22. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Nadaraya-Watson and Kernels If we differentiate ˜f(y)= 1 nh n i=1 k y − yi h = 1 n n i=1 kh (y − yi) with kh(u) = 1 h k u h ˜f is the kernel density estimator of f, with kernel k and bandwidth h. Rectangular, k(u) = 1 2 1(|u| ≤ 1) Epanechnikov, k(u) = 3 4 1(|u| ≤ 1)(1 − u2 ) Gaussian, k(u) = 1 √ 2π e− u2 2 1 > density(height , kernel = " epanechnikov ") −2 −1 0 1 2 @freakonometrics freakonometrics freakonometrics.hypotheses.org 22 150 160 170 180 190 200 0.000.010.020.030.04
  • 23. Arthur Charpentier, UB School of Economics Summer School 2018 - Big Data for Economics Kernels and Statistical Properties Consider here an i.id. sample {Y1, · · · , Yn} with density f Given y, observe that E[ ˜f(y)] = 1 h k y − t h f(t)dt = k(u)f(y − hu)du. Use Taylor expansion around h = 0,f(y − hu) ∼ f(y) − f (y)hu + 1 2 f (y)h2 u2 E[ ˜f(y)] = f(y)k(u)du − f (y)huk(u)du + 1 2 f (y + hu)h2 u2 k(u)du = f(y) + 0 + h2 f (y) 2 k(u)u2 du + o(h2 ) Thus, if f is twice continuously differentiable with bounded second derivative, k(u)du = 1, uk(u)du = 0 and u2 k(u)du < ∞, then E[ ˜f(y)] = f(y) + h2 f (y) 2 k(u)u2 du + o(h2 ) @freakonometrics freakonometrics freakonometrics.hypotheses.org 23
• 24. Kernels and Statistical Properties

For the heuristics on that bias, consider a flat kernel, and set
$$f_h(y) = \frac{F(y + h) - F(y - h)}{2h},$$
then the natural estimate is
$$\hat f_h(y) = \frac{F_n(y + h) - F_n(y - h)}{2h} = \frac{1}{2nh} \sum_{i=1}^n \underbrace{\mathbf{1}(y_i \in [y \pm h])}_{Z_i}$$
where the $Z_i$'s are i.i.d. Bernoulli $\mathcal{B}(p_y)$ variables with $p_y = P[Y_i \in [y \pm h]] = 2h \cdot f_h(y)$. Thus $E[\hat f_h(y)] = f_h(y)$, while
$$f_h(y) \sim f(y) + \frac{h^2}{6} f''(y) \quad \text{as } h \sim 0.$$
• 25. Kernels and Statistical Properties

Similarly, as $h \to 0$ and $nh \to \infty$,
$$\mathrm{Var}[\tilde f(y)] = \frac{1}{n} \left( E[k_h(y - Y)^2] - \big(E[k_h(y - Y)]\big)^2 \right) = \frac{f(y)}{nh} \int k(u)^2\, du + o\left(\frac{1}{nh}\right)$$
Hence
• if $h \to 0$ the bias goes to 0
• if $nh \to \infty$ the variance goes to 0
• 26. Kernels and Statistical Properties

Extension in higher dimension:
$$\tilde f(y) = \frac{1}{n |H|^{1/2}} \sum_{i=1}^n k\left(H^{-1/2}(y - y_i)\right) \quad \text{or} \quad \tilde f(y) = \frac{1}{n h^d |\Sigma|^{1/2}} \sum_{i=1}^n k\left(\frac{\Sigma^{-1/2}(y - y_i)}{h}\right)$$
(Figure: contour levels of a bivariate kernel density estimate of (height, weight).)
• 27. Kernels and Convolution

Given $f$ and $g$, set
$$(f \star g)(x) = \int_{\mathbb{R}} f(x - y)\, g(y)\, dy$$
Then $\tilde f_h = (\hat f \star k_h)$, where
$$\hat f(y) = \frac{d\hat F(y)}{dy} = \frac{1}{n} \sum_{i=1}^n \delta_{y_i}(y)$$
Hence, $\tilde f$ is the distribution of $Y + \varepsilon$, where $Y$ is uniform over $\{y_1, \cdots, y_n\}$ and $\varepsilon \sim k_h$ are independent.
• 28. Nadaraya-Watson and Kernels

Here $E[Y|X = x] = m(x)$. Write $m$ as a ratio of densities,
$$m(x) = \int y\, f(y|x)\, dy = \frac{\int y\, f(y, x)\, dy}{\int f(y, x)\, dy}$$
Consider some bivariate kernel $k$, such that $\int t\, k(t, u)\, dt = 0$, and set $\kappa(u) = \int k(t, u)\, dt$. The numerator can be estimated using (with the change of variables $y = y_i + ht$, and since $\int t\, k(t, u)\, dt = 0$)
$$\int y\, \tilde f(y, x)\, dy = \frac{1}{nh^2} \sum_{i=1}^n \int y\, k\left(\frac{y - y_i}{h}, \frac{x - x_i}{h}\right) dy = \frac{1}{nh} \sum_{i=1}^n \int (y_i + ht)\, k\left(t, \frac{x - x_i}{h}\right) dt = \frac{1}{nh} \sum_{i=1}^n y_i\, \kappa\left(\frac{x - x_i}{h}\right)$$
• 29. Nadaraya-Watson and Kernels

and for the denominator
$$\int \tilde f(y, x)\, dy = \frac{1}{nh^2} \sum_{i=1}^n \int k\left(\frac{y - y_i}{h}, \frac{x - x_i}{h}\right) dy = \frac{1}{nh} \sum_{i=1}^n \kappa\left(\frac{x - x_i}{h}\right)$$
Therefore, plugging these into the expression for $m(x)$ yields
$$\tilde m(x) = \frac{\sum_{i=1}^n y_i\, \kappa_h(x - x_i)}{\sum_{i=1}^n \kappa_h(x - x_i)}$$
Observe that this regression estimator is a weighted average (see the linear predictor section),
$$\tilde m(x) = \sum_{i=1}^n \omega_i(x)\, y_i \quad \text{with} \quad \omega_i(x) = \frac{\kappa_h(x - x_i)}{\sum_{j=1}^n \kappa_h(x - x_j)}$$
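A minimal hand-written Nadaraya-Watson estimator on the cars data, with a Gaussian kernel; h = 2 is an arbitrary choice.

nw <- function(x, h = 2) {
  w <- dnorm((x - cars$speed) / h)   # kernel weights omega_i(x)
  sum(w * cars$dist) / sum(w)        # weighted average of the y_i's
}
u <- seq(5, 25, by = 0.1)
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
lines(u, sapply(u, nw), col = "red")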
• 30. Nadaraya-Watson and Kernels

One can prove that the kernel regression bias is given by
$$E[\tilde m(x)] \sim m(x) + h^2 \left( \frac{C_1}{2}\, m''(x) + C_2\, m'(x) \frac{f'(x)}{f(x)} \right)$$
while
$$\mathrm{Var}[\tilde m(x)] \sim \frac{C_3}{nh} \cdot \frac{\sigma^2(x)}{f(x)}.$$
In this univariate case, one can easily get the kernel estimator of derivatives. Actually, $\tilde m$ is a function of the bandwidth $h$.

Note: this can be extended to multivariate $x$.
• 31. Nadaraya-Watson and Kernels in Higher Dimension

Here
$$\tilde m_H(x) = \frac{\sum_{i=1}^n y_i\, k_H(x_i - x)}{\sum_{i=1}^n k_H(x_i - x)}$$
for some symmetric positive definite bandwidth matrix $H$, with $k_H(x) = \det[H]^{-1} k(H^{-1} x)$. Then
$$E[\tilde m_H(x)] \sim m(x) + \frac{C_1}{2} \mathrm{trace}\big(H^{\top} m''(x)\, H\big) + C_2\, \frac{m'(x)^{\top} H H^{\top} f'(x)}{f(x)}$$
while
$$\mathrm{Var}[\tilde m_H(x)] \sim \frac{C_3}{n \det(H)} \cdot \frac{\sigma^2(x)}{f(x)}$$
Hence, if $H = hI$, $h \sim C\, n^{-1/(4 + \dim(x))}$.
• 32. From Kernels to k-Nearest Neighbours

An alternative is to consider
$$\tilde m_k(x) = \frac{1}{n} \sum_{i=1}^n \omega_{i,k}(x)\, y_i \quad \text{where } \omega_{i,k}(x) = \frac{n}{k} \text{ if } i \in \mathcal{I}_x^k,$$
with $\mathcal{I}_x^k = \{i : x_i \text{ one of the } k \text{ nearest observations to } x\}$.

Lai (1977) Large sample properties of K-nearest neighbor procedures: if $k \to \infty$ and $k/n \to 0$ as $n \to \infty$, then
$$E[\tilde m_k(x)] \sim m(x) + \frac{1}{24 f(x)^3} \big(m'' f + 2 m' f'\big)(x) \left(\frac{k}{n}\right)^2 \quad \text{while} \quad \mathrm{Var}[\tilde m_k(x)] \sim \frac{\sigma^2(x)}{k}$$
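A minimal k-nearest-neighbour regression on the same cars data; k = 7 is an arbitrary choice.

knn_reg <- function(x, k = 7) {
  d <- abs(cars$speed - x)
  mean(cars$dist[order(d)[1:k]])   # average of the k closest y_i's
}
u <- seq(5, 25, by = 0.1)
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
lines(u, sapply(u, knn_reg), col = "red", type = "s")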
• 33. From Kernels to k-Nearest Neighbours

Remark: Brent & John (1985) Finding the median requires 2n comparisons considered some median smoothing algorithm, where we consider the median over the k nearest neighbours (see section #4).
• 34. k-Nearest Neighbors and the Curse of Dimensionality

The higher the dimension, the larger the distance to the closest neighbor,
$$\min_{i \in \{1, \cdots, n\}} \{d(a, x_i)\}, \quad x_i \in \mathbb{R}^d.$$
(Figure: boxplots of the nearest-neighbour distance for dimensions 1 to 5, with n = 10 and n = 100.)
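A small simulation in the spirit of the figure: the average distance to the nearest neighbour among n uniform points in $[0,1]^d$, as d grows; the query point at the centre of the cube and the sample sizes are arbitrary choices.

set.seed(1)
nn_dist <- function(d, n = 100) {
  x <- matrix(runif(n * d), n, d)
  a <- rep(0.5, d)                  # query point at the centre of the cube
  min(sqrt(colSums((t(x) - a)^2)))  # distance to the closest observation
}
sapply(1:5, function(d) mean(replicate(500, nn_dist(d))))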
• 35. k-Nearest Neighbors and Imputation

"The best thing to do with missing values is not to have any" (Gertrude Mary Cox)

Consider some bivariate observations $x_i$, where some $x_{2,i}$'s can be (randomly) missing. One can consider simple imputation, $\tilde x_{2,i} = \bar x_2$ (computed on the non-missing values). One can consider a regression of $x_2$ on $x_1$ (but that's arbitrary).
• 36. One can also draw values randomly, $\tilde x_{2,i} = E[X_2 | X_1 = x_{1,i}] + \varepsilon_i$, where $\varepsilon$ has a Gaussian distribution. One can consider a PCA (on the non-missing values).
• 37. Or one can consider k nearest neighbours. There are packages dedicated to imputation (see the VIM package).

> library(VIM)
• 38. Missing humidity given the temperature:

> y = tao[, c("Air.Temp", "Humidity")]
> histMiss(y)

> y = tao[, c("Humidity", "Air.Temp")]
> histMiss(y)
• 39. k-Nearest Neighbors and Imputation

This package contains a k-nearest neighbors algorithm for imputation,

> tao_kNN <- kNN(tao, k = 5)

Imputation can be visualized using

> vars <- c("Air.Temp", "Humidity", "Air.Temp_imp", "Humidity_imp")
> marginplot(tao_kNN[, vars], delimiter = "imp", alpha = 0.6)
• 40. Bandwidth Selection: MISE for Density

$$\mathrm{MSE}[\tilde f(y)] = \mathrm{bias}[\tilde f(y)]^2 + \mathrm{Var}[\tilde f(y)] = \frac{f(y)}{nh} \int k(u)^2\, du + h^4 \left( \frac{f''(y)}{2} \int u^2 k(u)\, du \right)^2 + o\left(h^4 + \frac{1}{nh}\right)$$
Bandwidth choice is based on the minimization of the asymptotic integrated MSE (over $y$),
$$\mathrm{MISE}(\tilde f) = \int \mathrm{MSE}[\tilde f(y)]\, dy \sim \frac{1}{nh} \int k(u)^2\, du + \frac{h^4}{4} \left( \int u^2 k(u)\, du \right)^2 \int f''(y)^2\, dy$$
• 41. Bandwidth Selection: MISE for Density

Thus, the first-order condition yields
$$-\frac{C_1}{nh^2} + h^3 C_2 \int f''(y)^2\, dy = 0, \quad \text{with } C_1 = \int k^2(u)\, du \text{ and } C_2 = \left( \int u^2 k(u)\, du \right)^2,$$
and
$$h^\star = n^{-1/5} \left( \frac{C_1}{C_2 \int f''(y)^2\, dy} \right)^{1/5}$$
A rule of thumb is $h^\star = 1.06\, n^{-1/5} \sqrt{\mathrm{Var}[Y]}$, from Silverman (1986) Density Estimation,

> bw.nrd0(cars$speed)
[1] 2.150016
> bw.nrd(cars$speed)
[1] 2.532241

with the Scott correction, see Scott (1992) Multivariate Density Estimation.
• 42. Bandwidth Selection: MISE for the Regression Model

One can prove that
$$\mathrm{MISE}[\tilde m_h] \sim \underbrace{\frac{h^4}{4} \left( \int x^2 k(x)\, dx \right)^2 \int \left( m''(x) + 2 m'(x) \frac{f'(x)}{f(x)} \right)^2 dx}_{\text{bias}^2} + \underbrace{\frac{\sigma^2}{nh} \int k^2(x)\, dx \cdot \int \frac{dx}{f(x)}}_{\text{variance}}$$
as $n \to \infty$ and $nh \to \infty$. The bias is sensitive to the position of the $x_i$'s, and
$$h^\star = n^{-1/5} \left( \frac{C_1 \int \frac{dx}{f(x)}}{C_2 \int \left( m''(x) + 2 m'(x) \frac{f'(x)}{f(x)} \right)^2 dx} \right)^{1/5}$$
Problem: this depends on the unknown $f(x)$ and $m(x)$.
• 43. Bandwidth Selection: Cross Validation

Consider some risk, a function of some parameter $h$, e.g. $\mathrm{MISE}[\tilde m_h]$. A first idea is to consider a validation set approach:
• Split the data in two parts
• Train the method on the first part
• Compute the error on the second part

Problem: every split yields a different estimate of the error.
• 44. Bandwidth Selection: Cross Validation

Consider leave-one-out cross validation. For every $i \in \{1, 2, \cdots, n\}$:
• Train the model on every point but $i$
• Compute the test error on the held-out point
$$\mathrm{CV}_{\mathrm{LOO}} = \frac{1}{n} \sum_{i=1}^n \left( y_i - \hat y_i^{(-i)} \right)^2$$
where the prediction is obtained from the model estimated on the data where observation $i$ was removed. This can be computationally expensive, but for a linear predictor
$$\mathrm{CV}_{\mathrm{LOO}} = \frac{1}{n} \sum_{i=1}^n \left( \frac{y_i - \hat y_i}{1 - h_{i,i}} \right)^2$$
where $h_{i,i}$ is the leverage statistic.
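A minimal check of that shortcut on a linear model: the explicit leave-one-out errors coincide with the residuals divided by one minus the leverage.

reg <- lm(dist ~ speed, data = cars)
shortcut <- residuals(reg) / (1 - hatvalues(reg))
brute <- sapply(1:nrow(cars), function(i) {
  fit <- lm(dist ~ speed, data = cars[-i, ])
  cars$dist[i] - predict(fit, newdata = cars[i, ])
})
max(abs(shortcut - brute))   # numerically zero
mean(shortcut^2)             # CV_LOO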
• 45. Bandwidth Selection: Cross Validation

Consider k-fold cross validation. For every fold $j \in \{1, 2, \cdots, k\}$:
• Train the model on every fold but the $j$th
• Compute the test error on the $j$th fold

As we increase $k$ in k-fold cross-validation, we decrease the bias, but increase the variance. One can use the bootstrap to estimate measures of uncertainty.
• 46. Bandwidth Selection: Cross Validation

Let $R(h) = E\big[(Y - \tilde m_h(X))^2\big]$. A natural idea is
$$\hat R(h) = \frac{1}{n} \sum_{i=1}^n \left( y_i - \tilde m_h(x_i) \right)^2$$
Instead, use leave-one-out cross validation,
$$\hat R(h) = \frac{1}{n} \sum_{i=1}^n \left( y_i - \tilde m_h^{(i)}(x_i) \right)^2$$
where $\tilde m_h^{(i)}$ is the estimator obtained by omitting the $i$th pair $(y_i, x_i)$, or k-fold cross validation,
$$\hat R(h) = \frac{1}{n} \sum_{j=1}^k \sum_{i \in I_j} \left( y_i - \tilde m_h^{(j)}(x_i) \right)^2$$
where $\tilde m_h^{(j)}$ is the estimator obtained by omitting the pairs $(y_i, x_i)$ with $i \in I_j$.
• 47. Bandwidth Selection: Cross Validation

Then find (numerically) $h^\star = \mathrm{argmin}\, \hat R(h)$. In the context of density estimation, see Chiu (1991) Bandwidth Selection for Kernel Density Estimation.

Usual bias-variance tradeoff, or Goldilocks principle: $h$ should be neither too small, nor too large:
• undersmoothed: variance too large, bias too small
• oversmoothed: bias too large, variance too small
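A minimal numerical search for h, here with leave-one-out cross-validation for a hand-written Nadaraya-Watson estimator (Gaussian kernel; the grid of bandwidths is arbitrary).

nw_loo_risk <- function(h) {
  e <- sapply(1:nrow(cars), function(i) {
    w <- dnorm((cars$speed[i] - cars$speed[-i]) / h)
    cars$dist[i] - sum(w * cars$dist[-i]) / sum(w)
  })
  mean(e^2)                 # R_hat(h)
}
hs <- seq(0.5, 5, by = 0.1)
hs[which.min(sapply(hs, nw_loo_risk))]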
• 48. Local Linear Regression

Consider $\hat m(x)$ defined as $\hat m(x) = \hat\beta_0$, where $(\hat\beta_0, \hat\beta)$ is the solution of
$$\min_{(\beta_0, \beta)} \sum_{i=1}^n \omega_i^{(x)} \left( y_i - [\beta_0 + (x - x_i)^{\top} \beta] \right)^2, \quad \text{where } \omega_i^{(x)} = k_h(x - x_i),$$
i.e. we seek the constant term in a weighted least squares regression of the $y_i$'s on the $(x - x_i)$'s. If $X_x$ is the design matrix with $i$th row $[1, (x - x_i)^{\top}]$, and $W_x = \mathrm{diag}[k_h(x - x_1), \cdots, k_h(x - x_n)]$, then
$$\hat m(x) = e_1^{\top} (X_x^{\top} W_x X_x)^{-1} X_x^{\top} W_x y, \quad e_1 = (1, 0, \cdots, 0)^{\top}$$
This estimator is also a linear predictor: $\hat m(x) = \sum_{i=1}^n a_i(x)\, y_i$
• 49. where
$$a_i(x) = \frac{1}{n} k_h(x - x_i) \left[ 1 - s_1(x)^{\top} s_2(x)^{-1} \frac{x - x_i}{h} \right]$$
with
$$s_1(x) = \frac{1}{n} \sum_{i=1}^n k_h(x - x_i) \frac{x - x_i}{h} \quad \text{and} \quad s_2(x) = \frac{1}{n} \sum_{i=1}^n k_h(x - x_i) \frac{x - x_i}{h} \left( \frac{x - x_i}{h} \right)^{\top}$$
Note that the Nadaraya-Watson estimator was simply the solution of
$$\min_{\beta_0} \sum_{i=1}^n \omega_i^{(x)} (y_i - \beta_0)^2, \quad \text{where } \omega_i^{(x)} = k_h(x - x_i).$$
Here,
$$E[\hat m(x)] \sim m(x) + \frac{h^2}{2} m''(x)\, \mu_2, \quad \text{where } \mu_2 = \int k(u) u^2\, du,$$
$$\mathrm{Var}[\hat m(x)] \sim \frac{1}{nh} \cdot \frac{\nu\, \sigma_x^2}{f(x)}, \quad \text{where } \nu = \int k(u)^2\, du$$
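In practice, the local linear estimator at a point x0 is just a weighted least squares fit; a minimal sketch on the cars data (Gaussian kernel, h = 2 arbitrary).

loclin <- function(x0, h = 2) {
  w <- dnorm((cars$speed - x0) / h)                        # omega_i^(x0)
  fit <- lm(dist ~ I(speed - x0), data = cars, weights = w)
  unname(coef(fit)[1])                                     # beta_0 = m_hat(x0)
}
u <- seq(5, 25, by = 0.5)
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
lines(u, sapply(u, loclin), col = "red")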
• 50. Thus, the kernel regression MSE is
$$\frac{h^4}{4} \left( m''(x) + 2 m'(x) \frac{f'(x)}{f(x)} \right)^2 \mu_2^2 + \frac{1}{nh} \cdot \frac{\nu\, \sigma_x^2}{f(x)}$$
(Figure: local regression fits of braking distance against vehicle speed.)

> loess(dist ~ speed, cars, span = 0.75, degree = 1)
• 51.

> REG <- loess(dist ~ speed, cars, span = 0.75, degree = 1)
> predict(REG, data.frame(speed = seq(5, 25, 0.25)), se = TRUE)

(Figure: loess fit with pointwise standard errors, braking distance against vehicle speed.)
• 52. Local Polynomials

One might assume that, locally, $m(u) \sim \mu_x(u)$ as $u \sim x$, with
$$\mu_x(u) = \beta_0^{(x)} + \beta_1^{(x)} [u - x] + \beta_2^{(x)} \frac{[u - x]^2}{2} + \beta_3^{(x)} \frac{[u - x]^3}{3!} + \cdots$$
and we estimate $\beta^{(x)}$ by minimizing $\sum_{i=1}^n \omega_i^{(x)} \left( y_i - \mu_x(x_i) \right)^2$. If $X_x$ is the design matrix with rows
$$\left[ 1, \; x_i - x, \; \frac{[x_i - x]^2}{2}, \; \frac{[x_i - x]^3}{3!}, \; \cdots \right],$$
then $\hat\beta^{(x)} = \left( X_x^{\top} W_x X_x \right)^{-1} X_x^{\top} W_x y$ (weighted least squares estimators).

> library(locfit)
> locfit(dist ~ speed, data = cars)
• 53. Series Regression

Recall that $E[Y|X = x] = m(x)$. Why not approximate $m$ by a linear combination of approximating functions $h_1(x), \cdots, h_k(x)$? Set $h(x) = (h_1(x), \cdots, h_k(x))$, and consider the regression of the $y_i$'s on the $h(x_i)$'s,
$$y_i = h(x_i)^{\top} \beta + \varepsilon_i$$
Then $\hat\beta = (H^{\top} H)^{-1} H^{\top} y$.
• 54. Series Regression: Polynomials

Even if $m(x) = E(Y|X = x)$ is not a polynomial function, a polynomial can still be a good approximation. From the Stone-Weierstrass theorem, if $m(\cdot)$ is continuous on some interval, then there is a uniform approximation of $m(\cdot)$ by polynomial functions.

> reg <- lm(y ~ x, data = db)
• 55. Series Regression: Polynomials

Assume that $m(x) = E(Y|X = x) = \sum_{i=0}^k \alpha_i x^i$, where the parameters $\alpha_0, \cdots, \alpha_k$ will be estimated (but not $k$).

> reg <- lm(y ~ poly(x, 5), data = db)
> reg <- lm(y ~ poly(x, 25), data = db)
• 56. Series Regression: (Linear) Splines

Consider $m + 1$ knots on $X$, $\min\{x_i\} \le t_0 \le t_1 \le \cdots \le t_m \le \max\{x_i\}$, then define the linear (degree 1) spline positive-part functions
$$b_{j,1}(x) = (x - t_j)_+ = \begin{cases} x - t_j & \text{if } x > t_j \\ 0 & \text{otherwise} \end{cases}$$
For linear splines, consider
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i - s)_+ + \varepsilon_i$$

> positive_part <- function(x) ifelse(x > 0, x, 0)
> reg <- lm(Y ~ X + positive_part(X - s), data = db)
• 57. Series Regression: (Linear) Splines

For linear splines with two knots, consider
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i - s_1)_+ + \beta_3 (X_i - s_2)_+ + \varepsilon_i$$

> reg <- lm(Y ~ X + positive_part(X - s1) + positive_part(X - s2), data = db)
> library(splines)

A spline is a function defined piecewise by polynomials. b-splines are defined recursively.
• 58. b-Splines (in Practice)

> reg1 <- lm(dist ~ speed + positive_part(speed - 15), data = cars)
> reg2 <- lm(dist ~ bs(speed, df = 2, degree = 1), data = cars)

Consider $m + 1$ knots on $[0, 1]$, $0 \le t_0 \le t_1 \le \cdots \le t_m \le 1$, then define recursively the b-splines as
$$b_{j,0}(t) = \begin{cases} 1 & \text{if } t_j \le t < t_{j+1} \\ 0 & \text{otherwise} \end{cases}$$
and
$$b_{j,n}(t) = \frac{t - t_j}{t_{j+n} - t_j}\, b_{j,n-1}(t) + \frac{t_{j+n+1} - t}{t_{j+n+1} - t_{j+1}}\, b_{j+1,n-1}(t)$$
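The recursion above (Cox-de Boor) can be coded directly; a minimal sketch, assuming distinct knots, with j indexed from 1 rather than 0.

bspline <- function(t, j, n, knots) {
  if (n == 0) return(as.numeric(knots[j] <= t & t < knots[j + 1]))
  (t - knots[j]) / (knots[j + n] - knots[j]) * bspline(t, j, n - 1, knots) +
  (knots[j + n + 1] - t) / (knots[j + n + 1] - knots[j + 1]) * bspline(t, j + 1, n - 1, knots)
}
knots <- seq(0, 1, by = 0.25)
u <- seq(0, 1, by = 0.01)
plot(u, sapply(u, bspline, j = 1, n = 2, knots = knots), type = "l")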
• 59. b-Splines (in Practice)

> summary(reg1)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    -7.6519    10.6254  -0.720    0.475
speed           3.0186     0.8627   3.499    0.001 **
(speed - 15)+   1.7562     1.4551   1.207    0.233

> summary(reg2)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.423      7.343   0.602   0.5493
bs(speed)1      33.205      9.489   3.499   0.0012 **
bs(speed)2      80.954      8.788   9.211  4.2e-12 ***
• 60. b- and p-Splines

Note that those spline functions define a basis. O'Sullivan (1986) A statistical perspective on ill-posed inverse problems suggested a penalty on the second derivative of the fitted curve (see #3),
$$\hat m(x) = \mathrm{argmin} \left\{ \sum_{i=1}^n \left( y_i - b(x_i)^{\top} \beta \right)^2 + \lambda \int_{\mathbb{R}} \left[ b''(x)^{\top} \beta \right]^2 dx \right\}$$
• 61. Adding Constraints: Convex Regression

Assume that $y_i = m(x_i) + \varepsilon_i$, where $m : \mathbb{R}^d \to \mathbb{R}$ is some convex function. $m$ is convex if and only if
$$\forall x_1, x_2 \in \mathbb{R}^d, \; \forall t \in [0, 1], \quad m(t x_1 + [1 - t] x_2) \le t\, m(x_1) + [1 - t]\, m(x_2)$$
Proposition (Hildreth (1954) Point Estimates of Ordinates of Concave Functions): let
$$\hat m = \mathrm{argmin}_{m \text{ convex}} \sum_{i=1}^n \left( y_i - m(x_i) \right)^2$$
Then $\hat\theta = (\hat m(x_1), \cdots, \hat m(x_n))$ is unique. Writing $y = \theta + \varepsilon$, we have
$$\hat\theta = \mathrm{argmin}_{\theta \in \mathcal{K}} \sum_{i=1}^n \left( y_i - \theta_i \right)^2, \quad \text{where } \mathcal{K} = \{\theta \in \mathbb{R}^n : \exists m \text{ convex},\; m(x_i) = \theta_i\},$$
i.e. $\hat\theta$ is the projection of $y$ onto the (closed) convex cone $\mathcal{K}$. The projection theorem gives existence and uniqueness.
• 62. Adding Constraints: Convex Regression

In dimension 1: $y_i = m(x_i) + \varepsilon_i$. Assume that the observations are ordered, $x_1 < x_2 < \cdots < x_n$. Here
$$\mathcal{K} = \left\{ \theta \in \mathbb{R}^n : \frac{\theta_2 - \theta_1}{x_2 - x_1} \le \frac{\theta_3 - \theta_2}{x_3 - x_2} \le \cdots \le \frac{\theta_n - \theta_{n-1}}{x_n - x_{n-1}} \right\}$$
Hence, this is a quadratic program with $n - 2$ linear constraints, and $\hat m$ is a piecewise linear function (interpolating the consecutive pairs $(x_i, \hat\theta_i)$).

If $m$ is differentiable, $m$ is convex if
$$m(x) + \nabla m(x) \cdot [y - x] \le m(y)$$
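This quadratic program can be solved with the quadprog package; a minimal sketch on the cars data, where ties in speed are first averaged so that the x_i's are strictly increasing (a simplification of the original problem).

library(quadprog)
x <- sort(unique(cars$speed))
y <- as.numeric(tapply(cars$dist, cars$speed, mean))
n <- length(x)
# constraints: the slopes (theta_{j+1} - theta_j)/(x_{j+1} - x_j) are non-decreasing
A <- matrix(0, n - 2, n)
for (j in 1:(n - 2)) {
  A[j, j]     <-  1 / (x[j + 1] - x[j])
  A[j, j + 1] <- -1 / (x[j + 1] - x[j]) - 1 / (x[j + 2] - x[j + 1])
  A[j, j + 2] <-  1 / (x[j + 2] - x[j + 1])
}
# minimize ||y - theta||^2, i.e. (1/2) theta'theta - y'theta, subject to A theta >= 0
theta <- solve.QP(Dmat = diag(n), dvec = y, Amat = t(A), bvec = rep(0, n - 2))$solution
plot(x, y); lines(x, theta, col = "red")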
• 63. Adding Constraints: Convex Regression

More generally: if $m$ is convex, then there exists $\xi_x$ such that
$$m(x) + \xi_x \cdot [y - x] \le m(y)$$
$\xi_x$ is a subgradient of $m$ at $x$, and the subdifferential is
$$\partial m(x) = \{\xi : m(x) + \xi \cdot [y - x] \le m(y), \; \forall y\}$$
Hence, $\hat\theta$ is the solution of
$$\mathrm{argmin}\; \| y - \theta \|^2 \quad \text{subject to } \theta_i + \xi_i [x_j - x_i] \le \theta_j, \; \forall i, j,$$
over $\theta$ and $\xi_1, \cdots, \xi_n \in \mathbb{R}^d$.
• 64. Spatial Smoothing

One can also consider some spatial smoothing, if we want to predict $E[Y|X = x]$ for some coordinate $x$.

> library(rgeos)
> library(rgdal)
> library(maptools)
> library(cartography)
> download.file("http://bit.ly/2G3KIUG", "zonier.RData")
> load("zonier.RData")
> cols = rev(carto.pal(pal1 = "red.pal", n1 = 10, pal2 = "green.pal", n2 = 10))
> download.file("http://bit.ly/2GSvzGW", "FRA_adm0.rds")
> download.file("http://bit.ly/2FUZ0Lz", "FRA_adm2.rds")
> FR = readRDS("FRA_adm2.rds")
> donnees_carte = data.frame(FR@data)
• 65. Spatial Smoothing

> FR0 = readRDS("FRA_adm0.rds")
> plot(FR0)
> bk = seq(-5, 4.5, length = 21)
> cuty = cut(simbase$Y, breaks = bk, labels = 1:20)
> points(simbase$long, simbase$lat, col = cols[cuty], pch = 19, cex = .5)

One can consider a choropleth map (a spatial version of the histogram).
• 66. Spatial Smoothing

> A = aggregate(x = simbase$Y, by = list(simbase$dpt), mean)
> names(A) = c("dpt", "y")
> d = donnees_carte$CCA_2
> d[d == "2A"] = "201"
> d[d == "2B"] = "202"
> donnees_carte$dpt = as.numeric(as.character(d))
> donnees_carte = merge(donnees_carte, A, all.x = TRUE)
> donnees_carte = donnees_carte[order(donnees_carte$OBJECTID), ]
> bk = seq(-2.75, 2.75, length = 21)
> donnees_carte$cuty = cut(donnees_carte$y, breaks = bk, labels = 1:20)
> plot(FR, col = cols[donnees_carte$cuty], xlim = c(-5.2, 12))
• 67. Spatial Smoothing

Instead of a "continuous" gradient of colors, one can consider only 4 colors (4 levels) for the prediction.

> bk = seq(-2.75, 2.75, length = 5)
> donnees_carte$cuty = cut(donnees_carte$y, breaks = bk, labels = 1:4)
> plot(FR, col = cols[c(3, 8, 12, 17)][donnees_carte$cuty], xlim = c(-5.2, 12))
• 68. Spatial Smoothing

> P1 = FR0@polygons[[1]]@Polygons[[355]]@coords
> P2 = FR0@polygons[[1]]@Polygons[[27]]@coords
> plot(FR0, border = NA)
> polygon(P1)
> polygon(P2)
> grille <- expand.grid(seq(min(simbase$long), max(simbase$long), length = 101),
+                       seq(min(simbase$lat), max(simbase$lat), length = 101))
> paslong = (max(simbase$long) - min(simbase$long)) / 100
> paslat = (max(simbase$lat) - min(simbase$lat)) / 100
• 69. Spatial Smoothing

We need to create a grid (i.e. the x's) on which we approximate $E[Y|X = x]$,

> f = function(i) {
+   (point.in.polygon(grille[i, 1] + paslong/2, grille[i, 2] + paslat/2, P1[, 1], P1[, 2]) > 0) +
+   (point.in.polygon(grille[i, 1] + paslong/2, grille[i, 2] + paslat/2, P2[, 1], P2[, 2]) > 0)
+ }
> indic = unlist(lapply(1:nrow(grille), f))
> grille = grille[which(indic == 1), ]
> points(grille[, 1] + paslong/2, grille[, 2] + paslat/2)
• 70. Spatial Smoothing

Consider here some k-NN, with k = 20,

> library(geosphere)
> knn = function(i, k = 20) {
+   d = distHaversine(grille[i, 1:2], simbase[, c("long", "lat")], r = 6378.137)
+   r = rank(d)
+   ind = which(r <= k)
+   mean(simbase[ind, "Y"])
+ }
> grille$y = Vectorize(knn)(1:nrow(grille))
> bk = seq(-2.75, 2.75, length = 21)
> grille$cuty = cut(grille$y, breaks = bk, labels = 1:20)
> points(grille[, 1] + paslong/2, grille[, 2] + paslat/2, col = cols[grille$cuty], pch = 19)
• 71. Spatial Smoothing

Again, instead of a "continuous" gradient, we can use 4 levels,

> bk = seq(-2.75, 2.75, length = 5)
> grille$cuty = cut(grille$y, breaks = bk, labels = 1:4)
> plot(FR0, border = NA)
> polygon(P1)
> polygon(P2)
> points(grille[, 1] + paslong/2, grille[, 2] + paslat/2, col = cols[c(3, 8, 12, 17)][grille$cuty], pch = 19)
• 72. Testing (Non-)Linearities

In the linear model,
$$\hat y = X \hat\beta = \underbrace{X [X^{\top} X]^{-1} X^{\top}}_{H}\, y$$
$H_{i,i}$ is the leverage of the $i$th element of this hat matrix. Write
$$\hat y_i = \sum_{j=1}^n \left[ x_i^{\top} [X^{\top} X]^{-1} X^{\top} \right]_j y_j = \sum_{j=1}^n [H(x_i)]_j\, y_j, \quad \text{where } H(x) = x^{\top} [X^{\top} X]^{-1} X^{\top}$$
The prediction is
$$\hat m(x) = \widehat{E}(Y|X = x) = \sum_{j=1}^n [H(x)]_j\, y_j$$
• 73. Testing (Non-)Linearities

More generally, a predictor $\hat m$ is said to be linear if, for all $x$, there is a vector of weights $s(x) \in \mathbb{R}^n$ such that
$$\hat m(x) = \sum_{j=1}^n s(x)_j\, y_j$$
Consequently, given $y_1, \cdots, y_n$, there is an $n \times n$ matrix $S$ such that $\hat y = S y$. For the linear model, $S = H$, and $\mathrm{trace}(H) = \dim(\beta)$: the degrees of freedom.

$\dfrac{H_{i,i}}{1 - H_{i,i}}$ is related to Cook's distance, from Cook (1977), Detection of Influential Observations in Linear Regression.
• 74. Testing (Non-)Linearities

For a kernel regression model, with kernel $k$ and bandwidth $h$,
$$S_{i,j}^{(k,h)} = \frac{k_h(x_i - x_j)}{\sum_{l=1}^n k_h(x_i - x_l)}, \quad \text{where } k_h(\cdot) = k(\cdot / h), \quad \text{while} \quad S^{(k,h)}(x)_j = \frac{k_h(x - x_j)}{\sum_{l=1}^n k_h(x - x_l)}$$
For a k-nearest neighbour predictor,
$$S_{i,j}^{(k)} = \frac{1}{k} \mathbf{1}(j \in \mathcal{I}_{x_i}), \quad \text{where } \mathcal{I}_{x_i} \text{ are the } k \text{ nearest observations to } x_i, \quad \text{while} \quad S^{(k)}(x)_j = \frac{1}{k} \mathbf{1}(j \in \mathcal{I}_x).$$
• 75. Testing (Non-)Linearities

Observe that $\mathrm{trace}(S)$ is usually seen as a degree of smoothness. Do we have to smooth? Isn't the linear model sufficient? Define
$$T = \frac{\| S y - H y \|^2}{\mathrm{trace}\big([S - H]^{\top} [S - H]\big)}$$
If the model is linear, then $T$ has (approximately) a Fisher distribution.

Remark: in the case of a linear predictor, with smoothing matrix $S_h$,
$$\hat R(h) = \frac{1}{n} \sum_{i=1}^n \left( y_i - \tilde m_h^{(-i)}(x_i) \right)^2 = \frac{1}{n} \sum_{i=1}^n \left( \frac{y_i - \tilde m_h(x_i)}{1 - [S_h]_{i,i}} \right)^2$$
so we do not need to estimate $n$ models. One can also minimize the generalized cross-validation criterion,
$$\mathrm{GCV}(h) = \frac{n^2}{(n - \mathrm{trace}(S))^2} \cdot \frac{1}{n} \sum_{i=1}^n \left( y_i - \tilde m_h(x_i) \right)^2 \sim \text{Mallows' } C_p$$
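A minimal sketch making S explicit for the Nadaraya-Watson estimator on the cars data: trace(S) read as effective degrees of freedom, and GCV computed on a grid of bandwidths (Gaussian kernel; the grid is arbitrary).

S_matrix <- function(h) {
  K <- outer(cars$speed, cars$speed, function(u, v) dnorm((u - v) / h))
  K / rowSums(K)                    # row i = weights of the y_j in m_h(x_i)
}
gcv <- function(h) {
  S <- S_matrix(h)
  rss <- mean((cars$dist - S %*% cars$dist)^2)
  nrow(cars)^2 / (nrow(cars) - sum(diag(S)))^2 * rss
}
hs <- seq(0.5, 5, by = 0.1)
plot(hs, sapply(hs, gcv), type = "l", xlab = "h", ylab = "GCV")
sum(diag(S_matrix(2)))              # effective degrees of freedom at h = 2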
• 76. Confidence Intervals

If $\hat y = \tilde m_h(x) = S_h(x)\, y$, let
$$\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^n \left( y_i - \tilde m_h(x_i) \right)^2$$
and a (pointwise) confidence interval at $x$ is
$$\tilde m_h(x) \pm t_{1 - \alpha/2}\, \hat\sigma \sqrt{S_h(x) S_h(x)^{\top}}.$$
(Figure: kernel regression fit with a pointwise confidence interval, braking distance against vehicle speed.)
• 77. Confidence Bands

(Figures: confidence bands for the regression of dist on speed.)

To go further, see functional confidence regions.
• 78. Boosting to Capture Nonlinear Effects

We want to solve
$$m^\star = \mathrm{argmin}\; E\left[ (Y - m(X))^2 \right]$$
The heuristics is simple: we consider an iterative process where we keep modeling the errors. Fit a model for $y$, $h_1(\cdot)$, from $y$ and $X$, and compute the error, $\varepsilon_1 = y - h_1(X)$. Fit a model for $\varepsilon_1$, $h_2(\cdot)$, from $\varepsilon_1$ and $X$, and compute the error, $\varepsilon_2 = \varepsilon_1 - h_2(X)$, etc. Then set
$$m_k(\cdot) = \underbrace{h_1(\cdot)}_{\sim y} + \underbrace{h_2(\cdot)}_{\sim \varepsilon_1} + \underbrace{h_3(\cdot)}_{\sim \varepsilon_2} + \cdots + \underbrace{h_k(\cdot)}_{\sim \varepsilon_{k-1}}$$
Hence, we consider an iterative procedure, $m_k(\cdot) = m_{k-1}(\cdot) + h_k(\cdot)$.
• 79. Boosting

$h(x) = y - m_k(x)$, which can be interpreted as a residual. Note that this residual is the gradient of $\frac{1}{2}[y - m_k(x)]^2$.

A gradient descent is based on the Taylor expansion
$$f(x_k) \sim f(x_{k-1}) + \underbrace{(x_k - x_{k-1})}_{\alpha}\, \nabla f(x_{k-1})$$
But here, it is different. We claim we can write
$$f_k(x) \sim f_{k-1}(x) + \underbrace{(f_k - f_{k-1})}_{\beta}\; \underbrace{?}_{\text{a 'gradient'}}$$
where $?$ is interpreted as a 'gradient'.
• 80. Boosting

Construct iteratively
$$m_k(\cdot) = m_{k-1}(\cdot) + \mathrm{argmin}_{h \in \mathcal{H}} \sum_{i=1}^n \left( y_i - [m_{k-1}(x_i) + h(x_i)] \right)^2 = m_{k-1}(\cdot) + \mathrm{argmin}_{h \in \mathcal{H}} \sum_{i=1}^n \left( [y_i - m_{k-1}(x_i)] - h(x_i) \right)^2$$
where $h \in \mathcal{H}$ means that we seek in a class of weak learner functions. If the learners are too strong, the first loop leads to some fixed point, and there is no learning procedure: see the linear regression $y = x^{\top} \beta + \varepsilon$; since $\hat\varepsilon \perp x$, we cannot learn from the residuals. In order to make sure that we learn weakly, we can use some shrinkage parameter $\nu$ (or a collection of parameters $\nu_j$).
• 81. Boosting with Piecewise Linear Spline & Stump Functions

Instead of $\varepsilon_k = \varepsilon_{k-1} - h_k(x)$, set $\varepsilon_k = \varepsilon_{k-1} - \nu \cdot h_k(x)$,

> library(rpart)
> v = .05          # shrinkage parameter nu (illustrative value)
> df$yr = df$y     # initialize the working residuals
> for (t in 1:100) {
+   fit = rpart(yr ~ x, data = df)
+   yp = predict(fit, newdata = df)
+   df$yr = df$yr - v * yp
+ }

Remark: stumps are related to regression trees.
• 82. Ruptures

One can use the Chow test to test for a rupture. Note that it is simply a Fisher test, with two parts,
$$\beta = \begin{cases} \beta_1 & \text{for } i = 1, \cdots, i_0 \\ \beta_2 & \text{for } i = i_0 + 1, \cdots, n \end{cases} \quad \text{and test} \quad \begin{cases} H_0: \beta_1 = \beta_2 \\ H_1: \beta_1 \ne \beta_2 \end{cases}$$
$i_0$ is a point between $k$ and $n - k$ (we need enough observations). Chow (1960) Tests of Equality Between Sets of Coefficients in Two Linear Regressions suggested
$$F_{i_0} = \frac{\varepsilon^{\top}\varepsilon - \eta^{\top}\eta}{\eta^{\top}\eta / (n - 2k)}$$
where $\varepsilon_i = y_i - x_i^{\top} \hat\beta$ are the residuals of the model estimated on the whole sample, and
$$\eta_i = \begin{cases} y_i - x_i^{\top} \hat\beta_1 & \text{for } i = 1, \cdots, i_0 \\ y_i - x_i^{\top} \hat\beta_2 & \text{for } i = i_0 + 1, \cdots, n \end{cases}$$
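A minimal hand computation of that statistic at a given break point i0, on the cars data ordered by speed; i0 = 25 and k = 2 parameters per regime are the choices made here for illustration.

cars_s <- cars[order(cars$speed), ]
n <- nrow(cars_s); k <- 2; i0 <- 25
eps <- residuals(lm(dist ~ speed, data = cars_s))               # pooled model
eta <- c(residuals(lm(dist ~ speed, data = cars_s[1:i0, ])),    # two regimes
         residuals(lm(dist ~ speed, data = cars_s[(i0 + 1):n, ])))
(sum(eps^2) - sum(eta^2)) / (sum(eta^2) / (n - 2 * k))          # F_{i0}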
• 83. Ruptures

> library(strucchange)
> Fstats(dist ~ speed, data = cars, from = 7/50)

(Figure: dist against speed with the fitted segments, and the sequence of F statistics.)
• 84. Ruptures

> Fstats(dist ~ speed, data = cars, from = 2/50)

(Figure: same as above, with a smaller trimming parameter.)
• 85. Ruptures

If $i_0$ is unknown, use CUSUM types of tests, see Ploberger & Krämer (1992) The Cusum Test with OLS Residuals. For all $t \in [0, 1]$, set
$$W_t = \frac{1}{\hat\sigma \sqrt{n}} \sum_{i=1}^{\lfloor nt \rfloor} \hat\varepsilon_i.$$
If $\alpha$ is the confidence level, bounds are generally $\pm \alpha$, even if the theoretical bounds should be $\pm \alpha \sqrt{t(1 - t)}$.

> cusum <- efp(dist ~ speed, type = "OLS-CUSUM", data = cars)
> plot(cusum, ylim = c(-2, 2))
> plot(cusum, alpha = 0.05, alt.boundary = TRUE, ylim = c(-2, 2))
• 86. Ruptures

(Figures: OLS-based CUSUM test, with standard and with alternative boundaries.)
• 87. From a Rupture to a Discontinuity

See Imbens & Lemieux (2008) Regression Discontinuity Designs.
• 88. From a Rupture to a Discontinuity

Consider the dataset from Lee (2008) Randomized experiments from non-random selection in U.S. House elections.

> library(RDDtools)
> data(Lee2008)

We want to test if there is a discontinuity at 0,
• with parametric tools
• with nonparametric tools

(Figure: scatterplot of the Lee (2008) data, y against x, around the cutoff at 0.)
• 89. Testing for a Rupture

Use some 4th order polynomial, on each side of the cutoff,

> idx1 = (Lee2008$x > 0)
> reg1 = lm(y ~ poly(x, 4), data = Lee2008[idx1, ])
> idx2 = (Lee2008$x < 0)
> reg2 = lm(y ~ poly(x, 4), data = Lee2008[idx2, ])
> s1 = predict(reg1, newdata = data.frame(x = 0))
> s2 = predict(reg2, newdata = data.frame(x = 0))
> abs(s1 - s2)
         1
0.07659014
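For the nonparametric side, a minimal sketch: local linear fits on each side of the cutoff with a triangular kernel, and the estimated jump at 0. The bandwidth h = 0.1 is an arbitrary choice here; data-driven bandwidths exist (e.g. Imbens & Kalyanaraman).

h <- 0.1
tri <- function(u) pmax(0, 1 - abs(u))           # triangular kernel
left  <- subset(Lee2008, x <  0 & x > -h)
right <- subset(Lee2008, x >= 0 & x <  h)
fit_l <- lm(y ~ x, data = left,  weights = tri(left$x / h))
fit_r <- lm(y ~ x, data = right, weights = tri(right$x / h))
predict(fit_r, data.frame(x = 0)) - predict(fit_l, data.frame(x = 0))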
[Figure: Lee (2008) data — scatter plot of y against x, with x ranging from −1.0 to 1.0 and y from 0.0 to 1.0]
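The point estimate of 0.07659 does not, by itself, tell us whether the jump is statistically significant. A minimal sketch of one standard way to test it, assuming the same Lee2008 data frame (the dummy D and the object reg below are illustrative names, not from the slides): pool the two sides in a single regression with a threshold dummy fully interacted with a raw polynomial. Since every raw polynomial term vanishes at x = 0, the coefficient on the dummy is exactly the estimated jump at the threshold, and its t-statistic tests the discontinuity.
> # D = 1 on the right of the threshold, 0 on the left (illustrative name)
> Lee2008$D = as.numeric(Lee2008$x >= 0)
> # full interaction = separate intercept and polynomial on each side
> reg = lm(y ~ D * poly(x, 4, raw = TRUE), data = Lee2008)
> # estimate, std. error, t value and p-value of the jump at x = 0
> summary(reg)$coefficients["D", ]
Provided no observation sits exactly at x = 0, the estimate on D should coincide (up to sign) with abs(s1 - s2) computed above, since the fully interacted model is equivalent to the two separate fits.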