Use & Value of Unusual Data
Arthur Charpentier
UQAM
Actuarial Summer School 2019
The Team for the Week
Jean-Philippe Boucher UQAM (Montréal, Canada)
Professor, holder of The Co-operators Chair in Actuarial Risk Analysis, author of several research articles in actuarial science.
@J P Boucher jpboucher.uqam.ca
Arthur Charpentier UQAM (Montréal, Canada)
Professor, editor of Computational Actuarial Science with R. Former director of the Data Science for Actuaries program of the French Institute of Actuaries.
@freakonometrics freakonometrics freakonometrics.hypotheses.org
Ewen Gallic AMSE (Aix-Marseille, France)
Assistant professor, teaching computational econometrics, data science and machine learning.
@3wen 3wen egallic.fr
Practical Issues
General Context (in Insurance)
Networks and P2P Insurance (via https://sharable.life and source)
General Context (in Insurance)
Networks and P2P Insurance, via Insurance Business (2016)
General Context (in Insurance)
Telematics and Usage-Based Insurance (source, source, The Actuary)
General Context (in Insurance)
Satellite pictures (source) and Index-based Flood Insurance (IBFI, http://ibfi.iwmi.org/)
General Context (in Insurance)
Very small granularity (1 pixel is 15 × 15 cm², see detail)
More and More Data (Exponentially More)
In 1969, (legend says) 5 photographs were taken by Neil Armstrong.
More than 800,000,000 pictures are uploaded every day on Facebook (2 billion photos per day across Instagram, Messenger and WhatsApp).
Data Complexity
Pictures, drawings, videos, handwritten text, ECG, medical imagery, satellite pictures, tweets, phone calls, etc.
Hard to distinguish structured and unstructured data...
Data Complexity
Insurance Data
From questionnaires to data frames $x_{i,j}$
row $i$: insured, policy, claim
column $j$: variables (features)
From Pictures to Data
Consider pictures saa-iss.ch/testalbum/nggallery/all-old-galleries/
From Pictures to Data
Face detection face-api.js (justadudewhohacks.github.io)
From Pictures to Data
Expression recognition face-api.js (justadudewhohacks.github.io)
From Pictures to Data
Face recognition face-api.js (justadudewhohacks.github.io)
Pictures
Extracting information from a picture:
What is on the picture?
What is the cost of such a claim?
High dimensional data: an $h \times w$ color picture, e.g. $850 \times 1280$, gives $x_i \in \mathbb{R}^d$ with $d > 3$ million.
Need new statistical tools: (deep) neural networks perform well, or reduce dimension.
Pictures: Odermatt (2003, Karambolage)
Text
Kaufman et al. (2016, Natural Language Processing-Enabled and
Conventional Data Capture Methods for Input to Electronic Health
Records), Chen et al. (2018, A Natural Language Processing
System That Links Medical Terms in Electronic Health Record
Notes to Lay Definitions) or Simon (2018, Natural Language
Processing for Healthcare Customers)
Statistics & ML Toolbox : PCA
Projection method, used to reduce dimensionality
Principal Component Analysis (PCA)
find a small number of directions in input space that explain variation in input data
represent data by projecting along those directions
$n$ observations in dimension $d$, stored in a matrix $\boldsymbol{X} \in \mathcal{M}_{n\times d}$
Natural idea: linearly project (multiply by a projection matrix) onto a much lower dimensional space, $k < d$
Search for orthogonal directions in space with highest variance
Information is stored in the covariance matrix $\Sigma = \boldsymbol{X}^\top\boldsymbol{X}$ when variables are centered
Statistics & ML Toolbox : PCA
Find the k largest eigenvectors of Σ : principal components
Assemble these eigenvectors into a d × k matrix ˜Σ
Express d-dimensional vectors x by projecting them to
m-dimensional ˜x, ˜x = ˜Σ x
We have data X ∈ Mn×d , i.e. n vectors x ∈ Rd ,
Let ω1 ∈ Rd denote weights.
We want to maximize the variance of z = ω1 x
max
1
n
n
i=1
(ω1 xi − ω1 x)2
= max ω1 Σω1 s.t. ω1 = 1
Statistics & ML Toolbox : PCA
To solve
$$\max\ \frac{1}{n}\sum_{i=1}^n \big(\omega_1^\top x_i - \omega_1^\top \overline{x}\big)^2 = \max\ \omega_1^\top \Sigma\,\omega_1 \quad \text{s.t. } \|\omega_1\| = 1,$$
we can use a Lagrange multiplier and solve the constrained optimization problem
$$\max_{\omega_1,\lambda_1}\ \omega_1^\top \Sigma\,\omega_1 + \lambda_1\big(1 - \omega_1^\top\omega_1\big).$$
Differentiating gives $\Sigma\omega_1 - \lambda_1\omega_1 = 0$, i.e. $\Sigma\omega_1 = \lambda_1\omega_1$: $\omega_1$ is an eigenvector of $\Sigma$, with the Lagrange multiplier as its eigenvalue. It
should be the one with the largest eigenvalue.
Statistics & ML Toolbox : PCA
$$\max\ \omega_2^\top \Sigma\,\omega_2 \quad \text{subject to } \|\omega_2\| = 1 \text{ and } \omega_2^\top\omega_1 = 0$$
The Lagrangian is $\omega_2^\top \Sigma\,\omega_2 + \lambda_2(1 - \omega_2^\top\omega_2) - \lambda_1\omega_2^\top\omega_1$;
when differentiating, we get $\Sigma\omega_2 = \lambda_2\omega_2$:
$\lambda_2$ is the second largest eigenvalue.
Set $z = W^\top x$ and $\widetilde{x} = Mz$. We want to solve (where $\|\cdot\|$ is the standard $\ell_2$ norm)
$$\min_{W,M}\ \sum_{i=1}^n \big\|x_i - \widetilde{x}_i\big\|^2, \quad\text{i.e.}\quad \min_{W,M}\ \sum_{i=1}^n \big\|x_i - MWx_i\big\|^2$$
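As an aside (not on the original slides), these steps fit in a few lines of numpy; the data and variable names below are invented for illustration: center, eigendecompose the covariance matrix, project on the $k$ leading eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # toy data, n=200, d=5

Xc = X - X.mean(axis=0)                 # center the variables
Sigma = Xc.T @ Xc / len(Xc)             # covariance matrix
eigval, eigvec = np.linalg.eigh(Sigma)  # eigh returns ascending eigenvalues
order = np.argsort(eigval)[::-1]        # sort descending
k = 2
W = eigvec[:, order[:k]]                # d x k matrix of leading eigenvectors

Z = Xc @ W          # projected scores (n x k)
X_tilde = Z @ W.T   # reconstruction in the original space
print("explained variance ratio:", eigval[order[:k]].sum() / eigval.sum())
```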
Statistics & ML Toolbox : GLM
Generalized Linear Model
Flexible generalization of linear regression that allows $y$ to have a distribution other than the normal distribution.
The classical regression is, in matrix form,
$$\underset{n\times 1}{\boldsymbol{y}} = \underset{n\times k}{\boldsymbol{X}}\ \underset{k\times 1}{\boldsymbol{\beta}} + \underset{n\times 1}{\boldsymbol{\varepsilon}}, \quad\text{or}\quad y_i = x_i^\top\beta + \varepsilon_i,$$
$$Y_i \sim \mathcal{N}(\mu_i, \sigma^2), \text{ with } \mu_i = \mathbb{E}[Y_i\,|\,x_i] = x_i^\top\beta.$$
It is possible to consider a more general regression.
For a classification $y \in \{0,1\}^n$,
$$Y_i \sim \mathcal{B}(p_i), \text{ with } p_i = \mathbb{E}[Y_i\,|\,x_i] = \frac{e^{x_i^\top\beta}}{1 + e^{x_i^\top\beta}}.$$
Recall that we assumed that the $(Y_i, X_i)$ were i.i.d. with unknown distribution $\mathbb{P}$.
Statistics & ML Toolbox : GLM
In the (standard) linear model, we assume that $(Y\,|\,X=x) \sim \mathcal{N}(x^\top\beta, \sigma^2)$.
The maximum-likelihood estimator of $\beta$ is then
$$\widehat{\beta} = \underset{\beta\in\mathbb{R}^{p+1}}{\text{argmax}}\left\{ -\sum_{i=1}^n (y_i - x_i^\top\beta)^2 \right\}$$
For the logistic regression, $(Y\,|\,X=x) \sim \mathcal{B}\big(\text{logit}^{-1}(x^\top\beta)\big)$.
The maximum-likelihood estimator of $\beta$ is then
$$\widehat{\beta} = \underset{\beta\in\mathbb{R}^{p+1}}{\text{argmax}}\sum_{i=1}^n \Big\{ y_i \log\big(\text{logit}^{-1}(x_i^\top\beta)\big) + (1-y_i)\log\big(1-\text{logit}^{-1}(x_i^\top\beta)\big) \Big\}$$
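As an illustrative sketch (not the course's own code, assuming numpy and scipy are available), the logistic MLE can be obtained by numerically minimizing the negative log-likelihood on simulated data:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # add an intercept
beta_true = np.array([0.5, 1.0, -2.0])

# logistic regression: Y | X=x ~ B(logit^{-1}(x' beta))
p_true = 1 / (1 + np.exp(-X @ beta_true))
y = rng.binomial(1, p_true)

def neg_loglik(beta):
    eta = X @ beta
    # -(sum of y*log p + (1-y)*log(1-p)), in a numerically stable form
    return np.sum(np.logaddexp(0, eta) - y * eta)

beta_hat = minimize(neg_loglik, x0=np.zeros(p + 1)).x
print(beta_hat)  # should be close to beta_true
```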
Statistics & ML Toolbox : loss and risk
In a general setting, consider some loss function $\ell : \mathbb{R}\times\mathbb{R}\to\mathbb{R}_+$, e.g. $\ell(y,m) = (y-m)^2$ (quadratic, $\ell_2$ norm)
Average Risk
The average risk associated with $m_n$ is
$$R_{\mathbb{P}}(m_n) = \mathbb{E}_{\mathbb{P}}\big[\ell\big(Y, m_n(X)\big)\big]$$
Statistics & ML Toolbox : loss and risk
In a general setting, consider some loss function $\ell : \mathbb{R}\times\mathbb{R}\to\mathbb{R}_+$, e.g. $\ell(y,m) = (y-m)^2$ (quadratic, $\ell_2$ norm)
Empirical Risk
The empirical risk associated with $m_n$ is
$$R_n(m_n) = \frac{1}{n}\sum_{i=1}^n \ell\big(y_i, m_n(x_i)\big).$$
Statistics & ML Toolbox : loss and risk
If we minimize the empirical risk, we overfit...
Hold-Out Cross Validation
1. Split $\{1,2,\cdots,n\}$ into $T$ (training) and $V$ (validation)
2. Estimate $m$ on the sample $(y_i, x_i)$, $i \in T$:
$$m_T = \underset{m\in\mathcal{M}}{\text{argmin}}\left\{ \frac{1}{|T|}\sum_{i\in T} \ell\big(y_i, m(x_{1,i},\cdots,x_{p,i})\big) \right\}$$
3. Compute
$$\frac{1}{|V|}\sum_{i\in V} \ell\big(y_i, m_T(x_{1,i},\cdots,x_{p,i})\big)$$
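A minimal hold-out sketch (illustration only; the polynomial models and simulated data are invented, assuming numpy): fit on $T$, score on $V$, and compare model complexities.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 1, n)
y = np.sin(4 * x) + rng.normal(scale=0.3, size=n)

idx = rng.permutation(n)
T, V = idx[:150], idx[150:]          # training / validation split

# fit polynomial models of increasing degree on T, score them on V
for degree in (1, 3, 10):
    coef = np.polyfit(x[T], y[T], degree)          # least squares on T
    err = np.mean((y[V] - np.polyval(coef, x[V]))**2)
    print(degree, round(err, 4))                   # validation risk
```

The degree-10 fit typically has the lowest training error but not the lowest validation risk, which is the point of holding data out.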
Statistics & ML Toolbox : loss and risk
source: Hastie et al. (2009, The Elements of Statistical Learning)
Statistics & ML Toolbox : loss and risk
Leave-one-Out Cross Validation
1. Estimate $n$ models: estimate $m_{-j}$ on the sample $(y_i, x_i)$, $i \in \{1,\cdots,n\}\setminus\{j\}$,
$$m_{-j} = \underset{m\in\mathcal{M}}{\text{argmin}}\left\{ \frac{1}{n-1}\sum_{i\neq j} \ell\big(y_i, m(x_{1,i},\cdots,x_{p,i})\big) \right\}$$
2. Compute
$$\frac{1}{n}\sum_{i=1}^n \ell\big(y_i, m_{-i}(x_i)\big)$$
Statistics & ML Toolbox : loss and risk
K-Fold Cross Validation
1. Split $\{1,2,\cdots,n\}$ into $K$ groups $V_1,\cdots,V_K$
2. Estimate $K$ models: estimate $m_k$ on the sample $(y_i, x_i)$, $i \in \{1,\cdots,n\}\setminus V_k$
3. Compute
$$\frac{1}{K}\sum_{k=1}^K \frac{1}{|V_k|}\sum_{i\in V_k} \ell\big(y_i, m_k(x_{1,i},\cdots,x_{p,i})\big)$$
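The same recipe as a sketch (invented data and model, assuming numpy): partition the indices, fit on the complement of each fold, average the fold risks.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 200, 5
x = rng.uniform(0, 1, n)
y = np.sin(4 * x) + rng.normal(scale=0.3, size=n)

folds = np.array_split(rng.permutation(n), K)   # V_1, ..., V_K
risk = 0.0
for Vk in folds:
    Tk = np.setdiff1d(np.arange(n), Vk)         # {1,...,n} \ V_k
    coef = np.polyfit(x[Tk], y[Tk], 3)          # estimate m_k on the rest
    risk += np.mean((y[Vk] - np.polyval(coef, x[Vk]))**2) / K
print(risk)
```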
Statistics & ML Toolbox : loss and risk
Bootstrap Cross Validation
1. Generate $B$ bootstrap samples from $\{1,2,\cdots,n\}$: $I_1,\cdots,I_B$
2. Estimate $B$ models: $m_b$ on the sample $(y_i, x_i)$, $i \in I_b$
3. Compute
$$\frac{1}{B}\sum_{b=1}^B \frac{1}{n-|I_b|}\sum_{i\notin I_b} \ell\big(y_i, m_b(x_{1,i},\cdots,x_{p,i})\big)$$
The probability that $i \notin I_b$ is $\left(1 - \frac{1}{n}\right)^{n} \sim e^{-1}$ ($\approx 36.8\%$).
At stage $b$, we validate on $\sim 36.8\%$ of the dataset.
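A sketch of the out-of-bag idea (invented data, assuming numpy), which also checks the $e^{-1}$ share empirically:

```python
import numpy as np

rng = np.random.default_rng(4)
n, B = 200, 500
x = rng.uniform(0, 1, n)
y = np.sin(4 * x) + rng.normal(scale=0.3, size=n)

oob_share, risk = [], []
for b in range(B):
    Ib = rng.integers(0, n, n)              # bootstrap indices (with replacement)
    out = np.setdiff1d(np.arange(n), Ib)    # points with i not in I_b ("out-of-bag")
    oob_share.append(len(out) / n)
    coef = np.polyfit(x[Ib], y[Ib], 3)      # estimate m_b on the bootstrap sample
    risk.append(np.mean((y[out] - np.polyval(coef, x[out]))**2))
print(np.mean(oob_share))   # close to e^{-1}, i.e. ~36.8%
print(np.mean(risk))        # bootstrap cross-validated risk
```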
Statistics & ML Toolbox : loss and risk
For $\boldsymbol{y} \in \mathbb{R}^n$,
$$\overline{y} = \underset{m\in\mathbb{R}}{\text{argmin}}\left\{ \sum_{i=1}^n \frac{1}{n}\big(\underbrace{y_i - m}_{\varepsilon_i}\big)^2 \right\}$$
is the empirical version of
$$\mathbb{E}[Y] = \underset{m\in\mathbb{R}}{\text{argmin}}\left\{ \int \big(\underbrace{y - m}_{\varepsilon}\big)^2 dF(y) \right\} = \underset{m\in\mathbb{R}}{\text{argmin}}\left\{ \mathbb{E}\Big[\big(\underbrace{Y - m}_{\varepsilon}\big)^2\Big] \right\}$$
where $Y$ is a random variable.
Thus,
$$\underset{m:\mathbb{R}^k\to\mathbb{R}}{\text{argmin}}\left\{ \sum_{i=1}^n \frac{1}{n}\big(\underbrace{y_i - m(x_i)}_{\varepsilon_i}\big)^2 \right\}$$
is the empirical version of $\mathbb{E}[Y\,|\,X=x]$.
See Legendre (1805, Nouvelles méthodes pour la détermination des orbites des comètes) and Gauss (1809, Theoria motus corporum coelestium in sectionibus conicis solem ambientium).
Statistics & ML Toolbox : loss and risk
$$\text{med}[\boldsymbol{y}] \in \underset{m\in\mathbb{R}}{\text{argmin}}\left\{ \sum_{i=1}^n \frac{1}{n}\big|\underbrace{y_i - m}_{\varepsilon_i}\big| \right\}$$
is the empirical version of
$$\text{med}[Y] \in \underset{m\in\mathbb{R}}{\text{argmin}}\left\{ \int \big|\underbrace{y - m}_{\varepsilon}\big|\, dF(y) \right\} = \underset{m\in\mathbb{R}}{\text{argmin}}\left\{ \mathbb{E}\Big[\big|\underbrace{Y - m}_{\varepsilon}\big|\Big] \right\}$$
where $\mathbb{P}[Y \le \text{med}[Y]] \ge \frac{1}{2}$ and $\mathbb{P}[Y \ge \text{med}[Y]] \ge \frac{1}{2}$.
$$\underset{m:\mathbb{R}^k\to\mathbb{R}}{\text{argmin}}\left\{ \sum_{i=1}^n \frac{1}{n}\big|\underbrace{y_i - m(x_i)}_{\varepsilon_i}\big| \right\}$$
is the empirical version of $\text{med}[Y\,|\,X=x]$.
See Boscovich (1757, De Litteraria expeditione per pontificiam ditionem ad dimetiendos duos meridiani) and Laplace (1793, Sur quelques points du système du monde).
Statistics & ML Toolbox : loss and risk
For least squares (the $\ell_2$ norm), we have a strictly convex problem... classical optimization routines can be used (e.g. gradient descent).
More complicated with the $\ell_1$ norm.
Consider a sample $\{y_1,\cdots,y_n\}$. To compute the median, solve
$$\min_{\mu}\left\{ \sum_{i=1}^n |y_i - \mu| \right\}$$
which can be solved using linear programming techniques; see Dantzig (1963, Linear programming).
More precisely, this problem is equivalent to
$$\min_{\mu,a,b}\left\{ \sum_{i=1}^n a_i + b_i \right\} \quad\text{with } a_i, b_i \ge 0 \text{ and } y_i - \mu = a_i - b_i,\ \forall i = 1,\cdots,n.$$
Statistics & ML Toolbox : loss and risk
For a quantile, the linear program is
$$\min_{\mu,a,b}\left\{ \sum_{i=1}^n \tau a_i + (1-\tau)b_i \right\} \quad\text{with } a_i, b_i \ge 0 \text{ and } y_i - \mu = a_i - b_i,\ \forall i = 1,\cdots,n.$$
This is extended to the quantile regression,
$$\min_{\beta_\tau,a,b}\left\{ \sum_{i=1}^n \tau a_i + (1-\tau)b_i \right\} \quad\text{with } a_i, b_i \ge 0 \text{ and } y_i - x_i^\top\beta_\tau = a_i - b_i,\ \forall i = 1,\cdots,n.$$
But is the following problem the one we should really care about?
$$\widehat{m} = \underset{m\in\mathcal{M}}{\text{argmin}}\left\{ \sum_{i=1}^n \ell\big(y_i, m(x_i)\big) \right\}$$
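That linear program can be written directly with scipy's linprog (a sketch on simulated data; the variable layout $(\beta, a, b)$ is one convenient choice, not the only one):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
y = 1 + 2 * X[:, 1] + rng.normal(size=n)
tau, p = 0.5, X.shape[1]                # tau = 0.5 gives the median regression

# variables z = (beta, a, b); minimize sum(tau*a_i + (1-tau)*b_i)
c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])     # X beta + a - b = y
bounds = [(None, None)] * p + [(0, None)] * (2 * n)
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds)
print(res.x[:p])   # estimated beta_tau
```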
Statistics & ML Toolbox : Penalization
Usually, there are tuning parameters $p$. To avoid overfitting,
1. use cross-validation:
$$\widehat{p} = \underset{p}{\text{argmin}}\left\{ \sum_{i\in\text{validation}} \ell\big(y_i, \widehat{m}_p(x_i)\big) \right\} \quad\text{where}\quad \widehat{m}_p = \underset{m\in\mathcal{M}_p}{\text{argmin}}\left\{ \sum_{i\in\text{training}} \ell\big(y_i, m(x_i)\big) \right\}$$
2. use penalization:
$$\widehat{m} = \underset{m\in\mathcal{M}}{\text{argmin}}\left\{ \sum_{i=1}^n \ell\big(y_i, m(x_i)\big) + \text{penalization}(m) \right\}$$
Statistics & ML Toolbox : Penalization
Penalization often yields bias... but it might be interesting if the risk is the mean squared error...
Consider a parametric model, with true (unknown) parameter $\theta$; then
$$\text{mse}(\widehat{\theta}) = \mathbb{E}\big[(\widehat{\theta} - \theta)^2\big] = \underbrace{\mathbb{E}\big[(\widehat{\theta} - \mathbb{E}[\widehat{\theta}])^2\big]}_{\text{variance}} + \underbrace{\big(\mathbb{E}[\widehat{\theta}] - \theta\big)^2}_{\text{bias}^2}$$
One can think of a shrinkage of an unbiased estimator.
Let $\widetilde{\theta}$ denote an unbiased estimator of $\theta$; then
$$\widehat{\theta} = \frac{\theta^2}{\theta^2 + \text{mse}(\widetilde{\theta})}\cdot\widetilde{\theta}\ \big(\le \widetilde{\theta}\big)$$
satisfies $\text{mse}(\widehat{\theta}) \le \text{mse}(\widetilde{\theta})$.
[Figure: variance component of the estimator's mse]
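A quick Monte Carlo check of this shrinkage result (a sketch with simulated data; note the shrinkage factor uses the unknown $\theta$, as on the slide, so this is an oracle illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
theta, sigma, n = 2.0, 3.0, 10
M = 20000                                              # Monte Carlo replications

xbar = rng.normal(theta, sigma, (M, n)).mean(axis=1)   # unbiased estimator of theta
var = sigma**2 / n                                     # its mse (= its variance)
shrunk = theta**2 / (theta**2 + var) * xbar            # shrunken (biased) version

print(np.mean((xbar - theta)**2))    # ~ var
print(np.mean((shrunk - theta)**2))  # smaller mse, at the price of bias
```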
Statistics & ML Toolbox : Ridge
as in Wieringen (2018, Lecture Notes on Ridge Regression)
Ridge Estimator (OLS)
$$\widehat{\beta}^{\text{ridge}}_{\lambda} = \text{argmin}\left\{ \sum_{i=1}^n (y_i - x_i^\top\beta)^2 + \lambda\sum_{j=1}^p \beta_j^2 \right\}$$
$$\widehat{\beta}^{\text{ridge}}_{\lambda} = \big(\boldsymbol{X}^\top\boldsymbol{X} + \lambda\mathbb{I}\big)^{-1}\boldsymbol{X}^\top\boldsymbol{y}$$
Ridge Estimator (GLM)
$$\widehat{\beta}^{\text{ridge}}_{\lambda} = \text{argmin}\left\{ -\sum_{i=1}^n \log f\big(y_i \,|\, \mu_i = g^{-1}(x_i^\top\beta)\big) + \frac{\lambda}{2}\sum_{j=1}^p \beta_j^2 \right\}$$
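The closed form is one line of numpy (a sketch on simulated data; names are invented):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, lam = 100, 5, 1.0
X = rng.normal(size=(n, p))
beta = np.array([1., -1., 0.5, 0., 0.])
y = X @ beta + rng.normal(size=n)

# ridge: solve (X'X + lambda I) beta = X'y instead of inverting explicitly
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_ols)
print(beta_ridge)   # coefficients shrunk toward zero
```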
Statistics & ML Toolbox : Ridge
Observe that if the columns of $\boldsymbol{X}$ are orthonormal ($\boldsymbol{X}^\top\boldsymbol{X} = \mathbb{I}$), then
$$\widehat{\beta}^{\text{ridge}}_{\lambda} = [1+\lambda]^{-1}\,\widehat{\beta}^{\text{ols}}$$
which explains the relationship with shrinkage.
But not in the general case...
Smaller mse
There exists $\lambda$ such that $\text{mse}\big[\widehat{\beta}^{\text{ridge}}_{\lambda}\big] \le \text{mse}\big[\widehat{\beta}^{\text{ols}}\big]$
Statistics & ML Toolbox : Ridge
From a Bayesian perspective,
$$\underbrace{\mathbb{P}[\theta\,|\,y]}_{\text{posterior}} \propto \underbrace{\mathbb{P}[y\,|\,\theta]}_{\text{likelihood}} \cdot \underbrace{\mathbb{P}[\theta]}_{\text{prior}}$$
i.e. (up to an additive constant)
$$\log\mathbb{P}[\theta\,|\,y] = \underbrace{\log\mathbb{P}[y\,|\,\theta]}_{\text{log likelihood}} + \underbrace{\log\mathbb{P}[\theta]}_{\text{penalty}}$$
If $\beta$ has a prior $\mathcal{N}(0, \tau^2\mathbb{I})$ distribution, then its posterior distribution has mean
$$\mathbb{E}[\beta\,|\,y, X] = \left(\boldsymbol{X}^\top\boldsymbol{X} + \frac{\sigma^2}{\tau^2}\mathbb{I}\right)^{-1}\boldsymbol{X}^\top\boldsymbol{y}.$$
Statistics & ML Toolbox : lasso
lasso Estimator (OLS)
$$\widehat{\beta}^{\text{lasso}}_{\lambda} = \text{argmin}\left\{ \sum_{i=1}^n (y_i - x_i^\top\beta)^2 + \lambda\sum_{j=1}^p |\beta_j| \right\}$$
lasso Estimator (GLM)
$$\widehat{\beta}^{\text{lasso}}_{\lambda} = \text{argmin}\left\{ -\sum_{i=1}^n \log f\big(y_i \,|\, \mu_i = g^{-1}(x_i^\top\beta)\big) + \frac{\lambda}{2}\sum_{j=1}^p |\beta_j| \right\}$$
Statistics & ML Toolbox : sparsity
In several applications, $k$ can be (very) large, but a lot of features are just noise: $\beta_j = 0$ for many $j$'s. Let $s$ denote the number of relevant features, with $s \ll k$, cf. Hastie, Tibshirani & Wainwright (2015, Statistical Learning with Sparsity),
$$s = \text{card}\{\mathcal{S}\} \quad\text{where}\quad \mathcal{S} = \{j;\ \beta_j \neq 0\}$$
The model is now $y = \boldsymbol{X}_{\mathcal{S}}\beta_{\mathcal{S}} + \varepsilon$, where $\boldsymbol{X}_{\mathcal{S}}^\top\boldsymbol{X}_{\mathcal{S}}$ is a full rank matrix.
Statistics & ML Toolbox : sparsity
$\ell_0$-regularization Estimator (OLS)
$$\widehat{\beta}_{\lambda} = \text{argmin}\left\{ \sum_{i=1}^n (y_i - x_i^\top\beta)^2 + \lambda\|\beta\|_{\ell_0} \right\}$$
where $\|a\|_{\ell_0} = \sum_j \mathbf{1}(|a_j| > 0)$, the number of non-zero components.
More generally,
$\ell_p$-regularization Estimator (OLS)
$$\widehat{\beta}_{\lambda} = \text{argmin}\left\{ \sum_{i=1}^n (y_i - x_i^\top\beta)^2 + \lambda\|\beta\|_{\ell_p} \right\}$$
• sparsity is obtained when $p \le 1$
• convexity is obtained when $p \ge 1$
Statistics & ML Toolbox : sparsity
source: Hastie et al. (2009, The Elements of Statistical Learning)
Statistics & ML Toolbox : lasso
In the orthogonal case, $\boldsymbol{X}^\top\boldsymbol{X} = \mathbb{I}$,
$$\widehat{\beta}^{\text{lasso}}_{k,\lambda} = \text{sign}\big(\widehat{\beta}^{\text{ols}}_k\big)\left(\big|\widehat{\beta}^{\text{ols}}_k\big| - \frac{\lambda}{2}\right)_+$$
i.e. the lasso estimate is related to the soft threshold function...
Numerical Toolbox
LASSO Coordinate Descent Algorithm
1. Set $\beta_0 = \beta$ (an initial value)
2. For $k = 1, \cdots$, for $j = 1, \cdots, p$:
(i) compute $R_j = x_j^\top\big(y - \boldsymbol{X}_{-j}\beta_{k-1,(-j)}\big)$
(ii) set $\beta_{k,j} = R_j\left(1 - \frac{\lambda}{2|R_j|}\right)_+$
3. The final estimate $\beta_k$ (at convergence) is $\widehat{\beta}_\lambda$
The softplus function $\log(1+e^{x-s})$ can be used as a smooth version of $(x-s)_+$
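A compact sketch of this coordinate descent (illustration only, assuming numpy and unit-norm columns so that the soft-threshold update above is exact):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for sum_i (y_i - x_i'beta)^2 + lam * sum_j |beta_j|.
    Assumes the columns of X have unit norm (x_j'x_j = 1)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual y - X_{-j} beta_{-j}
            R_j = X[:, j] @ r
            beta[j] = np.sign(R_j) * max(abs(R_j) - lam / 2, 0)  # soft threshold
    return beta

rng = np.random.default_rng(9)
n, p = 100, 10
X = rng.normal(size=(n, p))
X /= np.linalg.norm(X, axis=0)                 # unit-norm columns
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=n)
print(np.round(lasso_cd(X, y, lam=0.5), 3))    # sparse: most coefficients exactly 0
```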
Statistics & ML Toolbox : trees
Use recursive binary splitting to grow a tree
Trees (CART)
Each interior node corresponds to one of the input
variables; there are edges to children for each of
the possible values of that input variable. Each
leaf represents a value of the target variable given
the values of the input variables represented by
the path from the root to the leaf.
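As a sketch (not the course's own code), scikit-learn's CART implementation makes the node/leaf structure visible on simulated data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(10)
X = rng.uniform(0, 1, (200, 2))
y = 1.0 * (X[:, 0] > 0.5) + 0.5 * (X[:, 1] > 0.3) + rng.normal(scale=0.1, size=200)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)   # recursive binary splitting
print(export_text(tree, feature_names=["x1", "x2"]))  # interior nodes = splits, leaves = predicted values
```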
Statistics & ML Toolbox : forests
Bagging (Bootstrap + Aggregation)
1. For $k = 1, \cdots, \kappa$:
(i) draw a bootstrap sample from the $(y_i, x_i)$'s
(ii) estimate a model $m_k$ on that sample
2. The final model is
$$m^\star(\cdot) = \frac{1}{\kappa}\sum_{k=1}^{\kappa} m_k(\cdot)$$
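Bagging in a few lines (a sketch on simulated data, assuming numpy and scikit-learn; trees are one natural choice of base learner):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(11)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(4 * X[:, 0]) + rng.normal(scale=0.3, size=200)

kappa = 100
models = []
for _ in range(kappa):
    idx = rng.integers(0, len(X), len(X))                       # (i) bootstrap sample
    models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))  # (ii) estimate m_k

def m_star(x):                                                  # final model: average of the m_k's
    return np.mean([m.predict(x) for m in models], axis=0)

print(m_star(np.array([[0.25], [0.75]])))
```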
Statistics & ML Toolbox : boosting
Boosting & Sequential Learning
$$m_k(\cdot) = m_{k-1}(\cdot) + \underset{h\in\mathcal{H}}{\text{argmin}}\left\{ \sum_{i=1}^n \ell\big(\underbrace{y_i - m_{k-1}(x_i)}_{\varepsilon_i},\ h(x_i)\big) \right\}$$
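A sketch of that sequential scheme with shallow trees as weak learners (simulated data; the 0.1 shrinkage factor is a common stabilizer, not on the slide):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(12)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(4 * X[:, 0]) + rng.normal(scale=0.3, size=200)

pred = np.zeros(len(y))                 # m_0 = 0
weak = []
for k in range(100):
    eps = y - pred                      # residuals y_i - m_{k-1}(x_i)
    h = DecisionTreeRegressor(max_depth=2).fit(X, eps)   # weak learner fitted to residuals
    weak.append(h)
    pred += 0.1 * h.predict(X)          # m_k = m_{k-1} + (shrunken) h
print(np.mean((y - pred)**2))           # training risk decreases as k grows
```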
Statistics & ML Toolbox : Neural Nets
Very popular for pictures...
A picture $x_i$ is
• an $n\times n$ matrix in $\{0,1\}^{n^2}$ for black & white
• an $n\times n$ matrix in $[0,1]^{n^2}$ for grey-scale
• a $3\times n\times n$ array in $([0,1]^3)^{n^2}$ for color
• a $T\times 3\times n\times n$ tensor in $(([0,1]^3)^{n^2})^T$ for video
$y$ here is the label ("8", "9", "6", etc.)
Suppose we want to recognize a "6" on a picture:
$$m(x) = \begin{cases} +1 & \text{if } x \text{ is a "6"} \\ -1 & \text{otherwise}\end{cases}$$
Statistics & ML Toolbox : Neural Nets
Consider some specific pixels, and associate weights $\omega$ such that
$$m(x) = \text{sign}\left(\sum_{i,j}\omega_{i,j}x_{i,j}\right)$$
where
$$x_{i,j} = \begin{cases} +1 & \text{if pixel } (i,j) \text{ is black} \\ -1 & \text{if pixel } (i,j) \text{ is white}\end{cases}$$
for some weights $\omega_{i,j}$ (that can be negative...)
Statistics & ML Toolbox : Neural Nets
A deep network is a network with a lot of layers,
$$m(x) = \text{sign}\left(\sum_i \omega_i m_i(x)\right)$$
where the $m_i$'s are outputs of previous neural nets.
Those layers can capture shapes in some areas: nonlinearities, cross-dependence, etc.
Binary Threshold Neuron & Perceptron
If $x \in \{0,1\}^p$, McCulloch & Pitts (1943, A logical calculus of the ideas immanent in nervous activity) suggested a simple model, with threshold $b$:
$$y_i = f\left(\sum_{j=1}^p x_{j,i}\right) \quad\text{where } f(x) = \mathbf{1}(x \ge b)$$
where $\mathbf{1}(x \ge b) = +1$ if $x \ge b$ and $0$ otherwise, or (equivalently)
$$y_i = f\left(\omega + \sum_{j=1}^p x_{j,i}\right) \quad\text{where } f(x) = \mathbf{1}(x \ge 0)$$
with weight $\omega = -b$. The trick of adding 1 as an input was very important!
$\omega = -1$ gives the or logical operator: $y_i = 1$ if $\exists j$ such that $x_{j,i} = 1$
$\omega = -p$ gives the and logical operator: $y_i = 1$ if $\forall j$, $x_{j,i} = 1$
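A two-line sketch of that threshold unit (illustration only), checking the or and and cases on all inputs:

```python
import numpy as np

def neuron(x, omega):
    """Binary threshold unit: f(omega + sum_j x_j) with f(u) = 1(u >= 0)."""
    return int(omega + np.sum(x) >= 0)

p = 2
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, neuron(x, omega=-1), neuron(x, omega=-p))  # OR (omega = -1), AND (omega = -p)
```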
Binary Threshold Neuron & Perceptron
but this is not possible for the xor logical operator: $y_i = 1$ if $x_{1,i} \neq x_{2,i}$.
Rosenblatt (1961, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms) considered the extension where the $x$'s are real-valued, with weights $\omega \in \mathbb{R}^p$:
$$y_i = f\left(\sum_{j=1}^p \omega_j x_{j,i}\right) \quad\text{where } f(x) = \mathbf{1}(x \ge b)$$
where $\mathbf{1}(x \ge b) = +1$ if $x \ge b$ and $0$ otherwise, or (equivalently)
$$y_i = f\left(\omega_0 + \sum_{j=1}^p \omega_j x_{j,i}\right) \quad\text{where } f(x) = \mathbf{1}(x \ge 0)$$
with weights $\omega \in \mathbb{R}^{p+1}$.
Binary Threshold Neuron & Perceptron
Minsky & Papert (1969, Perceptrons: an Introduction to Computational Geometry) proved that the perceptron is a linear separator, hence not very powerful.
Define the sigmoid function
$$f(x) = \frac{1}{1+e^{-x}} = \frac{e^x}{e^x+1} \quad (= \text{the logistic function}).$$
This function $f$ is called the activation function.
If $y \in \{-1,+1\}$, one can consider the hyperbolic tangent
$$f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
or the inverse tangent function (see wikipedia).
Statistics & ML Toolbox : Neural Nets
So here, for a classification problem,
$$y_i = f\left(\omega_0 + \sum_{j=1}^p \omega_j x_{j,i}\right) = p(x_i)$$
The error of that model can be computed using the quadratic loss function,
$$\sum_{i=1}^n \big(y_i - p(x_i)\big)^2$$
or the cross-entropy,
$$-\sum_{i=1}^n \Big(y_i \log p(x_i) + [1-y_i]\log[1-p(x_i)]\Big)$$
Statistics & ML Toolbox : Neural Nets
Consider a single hidden layer, with 3 different neurons, so that
$$p(x) = f\left(\omega_0 + \sum_{h=1}^3 \omega_h f_h\Big(\omega_{h,0} + \sum_{j=1}^p \omega_{h,j}x_j\Big)\right)$$
or
$$p(x) = f\left(\omega_0 + \sum_{h=1}^3 \omega_h f_h\big(\omega_{h,0} + x^\top\boldsymbol{\omega}_h\big)\right)$$
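The forward pass of that single-hidden-layer network, as a sketch (random weights, sigmoid activations; all names are invented for illustration):

```python
import numpy as np

def sigmoid(u):
    return 1 / (1 + np.exp(-u))

def p_hat(x, w0, w, W0, W):
    """One hidden layer with 3 neurons:
    p(x) = f(w0 + sum_h w_h * f_h(W0_h + x' W_h)), with f = f_h = sigmoid."""
    hidden = sigmoid(W0 + W @ x)        # the 3 hidden activations f_h(...)
    return sigmoid(w0 + w @ hidden)     # output neuron

rng = np.random.default_rng(13)
p = 4                                                 # input dimension
x = rng.normal(size=p)
w0, w = 0.1, rng.normal(size=3)                       # output weights
W0, W = rng.normal(size=3), rng.normal(size=(3, p))   # hidden-layer weights
print(p_hat(x, w0, w, W0, W))                         # a probability in (0, 1)
```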