A short introduction to statistical learning 
Nathalie Villa-Vialaneix 
nathalie.villa@toulouse.inra.fr 
http://www.nathalievilla.org 
Axe “Apprentissage et Processus” 
October 15th, 2014 - Unité MIA-T, INRA, Toulouse 
Outline 
1 Introduction 
Background and notations 
Underfitting / Overfitting 
Consistency 
2 SVM 
Background

Purpose: predict $Y$ from $X$;
What we have: $n$ observations of $(X, Y)$: $(x_1, y_1)$, . . . , $(x_n, y_n)$;
What we want: estimate the unknown $Y$ for new observations of $X$: $x_{n+1}$, . . . , $x_m$.

$X$ can be:
numeric variables;
or factors;
or a combination of numeric variables and factors.

$Y$ can be:
a numeric variable ($Y \in \mathbb{R}$) $\Rightarrow$ (supervised) regression (régression);
a factor $\Rightarrow$ (supervised) classification (discrimination).
Basics

From $(x_i, y_i)_i$, definition of a machine $\Phi_n$ s.t.
$$\hat{y}_{\text{new}} = \Phi_n(x_{\text{new}}).$$

if $Y$ is numeric, $\Phi_n$ is called a regression function (fonction de régression);
if $Y$ is a factor, $\Phi_n$ is called a classifier (classifieur);
$\Phi_n$ is said to be trained or learned from the observations $(x_i, y_i)_i$.

Desirable properties
accuracy to the observations: predictions made on known data are close to the observed values;
generalization ability: predictions made on new data are also accurate.

Conflicting objectives!!
Underfitting/Overfitting (sous/sur - apprentissage)

[Figures: the function $x \to y$ to be estimated; observations we might have; observations we do have; a first estimation from the observations: underfitting; a second estimation: accurate estimation; a third estimation: overfitting; summary. The sketch below reproduces the idea.]
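To make the picture concrete, here is a minimal R sketch (my own illustration, not from the slides): polynomials of increasing degree are fitted to the same noisy observations; degree 1 underfits, degree 3 gives an accurate estimation, degree 12 overfits.

set.seed(42)
x <- runif(50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.3)   # noisy observations of a smooth function
fit_under <- lm(y ~ poly(x, 1))    # too rigid: underfitting
fit_ok    <- lm(y ~ poly(x, 3))    # accurate estimation
fit_over  <- lm(y ~ poly(x, 12))   # too flexible: overfitting
grid <- data.frame(x = seq(0, 1, length.out = 200))
plot(x, y, pch = 19)
lines(grid$x, predict(fit_under, grid), col = "blue")
lines(grid$x, predict(fit_ok, grid), col = "darkgreen")
lines(grid$x, predict(fit_over, grid), col = "red")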
Errors

training error (measures the accuracy to the observations)
if $y$ is a factor: misclassification rate
$$\frac{\#\{\hat{y}_i \neq y_i,\ i = 1, \dots, n\}}{n}$$
if $y$ is numeric: mean square error (MSE)
$$\frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i)^2$$
or root mean square error (RMSE), or pseudo-$R^2$: $1 - \mathrm{MSE}/\mathrm{Var}((y_i)_i)$

test error: a way to prevent overfitting (and to estimate the generalization error) is simple validation:
1 split the data into training/test sets (usually 80%/20%);
2 train $\Phi_n$ on the training dataset;
3 compute the test error on the remaining data (see the sketch below).
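A minimal R sketch of simple validation (hypothetical data and model, added for illustration): an 80%/20% split, with the MSE computed on both parts.

set.seed(1)
n <- 100
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)
train_id <- sample(seq_len(n), size = 0.8 * n)      # 1. split 80%/20%
fit <- lm(y ~ poly(x, 3), data = dat[train_id, ])   # 2. train on the training set
mse <- function(y, y_hat) mean((y - y_hat)^2)
train_mse <- mse(dat$y[train_id], predict(fit))                       # training error
test_mse  <- mse(dat$y[-train_id], predict(fit, dat[-train_id, ]))    # 3. test error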
Example

[Figures: observations; training/test datasets; training/test errors; summary.]
Consistency in the parametric/nonparametric case

Example in the parametric framework (linear methods)
an assumption is made on the form of the relation between $X$ and $Y$:
$$Y = \beta^T X + \epsilon$$
$\beta$ is estimated from the observations $(x_1, y_1)$, . . . , $(x_n, y_n)$ by a given method which computes a $\hat{\beta}_n$.
The estimation is said to be consistent if $\hat{\beta}_n \xrightarrow{n \to +\infty} \beta$, possibly under technical assumptions on $X$, $\epsilon$, $Y$ (a numerical illustration follows).
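A quick numerical illustration of this convergence (added example, not in the slides): the ordinary least squares estimate $\hat{\beta}_n$ gets closer and closer to the true $\beta$ as $n$ grows.

set.seed(3)
beta <- 2
for (n in c(50, 500, 5000)) {
  x <- rnorm(n)
  y <- beta * x + rnorm(n)      # Y = beta * X + epsilon
  print(coef(lm(y ~ x - 1)))    # beta_hat_n, converging to beta = 2
}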
Consistency in the parametric/nonparametric case

Example in the nonparametric framework
the form of the relation between $X$ and $Y$ is unknown:
$$Y = \Phi(X) + \epsilon$$
$\Phi$ is estimated from the observations $(x_1, y_1)$, . . . , $(x_n, y_n)$ by a given method which computes a $\hat{\Phi}_n$.
The estimation is said to be consistent if $\hat{\Phi}_n \xrightarrow{n \to +\infty} \Phi$, possibly under technical assumptions on $X$, $\epsilon$, $Y$.
Consistency from the statistical learning perspective
[Vapnik, 1995]

Question: Are we really interested in estimating $\Phi$ or...
... rather in having the smallest prediction error?

Statistical learning perspective: a method that builds a machine $\Phi_n$ from the observations is said to be (universally) consistent if, given a risk function $R : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+$ (which computes an error),
$$E\left(R(\Phi_n(X), Y)\right) \xrightarrow{n \to +\infty} \inf_{\Phi : \mathcal{X} \to \mathbb{R}} E\left(R(\Phi(X), Y)\right),$$
for any distribution of $(X, Y) \in \mathcal{X} \times \mathbb{R}$.

Definitions: $L^* = \inf_{\Phi : \mathcal{X} \to \mathbb{R}} E\left(R(\Phi(X), Y)\right)$ and $L_\Phi = E\left(R(\Phi(X), Y)\right)$.
Desirable properties from a mathematical perspective

Simplified framework: $X \in \mathcal{X}$ and $Y \in \{-1, 1\}$ (binary classification)
Learning process: choose a machine $\Phi_n$ in a class of functions $\mathcal{C} \subset \{\Phi : \mathcal{X} \to \mathbb{R}\}$ (e.g., $\mathcal{C}$ is the set of all functions that can be built using an SVM).

Error decomposition
$$L_{\Phi_n} - L^* = \Big( L_{\Phi_n} - \inf_{\Phi \in \mathcal{C}} L_\Phi \Big) + \Big( \inf_{\Phi \in \mathcal{C}} L_\Phi - L^* \Big)$$
with
$\inf_{\Phi \in \mathcal{C}} L_\Phi - L^*$ is the richness of $\mathcal{C}$ (i.e., $\mathcal{C}$ must be rich to ensure that this term is small);
$L_{\Phi_n} - \inf_{\Phi \in \mathcal{C}} L_\Phi \leq 2 \sup_{\Phi \in \mathcal{C}} |\hat{L}_n(\Phi) - L_\Phi|$, with $\hat{L}_n(\Phi) = \frac{1}{n} \sum_{i=1}^n R(\Phi(x_i), y_i)$, is the generalization capability of $\mathcal{C}$ (i.e., in the worst case, the empirical error must be close to the true error: $\mathcal{C}$ must not be too rich to ensure that this term is small).
Outline 
1 Introduction 
Background and notations 
Underfitting / Overfitting 
Consistency 
2 SVM 
Basic introduction

Binary classification problem: $X \in \mathcal{H}$ and $Y \in \{-1, 1\}$
A training set is given: $(x_1, y_1), \dots, (x_n, y_n)$

SVM is a method based on kernels. It is a universally consistent method, provided that the kernel is universal [Steinwart, 2002].
Extensions to the regression case exist (SVR or LS-SVM) that are also universally consistent when the kernel is universal.
Optimal margin classification

[Figure: separating hyperplane with normal vector $w$, margin $\frac{1}{\|w\|_2}$ and support vectors.]

$w$ is chosen such that:
$\min_w \|w\|^2$ (the margin is the largest),
under the constraints: $y_i(\langle w, x_i \rangle + b) \geq 1$, $1 \leq i \leq n$ (the separation between the two classes is perfect).
$\Rightarrow$ ensures a good generalization capability.
Soft margin classification

[Figure: separating hyperplane with normal vector $w$, margin $\frac{1}{\|w\|_2}$, support vectors and a few misclassified points.]

$w$ is chosen such that:
$\min_{w, \xi} \|w\|^2 + C \sum_{i=1}^n \xi_i$ (the margin is the largest),
under the constraints: $y_i(\langle w, x_i \rangle + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, $1 \leq i \leq n$ (the separation between the two classes is almost perfect).
$\Rightarrow$ allowing a few errors improves the richness of the class.
Non linear SVM

[Figure: original space $\mathcal{X}$ mapped into a feature space $\mathcal{H}$ by a non linear map $\Psi$.]

$w \in \mathcal{H}$ is chosen such that $(P_{C,\mathcal{H}})$:
$\min_{w, \xi} \|w\|^2_{\mathcal{H}} + C \sum_{i=1}^n \xi_i$ (the margin in the feature space is the largest),
under the constraints: $y_i(\langle w, \Psi(x_i) \rangle_{\mathcal{H}} + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, $1 \leq i \leq n$ (the separation between the two classes in the feature space is almost perfect).
SVM from different points of view

A regularization problem: $(P_{C,\mathcal{H}}) \Leftrightarrow$
$$(P^2_{\lambda,\mathcal{H}}) : \min_{w \in \mathcal{H}} \underbrace{\frac{1}{n} \sum_{i=1}^n R(f_w(x_i), y_i)}_{\text{error term}} + \underbrace{\lambda \|w\|^2_{\mathcal{H}}}_{\text{penalization term}},$$
where $f_w(x) = \langle \Psi(x), w \rangle_{\mathcal{H}}$ and $R(\hat{y}, y) = \max(0, 1 - \hat{y}y)$ (hinge loss function).
[Figure: errors versus $\hat{y}$ for $y = 1$; blue: hinge loss; green: misclassification error. A sketch of the two curves is given below.]

A dual problem: $(P_{C,\mathcal{H}}) \Leftrightarrow$
$$(D_{C,\mathcal{X}}) : \max_{\alpha \in \mathbb{R}^n} \sum_{i=1}^n \alpha_i - \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j K(x_i, x_j),$$
with $\sum_{i=1}^n \alpha_i y_i = 0$ and $0 \leq \alpha_i \leq C$, $1 \leq i \leq n$.

There is no need to know $\Psi$ and $\mathcal{H}$:
choose a function $K$ with a few good properties;
use it as the dot product in $\mathcal{H}$: $\forall\, u, v \in \mathcal{X}$, $K(u, v) = \langle \Psi(u), \Psi(v) \rangle_{\mathcal{H}}$.
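As announced above, a small R sketch (added for illustration) of the two loss curves: the hinge loss upper-bounds the misclassification error and is convex, which is what makes the optimization problem tractable.

y_hat <- seq(-2, 2, length.out = 200)   # predicted values, for a true label y = 1
hinge <- pmax(0, 1 - y_hat)             # hinge loss max(0, 1 - y_hat * y)
zero_one <- as.numeric(y_hat < 0)       # misclassification error
plot(y_hat, hinge, type = "l", col = "blue", ylab = "loss")
lines(y_hat, zero_one, col = "green")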
Which kernels?

Minimum properties that a kernel should fulfill
symmetry: $K(u, u') = K(u', u)$;
positivity: $\forall\, N \in \mathbb{N}$, $\forall\, (\alpha_i) \subset \mathbb{R}^N$, $\forall\, (x_i) \subset \mathcal{X}^N$, $\sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \geq 0$.

[Aronszajn, 1950]: $\exists$ a Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$ and a function $\Psi : \mathcal{X} \to \mathcal{H}$ such that:
$$\forall\, u, v \in \mathcal{X}, \quad K(u, v) = \langle \Psi(u), \Psi(v) \rangle_{\mathcal{H}}$$

Examples
the Gaussian kernel: $\forall\, x, x' \in \mathbb{R}^d$, $K(x, x') = e^{-\gamma \|x - x'\|^2}$ (it is universal on every bounded subset of $\mathbb{R}^d$);
the linear kernel: $\forall\, x, x' \in \mathbb{R}^d$, $K(x, x') = x^T x'$ (it is not universal).
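As an illustration (not from the slides), the Gaussian kernel matrix of a small dataset can be computed and both properties checked in a few lines of R; gamma plays the role of the bandwidth parameter above.

gaussian_kernel <- function(X, gamma = 1) {
  D2 <- as.matrix(dist(X))^2   # squared Euclidean distances between rows of X
  exp(-gamma * D2)
}
X <- matrix(rnorm(20), ncol = 2)
K <- gaussian_kernel(X, gamma = 0.5)
all.equal(K, t(K))                                           # symmetry
min(eigen(K, symmetric = TRUE, only.values = TRUE)$values)   # >= 0 up to rounding: positivity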
In summary, how does the solution look?
$$\Phi_n(x) = \sum_i \alpha_i y_i K(x_i, x)$$
where only a few $\alpha_i \neq 0$; the $x_i$ such that $\alpha_i \neq 0$ are the support vectors!
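A sketch of how a prediction would be computed from this expansion, with hypothetical fitted quantities (training points in the rows of X, labels y in {-1, 1}, dual coefficients alpha, a kernel function K):

phi_n <- function(x_new, X, y, alpha, K) {
  sv <- which(alpha != 0)   # only the support vectors contribute
  sum(alpha[sv] * y[sv] *
      apply(X[sv, , drop = FALSE], 1, function(xi) K(xi, x_new)))
}
K_gauss <- function(u, v, gamma = 1) exp(-gamma * sum((u - v)^2))   # e.g., a Gaussian kernel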
I'm almost dead with all this stuff on my mind!!!
What about practice?

data(iris)
iris <- iris[iris$Species %in% c("versicolor", "virginica"), ]
plot(iris$Petal.Length, iris$Petal.Width, col = iris$Species, pch = 19)
legend("topleft", pch = 19, col = c(2, 3), legend = c("versicolor", "virginica"))
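Continuing the example, a minimal SVM fit on these two species, assuming the e1071 package (one possible choice of R implementation, not necessarily the one used in the original deck):

library(e1071)
iris$Species <- factor(iris$Species)   # drop the unused "setosa" level
fit <- svm(Species ~ Petal.Length + Petal.Width, data = iris,
           kernel = "radial", cost = 1)
mean(predict(fit, iris) != iris$Species)      # training misclassification rate
plot(fit, iris, Petal.Width ~ Petal.Length)   # decision regions and support vectors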

Contenu connexe

Tendances

When is undersampling effective in unbalanced classification tasks?
When is undersampling effective in unbalanced classification tasks?When is undersampling effective in unbalanced classification tasks?
When is undersampling effective in unbalanced classification tasks?Andrea Dal Pozzolo
 
Nonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer VisionNonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer Visionzukun
 
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013Christian Robert
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forestsChristian Robert
 
Regression: A skin-deep dive
Regression: A skin-deep diveRegression: A skin-deep dive
Regression: A skin-deep diveabulyomon
 
Simple linear regression project
Simple linear regression projectSimple linear regression project
Simple linear regression projectJAPAN SHAH
 
How mathematicians predict the future?
How mathematicians predict the future?How mathematicians predict the future?
How mathematicians predict the future?Mattia Zanella
 
Statr session 25 and 26
Statr session 25 and 26Statr session 25 and 26
Statr session 25 and 26Ruru Chowdhury
 

Tendances (11)

When is undersampling effective in unbalanced classification tasks?
When is undersampling effective in unbalanced classification tasks?When is undersampling effective in unbalanced classification tasks?
When is undersampling effective in unbalanced classification tasks?
 
Nonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer VisionNonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer Vision
 
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forests
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Regression: A skin-deep dive
Regression: A skin-deep diveRegression: A skin-deep dive
Regression: A skin-deep dive
 
04 regression
04 regression04 regression
04 regression
 
Talk 4
Talk 4Talk 4
Talk 4
 
Simple linear regression project
Simple linear regression projectSimple linear regression project
Simple linear regression project
 
How mathematicians predict the future?
How mathematicians predict the future?How mathematicians predict the future?
How mathematicians predict the future?
 
Statr session 25 and 26
Statr session 25 and 26Statr session 25 and 26
Statr session 25 and 26
 

En vedette

Stanford Statistical Learning
Stanford Statistical LearningStanford Statistical Learning
Stanford Statistical LearningKurt Holst
 
Statistical learning
Statistical learningStatistical learning
Statistical learningSlideshare
 
Stanford - Statistical Learning
Stanford - Statistical LearningStanford - Statistical Learning
Stanford - Statistical LearningRavi Sankar Varma
 
Inferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOInferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOtuxette
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learningmahutte
 
Statistical learning
Statistical learningStatistical learning
Statistical learningSlideshare
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRahul Jain
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningLior Rokach
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
The First English-Persian statistical machine translation
The First English-Persian statistical machine translationThe First English-Persian statistical machine translation
The First English-Persian statistical machine translationMahsa Mohaghegh
 
Inferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOInferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOtuxette
 
Visualiser et fouiller des réseaux - Méthodes et exemples dans R
Visualiser et fouiller des réseaux - Méthodes et exemples dans RVisualiser et fouiller des réseaux - Méthodes et exemples dans R
Visualiser et fouiller des réseaux - Méthodes et exemples dans Rtuxette
 
Slides Lycée Jules Fil 2014
Slides Lycée Jules Fil 2014Slides Lycée Jules Fil 2014
Slides Lycée Jules Fil 2014tuxette
 
Mining co-expression network
Mining co-expression networkMining co-expression network
Mining co-expression networktuxette
 
lec21.ppt
lec21.pptlec21.ppt
lec21.pptbutest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
解決正確的問題 - 如何讓數據發揮影響力?
解決正確的問題 - 如何讓數據發揮影響力?解決正確的問題 - 如何讓數據發揮影響力?
解決正確的問題 - 如何讓數據發揮影響力?Pei-shen (James) Wu
 

En vedette (20)

Stanford Statistical Learning
Stanford Statistical LearningStanford Statistical Learning
Stanford Statistical Learning
 
Statistical learning
Statistical learningStatistical learning
Statistical learning
 
Statistical learning intro
Statistical learning introStatistical learning intro
Statistical learning intro
 
Stanford - Statistical Learning
Stanford - Statistical LearningStanford - Statistical Learning
Stanford - Statistical Learning
 
Inferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOInferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSO
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learning
 
Statistical learning
Statistical learningStatistical learning
Statistical learning
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
The First English-Persian statistical machine translation
The First English-Persian statistical machine translationThe First English-Persian statistical machine translation
The First English-Persian statistical machine translation
 
Inferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOInferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSO
 
Visualiser et fouiller des réseaux - Méthodes et exemples dans R
Visualiser et fouiller des réseaux - Méthodes et exemples dans RVisualiser et fouiller des réseaux - Méthodes et exemples dans R
Visualiser et fouiller des réseaux - Méthodes et exemples dans R
 
Slides Lycée Jules Fil 2014
Slides Lycée Jules Fil 2014Slides Lycée Jules Fil 2014
Slides Lycée Jules Fil 2014
 
Mining co-expression network
Mining co-expression networkMining co-expression network
Mining co-expression network
 
Chapter 01
Chapter 01Chapter 01
Chapter 01
 
lec21.ppt
lec21.pptlec21.ppt
lec21.ppt
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
解決正確的問題 - 如何讓數據發揮影響力?
解決正確的問題 - 如何讓數據發揮影響力?解決正確的問題 - 如何讓數據發揮影響力?
解決正確的問題 - 如何讓數據發揮影響力?
 

Similaire à Introduction to Statistical Learning

A short introduction to statistical learning
A short introduction to statistical learningA short introduction to statistical learning
A short introduction to statistical learningtuxette
 
An introduction to neural networks
An introduction to neural networksAn introduction to neural networks
An introduction to neural networkstuxette
 
An introduction to neural network
An introduction to neural networkAn introduction to neural network
An introduction to neural networktuxette
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIRtuxette
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIRtuxette
 
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian MethodsNUI Galway
 
2013.03.26 Bayesian Methods for Modern Statistical Analysis
2013.03.26 Bayesian Methods for Modern Statistical Analysis2013.03.26 Bayesian Methods for Modern Statistical Analysis
2013.03.26 Bayesian Methods for Modern Statistical AnalysisNUI Galway
 
An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...Vasileios Lampos
 
RuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML
 
FDA and Statistical learning theory
FDA and Statistical learning theoryFDA and Statistical learning theory
FDA and Statistical learning theorytuxette
 
advanced_statistics.pdf
advanced_statistics.pdfadvanced_statistics.pdf
advanced_statistics.pdfGerryMakilan2
 
Multiple comparison problem
Multiple comparison problemMultiple comparison problem
Multiple comparison problemJiri Haviger
 
Les outils de modélisation des Big Data
Les outils de modélisation des Big DataLes outils de modélisation des Big Data
Les outils de modélisation des Big DataKezhan SHI
 
Course Title: Introduction to Machine Learning, Chapter 2- Supervised Learning
Course Title: Introduction to Machine Learning,  Chapter 2- Supervised LearningCourse Title: Introduction to Machine Learning,  Chapter 2- Supervised Learning
Course Title: Introduction to Machine Learning, Chapter 2- Supervised LearningShumet Tadesse
 
Eco550 Assignment 1
Eco550 Assignment 1Eco550 Assignment 1
Eco550 Assignment 1Lisa Kennedy
 
isabelle_webinar_jan..
isabelle_webinar_jan..isabelle_webinar_jan..
isabelle_webinar_jan..butest
 
Learning from (dis)similarity data
Learning from (dis)similarity dataLearning from (dis)similarity data
Learning from (dis)similarity datatuxette
 

Similaire à Introduction to Statistical Learning (20)

A short introduction to statistical learning
A short introduction to statistical learningA short introduction to statistical learning
A short introduction to statistical learning
 
An introduction to neural networks
An introduction to neural networksAn introduction to neural networks
An introduction to neural networks
 
An introduction to neural network
An introduction to neural networkAn introduction to neural network
An introduction to neural network
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIR
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIR
 
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods
 
2013.03.26 Bayesian Methods for Modern Statistical Analysis
2013.03.26 Bayesian Methods for Modern Statistical Analysis2013.03.26 Bayesian Methods for Modern Statistical Analysis
2013.03.26 Bayesian Methods for Modern Statistical Analysis
 
An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...
 
Classification
ClassificationClassification
Classification
 
RuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative Systems
 
FDA and Statistical learning theory
FDA and Statistical learning theoryFDA and Statistical learning theory
FDA and Statistical learning theory
 
advanced_statistics.pdf
advanced_statistics.pdfadvanced_statistics.pdf
advanced_statistics.pdf
 
Multiple comparison problem
Multiple comparison problemMultiple comparison problem
Multiple comparison problem
 
Les outils de modélisation des Big Data
Les outils de modélisation des Big DataLes outils de modélisation des Big Data
Les outils de modélisation des Big Data
 
Course Title: Introduction to Machine Learning, Chapter 2- Supervised Learning
Course Title: Introduction to Machine Learning,  Chapter 2- Supervised LearningCourse Title: Introduction to Machine Learning,  Chapter 2- Supervised Learning
Course Title: Introduction to Machine Learning, Chapter 2- Supervised Learning
 
Eco550 Assignment 1
Eco550 Assignment 1Eco550 Assignment 1
Eco550 Assignment 1
 
2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...
2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...
2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...
 
isabelle_webinar_jan..
isabelle_webinar_jan..isabelle_webinar_jan..
isabelle_webinar_jan..
 
Bayesian_Decision_Theory-3.pdf
Bayesian_Decision_Theory-3.pdfBayesian_Decision_Theory-3.pdf
Bayesian_Decision_Theory-3.pdf
 
Learning from (dis)similarity data
Learning from (dis)similarity dataLearning from (dis)similarity data
Learning from (dis)similarity data
 

Plus de tuxette

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathstuxette
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènestuxette
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquestuxette
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-Ctuxette
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?tuxette
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...tuxette
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquestuxette
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeantuxette
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...tuxette
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquestuxette
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...tuxette
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation datatuxette
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?tuxette
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysistuxette
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricestuxette
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Predictiontuxette
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelstuxette
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random foresttuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 

Plus de tuxette (20)

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 

Dernier

projectile motion, impulse and moment
projectile  motion, impulse  and  momentprojectile  motion, impulse  and  moment
projectile motion, impulse and momentdonamiaquintan2
 
Immunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptImmunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptAmirRaziq1
 
Oxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxOxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxfarhanvvdk
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxPayal Shrivastava
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPRPirithiRaju
 
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfReplisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfAtiaGohar1
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsDobusch Leonhard
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPirithiRaju
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxzaydmeerab121
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Sérgio Sacani
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsCharlene Llagas
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsSérgio Sacani
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learningvschiavoni
 
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2AuEnriquezLontok
 
How we decide powerpoint presentation.pptx
How we decide powerpoint presentation.pptxHow we decide powerpoint presentation.pptx
How we decide powerpoint presentation.pptxJosielynTars
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxpriyankatabhane
 

Dernier (20)

projectile motion, impulse and moment
projectile  motion, impulse  and  momentprojectile  motion, impulse  and  moment
projectile motion, impulse and moment
 
Immunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptImmunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.ppt
 
Oxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxOxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptx
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptx
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
 
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfReplisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and Pitfalls
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPR
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptx
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and Functions
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
 
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
 
How we decide powerpoint presentation.pptx
How we decide powerpoint presentation.pptxHow we decide powerpoint presentation.pptx
How we decide powerpoint presentation.pptx
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
 

Introduction to Statistical Learning

  • 1. A short introduction to statistical learning Nathalie Villa-Vialaneix nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org Axe “Apprentissage et Processus” October 15th, 2014 - Unité MIA-T, INRA, Toulouse Nathalie Villa-Vialaneix | Introduction to statistical learning 1/25
  • 2. Outline 1 Introduction Background and notations Underfitting / Overfitting Consistency 2 SVM Nathalie Villa-Vialaneix | Introduction to statistical learning 2/25
  • 3. Outline 1 Introduction Background and notations Underfitting / Overfitting Consistency 2 SVM Nathalie Villa-Vialaneix | Introduction to statistical learning 3/25
  • 4. Background Purpose: predict Y from X; Nathalie Villa-Vialaneix | Introduction to statistical learning 4/25
  • 5. Background Purpose: predict Y from X; What we have: n observations of (X; Y), (x1; y1), . . . , (xn; yn); Nathalie Villa-Vialaneix | Introduction to statistical learning 4/25
  • 6. Background Purpose: predict Y from X; What we have: n observations of (X; Y), (x1; y1), . . . , (xn; yn); What we want: estimate unknown Y from new X: xn+1, . . . , xm. Nathalie Villa-Vialaneix | Introduction to statistical learning 4/25
  • 7. Background Purpose: predict Y from X; What we have: n observations of (X; Y), (x1; y1), . . . , (xn; yn); What we want: estimate unknown Y from new X: xn+1, . . . , xm. X can be: numeric variables; or factors; or a combination of numeric variables and factors. Nathalie Villa-Vialaneix | Introduction to statistical learning 4/25
  • 8. Background Purpose: predict Y from X; What we have: n observations of (X; Y), (x1; y1), . . . , (xn; yn); What we want: estimate unknown Y from new X: xn+1, . . . , xm. X can be: numeric variables; or factors; or a combination of numeric variables and factors. Y can be: a numeric variable (Y 2 R) ) (supervised) regression régression; a factor ) (supervised) classification discrimination. Nathalie Villa-Vialaneix | Introduction to statistical learning 4/25
  • 9. Basics From (xi ; yi)i , definition of a machine, n s.t.: ^ynew = n(xnew): Nathalie Villa-Vialaneix | Introduction to statistical learning 5/25
  • 10. Basics From (xi ; yi)i , definition of a machine, n s.t.: ^ynew = n(xnew): if Y is numeric, n is called a regression function fonction de classification; if Y is a factor, n is called a classifier classifieur; Nathalie Villa-Vialaneix | Introduction to statistical learning 5/25
  • 11. Basics From (xi ; yi)i , definition of a machine, n s.t.: ^ynew = n(xnew): if Y is numeric, n is called a regression function fonction de classification; if Y is a factor, n is called a classifier classifieur; n is said to be trained or learned from the observations (xi ; yi)i . Nathalie Villa-Vialaneix | Introduction to statistical learning 5/25
  • 12. Basics From (xi ; yi)i , definition of a machine, n s.t.: ^ynew = n(xnew): if Y is numeric, n is called a regression function fonction de classification; if Y is a factor, n is called a classifier classifieur; n is said to be trained or learned from the observations (xi ; yi)i . Desirable properties accuracy to the observations: predictions made on known data are close to observed values; Nathalie Villa-Vialaneix | Introduction to statistical learning 5/25
  • 13. Basics From (xi ; yi)i , definition of a machine, n s.t.: ^ynew = n(xnew): if Y is numeric, n is called a regression function fonction de classification; if Y is a factor, n is called a classifier classifieur; n is said to be trained or learned from the observations (xi ; yi)i . Desirable properties accuracy to the observations: predictions made on known data are close to observed values; generalization ability: predictions made on new data are also accurate. Nathalie Villa-Vialaneix | Introduction to statistical learning 5/25
  • 14. Basics From (xi ; yi)i , definition of a machine, n s.t.: ^ynew = n(xnew): if Y is numeric, n is called a regression function fonction de classification; if Y is a factor, n is called a classifier classifieur; n is said to be trained or learned from the observations (xi ; yi)i . Desirable properties accuracy to the observations: predictions made on known data are close to observed values; generalization ability: predictions made on new data are also accurate. Conflicting objectives!! Nathalie Villa-Vialaneix | Introduction to statistical learning 5/25
  • 15. Underfitting/Overfitting sous/sur - apprentissage Function x ! y to be estimated Nathalie Villa-Vialaneix | Introduction to statistical learning 6/25
  • 16. Underfitting/Overfitting sous/sur - apprentissage Observations we might have Nathalie Villa-Vialaneix | Introduction to statistical learning 6/25
  • 17. Underfitting/Overfitting sous/sur - apprentissage Observations we do have Nathalie Villa-Vialaneix | Introduction to statistical learning 6/25
  • 18. Underfitting/Overfitting sous/sur - apprentissage First estimation from the observations: underfitting Nathalie Villa-Vialaneix | Introduction to statistical learning 6/25
  • 19. Underfitting/Overfitting sous/sur - apprentissage Second estimation from the observations: accurate estimation Nathalie Villa-Vialaneix | Introduction to statistical learning 6/25
  • 20. Underfitting/Overfitting sous/sur - apprentissage Third estimation from the observations: overfitting Nathalie Villa-Vialaneix | Introduction to statistical learning 6/25
  • 21. Underfitting/Overfitting sous/sur - apprentissage Summary Nathalie Villa-Vialaneix | Introduction to statistical learning 6/25
  • 22–27. Errors
training error (measures the accuracy to the observations):
- if y is a factor: misclassification rate $\frac{\sharp\{\hat{y}_i \neq y_i,\ i = 1, \dots, n\}}{n}$;
- if y is numeric: mean square error (MSE) $\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$, or root mean square error (RMSE), or pseudo-$R^2$: $1 - \mathrm{MSE}/\mathrm{Var}((y_i)_i)$.
test error: a way to prevent overfitting (it estimates the generalization error) is simple validation:
1 split the data into training/test sets (usually 80%/20%);
2 train $\Phi_n$ on the training dataset;
3 compute the test error on the remaining data.
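As an illustration, a minimal R sketch of simple validation (not from the original slides); dat and its response column y are placeholders for any regression dataset, and lm stands in for an arbitrary learning method.
set.seed(1)
n <- nrow(dat)
train_id <- sample(n, size = round(0.8 * n))   # 80% of the rows for training
fit <- lm(y ~ ., data = dat[train_id, ])       # train on the training set only
pred <- predict(fit, newdata = dat[-train_id, ])
mse <- mean((pred - dat$y[-train_id])^2)       # test MSE
rmse <- sqrt(mse)                              # test RMSE
pseudo_r2 <- 1 - mse / var(dat$y[-train_id])   # pseudo-R2 as defined above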
  • 28–31. Example
[Figure sequence: observations; training/test datasets; training/test errors; summary.]
  • 32–37. Consistency in the parametric/non parametric case
Example in the parametric framework (linear methods): an assumption is made on the form of the relation between X and Y: $Y = \beta^T X + \epsilon$.
$\beta$ is estimated from the observations $(x_1, y_1)$, ..., $(x_n, y_n)$ by a given method which calculates a $\hat{\beta}_n$. The estimation is said to be consistent if $\hat{\beta}_n \xrightarrow{n \to +\infty} \beta$, under (possibly) technical assumptions on $X$, $\epsilon$, $Y$.
  • 38. Consistency in the parametric/non parametric case
Example in the nonparametric framework: the form of the relation between X and Y is unknown: $Y = \Phi(X) + \epsilon$.
$\Phi$ is estimated from the observations $(x_1, y_1)$, ..., $(x_n, y_n)$ by a given method which calculates a $\Phi_n$. The estimation is said to be consistent if $\Phi_n \xrightarrow{n \to +\infty} \Phi$, under (possibly) technical assumptions on $X$, $\epsilon$, $Y$.
  • 39–40. Consistency from the statistical learning perspective [Vapnik, 1995]
Question: Are we really interested in estimating $\Phi$ or... rather in having the smallest prediction error?
Statistical learning perspective: a method that builds a machine $\Phi_n$ from the observations is said to be (universally) consistent if, given a risk function $R: \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+$ (which calculates an error),
$\mathbb{E}\left(R(\Phi_n(X), Y)\right) \xrightarrow{n \to +\infty} \inf_{\Phi: \mathcal{X} \to \mathbb{R}} \mathbb{E}\left(R(\Phi(X), Y)\right),$
for any distribution of $(X, Y) \in \mathcal{X} \times \mathbb{R}$.
Definitions: $L^* = \inf_{\Phi: \mathcal{X} \to \mathbb{R}} \mathbb{E}\left(R(\Phi(X), Y)\right)$ and $L_\Phi = \mathbb{E}\left(R(\Phi(X), Y)\right)$.
  • 41–42. Desirable properties from a mathematical perspective
Simplified framework: $X \in \mathcal{X}$ and $Y \in \{-1, 1\}$ (binary classification).
Learning process: choose a machine $\Phi_n$ in a class of functions $\mathcal{C} \subset \{\Phi: \mathcal{X} \to \mathbb{R}\}$ (e.g., $\mathcal{C}$ is the set of all functions that can be built using an SVM).
Error decomposition:
$L_{\Phi_n} - L^* \leq \left[L_{\Phi_n} - \inf_{\Phi \in \mathcal{C}} L_\Phi\right] + \left[\inf_{\Phi \in \mathcal{C}} L_\Phi - L^*\right]$
with
- $\inf_{\Phi \in \mathcal{C}} L_\Phi - L^*$: the richness of $\mathcal{C}$ (i.e., $\mathcal{C}$ must be rich to ensure that this term is small);
- $L_{\Phi_n} - \inf_{\Phi \in \mathcal{C}} L_\Phi \leq 2 \sup_{\Phi \in \mathcal{C}} |\hat{L}_n(\Phi) - L_\Phi|$, where $\hat{L}_n(\Phi) = \frac{1}{n}\sum_{i=1}^{n} R(\Phi(x_i), y_i)$: the generalization capability of $\mathcal{C}$ (i.e., in the worst case, the empirical error must be close to the true error: $\mathcal{C}$ must not be too rich to ensure that this term is small).
  • 43. Outline
1 Introduction: Background and notations; Underfitting / Overfitting; Consistency
2 SVM
  • 44–45. Basic introduction
Binary classification problem: $X \in \mathcal{H}$ and $Y \in \{-1, 1\}$. A training set is given: $(x_1, y_1), \dots, (x_n, y_n)$.
SVM is a method based on kernels. It is a universally consistent method, provided that the kernel is universal [Steinwart, 2002]. Extensions to the regression case exist (SVR or LS-SVM) that are also universally consistent when the kernel is universal.
  • 46–49. Optimal margin classification
[Figure: linearly separable data with the separating hyperplane, its normal vector $w$, the margin $1/\|w\|^2$, and the support vectors highlighted.]
$w$ is chosen such that: $\min_w \|w\|^2$ (the margin is the largest), under the constraints: $y_i(\langle w, x_i \rangle + b) \geq 1$, $1 \leq i \leq n$ (the separation between the two classes is perfect).
⇒ ensures a good generalization capability.
  • 50–53. Soft margin classification
[Figure: almost linearly separable data with the separating hyperplane, its normal vector $w$, the margin $1/\|w\|^2$, and the support vectors highlighted.]
$w$ is chosen such that: $\min_{w, \xi} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$ (the margin is the largest), under the constraints: $y_i(\langle w, x_i \rangle + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, $1 \leq i \leq n$ (the separation between the two classes is almost perfect).
⇒ allowing a few errors improves the richness of the class.
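A small R sketch (an illustration, not part of the original deck) of how the cost parameter C trades margin width against margin violations with e1071::svm; the two-class iris subset and the grid of C values are arbitrary choices.
library(e1071)
data(iris)
iris2 <- iris[iris$Species %in% c("versicolor", "virginica"), ]
iris2$Species <- droplevels(iris2$Species)
for (C in c(0.01, 1, 100)) {   # lenient -> strict penalization of the slacks
  fit <- svm(Species ~ Petal.Length + Petal.Width, data = iris2,
             kernel = "linear", cost = C)
  cat("C =", C, "-> support vectors:", fit$tot.nSV, "\n")
}
Small C tolerates many margin violations (many support vectors); large C approaches the hard-margin solution of the previous slide.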
  • 54–57. Non linear SVM
[Figure: data in the original space $\mathcal{X}$ are mapped by a non linear $\phi$ to the feature space $\mathcal{H}$, where a linear separation becomes possible.]
$w \in \mathcal{H}$ is chosen such that $(P_{C,\mathcal{H}})$: $\min_{w, \xi} \|w\|_{\mathcal{H}}^2 + C \sum_{i=1}^{n} \xi_i$ (the margin in the feature space is the largest), under the constraints: $y_i(\langle w, \phi(x_i) \rangle_{\mathcal{H}} + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, $1 \leq i \leq n$ (the separation between the two classes in the feature space is almost perfect).
  • 58–60. SVM from different points of view
A regularization problem: $(P_{C,\mathcal{H}}) \Leftrightarrow (P^2_{\lambda,\mathcal{H}})$:
$\min_{w \in \mathcal{H}} \underbrace{\frac{1}{n}\sum_{i=1}^{n} R(f_w(x_i), y_i)}_{\text{error term}} + \underbrace{\lambda \|w\|_{\mathcal{H}}^2}_{\text{penalization term}},$
where $f_w(x) = \langle \phi(x), w \rangle_{\mathcal{H}}$ and $R(\hat{y}, y) = \max(0, 1 - \hat{y}y)$ (hinge loss function).
[Figure: losses versus $\hat{y}$ for $y = 1$; blue: hinge loss; green: misclassification error.]
A dual problem: $(P_{C,\mathcal{H}}) \Leftrightarrow (D_{C,\mathcal{X}})$:
$\max_{\alpha \in \mathbb{R}^n} \sum_{i=1}^{n} \alpha_i - \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$, with $\sum_{i=1}^{n} \alpha_i y_i = 0$ and $0 \leq \alpha_i \leq C$, $1 \leq i \leq n$.
There is no need to know $\phi$ and $\mathcal{H}$:
- choose a function K with a few good properties;
- use it as the dot product in $\mathcal{H}$: $\forall\, u, v \in \mathcal{X}$, $K(u, v) = \langle \phi(u), \phi(v) \rangle_{\mathcal{H}}$.
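The loss curves sketched on the slide are easy to reproduce; a minimal R sketch (illustrative, with an arbitrary plotting range) for $y = 1$:
yhat <- seq(-2, 2, length.out = 200)
hinge <- pmax(0, 1 - yhat)            # hinge loss max(0, 1 - y * yhat) with y = 1
zero_one <- as.numeric(yhat < 0)      # misclassification error: sign(yhat) != y
plot(yhat, hinge, type = "l", col = "blue", lwd = 2,
     xlab = expression(hat(y)), ylab = "loss")
lines(yhat, zero_one, col = "green", lwd = 2)
legend("topright", lwd = 2, col = c("blue", "green"),
       legend = c("hinge", "misclassification"))
The hinge loss is a convex upper bound of the misclassification error, which is what makes the minimization tractable.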
  • 61–62. Which kernels?
Minimum properties that a kernel should fulfill:
- symmetry: $K(u, u') = K(u', u)$;
- positivity: $\forall\, N \in \mathbb{N}$, $\forall\, (\alpha_i) \subset \mathbb{R}^N$, $\forall\, (x_i) \subset \mathcal{X}^N$, $\sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \geq 0$.
[Aronszajn, 1950]: $\exists$ a Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$ and a function $\phi: \mathcal{X} \to \mathcal{H}$ such that: $\forall\, u, v \in \mathcal{X}$, $K(u, v) = \langle \phi(u), \phi(v) \rangle_{\mathcal{H}}$.
Examples:
- the Gaussian kernel: $\forall\, x, x' \in \mathbb{R}^d$, $K(x, x') = e^{-\gamma \|x - x'\|^2}$ (it is universal on every bounded subset of $\mathbb{R}^d$);
- the linear kernel: $\forall\, x, x' \in \mathbb{R}^d$, $K(x, x') = x^T x'$ (it is not universal).
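A quick R sketch (not from the original slides) that builds the two example kernel matrices on random data and numerically checks the positivity property via eigenvalues; the sample size and gamma are arbitrary.
set.seed(1)
X <- matrix(rnorm(20 * 2), ncol = 2)
gamma <- 0.5
K_gauss <- exp(-gamma * as.matrix(dist(X))^2)   # Gaussian kernel matrix
K_linear <- X %*% t(X)                          # linear kernel matrix
# positivity: the smallest eigenvalue should be >= 0 (up to rounding error)
min(eigen(K_gauss, symmetric = TRUE, only.values = TRUE)$values)
min(eigen(K_linear, symmetric = TRUE, only.values = TRUE)$values)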
  • 63. In summary, what does the solution look like?
$\Phi_n(x) = \sum_i \alpha_i y_i K(x_i, x),$ where only a few $\alpha_i \neq 0$. The $x_i$ such that $\alpha_i \neq 0$ are the support vectors!
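To connect this formula with the R session that follows, here is a sketch (illustrative; the kernel, gamma, cost, and the five test points are arbitrary choices) that recomputes the decision values of an e1071 fit by hand, using coefs (which stores the $\alpha_i y_i$), SV (the support vectors), and rho (the negated offset):
library(e1071)
data(iris)
iris2 <- iris[iris$Species %in% c("versicolor", "virginica"), ]
iris2$Species <- droplevels(iris2$Species)
fit <- svm(Species ~ Petal.Length + Petal.Width, data = iris2,
           kernel = "radial", gamma = 0.5, cost = 1, scale = FALSE)
newx <- as.matrix(iris2[1:5, c("Petal.Length", "Petal.Width")])
D <- as.matrix(dist(rbind(fit$SV, newx)))             # pairwise distances
K <- exp(-0.5 * D[seq_len(nrow(fit$SV)), nrow(fit$SV) + 1:5]^2)  # Gaussian kernel, gamma = 0.5
decision_by_hand <- t(K) %*% fit$coefs - fit$rho
# should match the decision values returned by predict():
attr(predict(fit, iris2[1:5, ], decision.values = TRUE), "decision.values")
Only the support vectors enter the sum, which is why prediction can stay cheap even when n is large.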
  • 64. I’m almost dead with all this stuff on my mind!!! What in practice?
data(iris)
iris <- iris[iris$Species %in% c("versicolor", "virginica"), ]
plot(iris$Petal.Length, iris$Petal.Width, col = iris$Species, pch = 19)
legend("topleft", pch = 19, col = c(2, 3),
       legend = c("versicolor", "virginica"))
  • 65. I’m almost dead with all this stuff on my mind!!! What in practice?
library(e1071)
res.tune <- tune.svm(Species ~ ., data = iris, kernel = "linear",
                     cost = 2^(-1:4))
# Parameter tuning of 'svm':
# - sampling method: 10-fold cross validation
# - best parameters:
#    cost
#     0.5
# - best performance: 0.05
res.tune$best.model
# Call:
# best.svm(x = Species ~ ., data = iris, cost = 2^(-1:4), kernel = "linear")
# Parameters:
#    SVM-Type: C-classification
#  SVM-Kernel: linear
#        cost: 0.5
#       gamma: 0.25
# Number of Support Vectors: 21
  • 66. I’m almost dead with all this stuff on my mind!!! What in practice?
table(res.tune$best.model$fitted, iris$Species)
#              setosa versicolor virginica
#   setosa          0          0         0
#   versicolor      0         45         0
#   virginica       0          5        50
plot(res.tune$best.model, data = iris, Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 2.872, Sepal.Length = 6.262))
  • 67. I’m almost dead with all this stuff on my mind!!! What in practice?
res.tune <- tune.svm(Species ~ ., data = iris, gamma = 2^(-1:1),
                     cost = 2^(2:4))
# Parameter tuning of 'svm':
# - sampling method: 10-fold cross validation
# - best parameters:
#  gamma cost
#    0.5    4
# - best performance: 0.08
res.tune$best.model
# Call:
# best.svm(x = Species ~ ., data = iris, gamma = 2^(-1:1), cost = 2^(2:4))
# Parameters:
#    SVM-Type: C-classification
#  SVM-Kernel: radial
#        cost: 4
#       gamma: 0.5
# Number of Support Vectors: 32
  • 68. I’m almost dead with all this stuff on my mind!!! What in practice?
table(res.tune$best.model$fitted, iris$Species)
#              setosa versicolor virginica
#   setosa          0          0         0
#   versicolor      0         49         0
#   virginica       0          1        50
plot(res.tune$best.model, data = iris, Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 2.872, Sepal.Length = 6.262))
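Tying back to simple validation: the tables above are computed on the training fit and are therefore optimistic. A sketch (not in the original deck) of a held-out evaluation on the same two-class iris subset, reusing the tuned gamma and cost from above:
set.seed(3)
test_id <- sample(nrow(iris), size = round(0.2 * nrow(iris)))   # 20% held out
fit <- svm(Species ~ ., data = droplevels(iris[-test_id, ]),
           gamma = 0.5, cost = 4)            # tuned values from the slide above
table(predict(fit, iris[test_id, ]), iris[test_id, ]$Species)   # test confusion table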
  • 69. References
Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404.
Steinwart, I. (2002). Support vector machines are universally consistent. Journal of Complexity, 18:768–791.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer Verlag, New York, USA.
and more can be found on my website: http://nathalievilla.org/learning.html