LINEAR MODEL
2016/06/08
Yagi Takayuki
REFERENCE
Pattern Recognition and Machine Learning (PRML), Chapters 3 and 4
TABLE OF CONTENTS
1. linear regression
1.1. what is regression
1.2. linear regression
1.3. ridge regression
1.4. lasso regression
1.5. generalization
1.6. maximum likelihood estimation
1.7. MAP estimation
2. linear classification
2.1. multi-class classification
2.2. disadvantages of the least squares method
WHAT IS REGRESSION
we have some data, and we want to find the line that fits it best
we want to find such a line in any situation -> regression analysis
$y = 0.6147 + 1.0562 x_1$ is the best fit
NOTATION
$x, w, t$ : scalars
$\mathbf{x}, \mathbf{w}, \mathbf{t}$ : vectors
$\mathbf{X}, \mathbf{W}, \mathbf{T}$ : matrices
LINEAR MODEL
the simplest model is
$y(x, w) = w_0 + w_1 x_1 + \cdots + w_D x_D$
$w_i$ : weight parameters
$x_i$ : variables
$D$ : the number of variables
FEATURE
linear with respect to $w$
linear with respect to $x$
the model is too simple (poor expressive power)
EXTEND THE MODEL
add a linear combination of non-linear functions
$y(x, w) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(x)$
$M - 1$ is the number of basis functions
$\phi_j(x)$ is called a basis function
$w_0$ is called the bias parameter
LINEAR MODEL
if we add a dummy basis function $\phi_0(x) = 1$,
$y(x, w) = \sum_{j=0}^{M-1} w_j \phi_j(x) = w^T \phi(x)$
$w = (w_0, \ldots, w_{M-1})^T$
$\phi(x) = (\phi_0(x), \ldots, \phi_{M-1}(x))^T$
BASIS FUNCTION
there are various choices for the basis function:
polynomial basis
Gaussian basis
logistic sigmoid basis
POLYNOMIAL BASIS
$\phi_j(x) = x^j$
GAUSSIAN BASIS
$\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2s^2}\right)$
LOGISTIC SIGMOID BASIS
$\phi_j(x) = \sigma\left(\frac{x - \mu_j}{s}\right)$
$\sigma(a) = \frac{1}{1 + \exp(-a)}$
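the three basis functions above can be sketched in NumPy (a minimal sketch; the centres $\mu_j$ and scale $s$ are free choices, not fixed by the slides):

```python
import numpy as np

def polynomial_basis(x, j):
    """phi_j(x) = x^j"""
    return x ** j

def gaussian_basis(x, mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))"""
    return np.exp(-((x - mu) ** 2) / (2 * s ** 2))

def sigmoid_basis(x, mu, s):
    """phi_j(x) = sigma((x - mu_j) / s) with sigma(a) = 1 / (1 + exp(-a))"""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

x = np.linspace(-1, 1, 5)
print(polynomial_basis(x, 2))
print(gaussian_basis(x, mu=0.0, s=0.5))
print(sigmoid_basis(x, mu=0.0, s=0.5))
```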
LINEAR MODEL
$y(x, w) = w^T \phi(x)$
FEATURE
linear with respect to $w$
non-linear with respect to $x$
we can choose whichever basis we like
LINEAR REGRESSION
we want to find the best $w$ in
$y(x, w) = w^T \phi(x)$
$y = 0.6147 + 1.0562 x_1$ is the best fit
HOW TO DO REGRESSION
by reducing the error
THE MAIN IDEA OF REGRESSION
minimization of the error function:
$\min_w E(w)$
$E(w) = \frac{1}{2} \sum_{i=1}^{N} (w^T \phi(x^{(i)}) - t^{(i)})^2$
$N$ : number of data points
$x^{(i)}$ : $i$-th data point
$t^{(i)}$ : target value of the $i$-th data point
※ $\min_w E(w)$ is called the least squares method
※ $E(w)$ is called the sum-of-squares error
LINEAR REGRESSION
we want to minimize the error function:
$\min_w E(w)$
$E(w) = \frac{1}{2} \sum_{i=1}^{N} (w^T \phi(x^{(i)}) - t^{(i)})^2 = \frac{1}{2} \|\Phi w - t\|^2$
$\Phi = (\phi(x^{(1)}), \phi(x^{(2)}), \ldots, \phi(x^{(N)}))^T$
$t = (t^{(1)}, t^{(2)}, \ldots, t^{(N)})^T$
LINEAR REGRESSION
$E(w) = \frac{1}{2} \|\Phi w - t\|^2$
taking the partial derivative with respect to $w$:
$\frac{\partial E(w)}{\partial w} = \Phi^T (\Phi w - t)$
setting $\frac{\partial E(w)}{\partial w} = 0$:
$\Phi^T \Phi w = \Phi^T t$
$\therefore w = (\Phi^T \Phi)^{-1} \Phi^T t$
implementation
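a minimal NumPy sketch of this closed-form solution (the polynomial design matrix and the toy data are assumptions for illustration):

```python
import numpy as np

def design_matrix(x, M):
    """Phi[i, j] = phi_j(x_i) with the polynomial basis phi_j(x) = x^j."""
    return np.vander(x, M, increasing=True)

def fit_least_squares(Phi, t):
    """w = (Phi^T Phi)^{-1} Phi^T t, computed via a linear solve."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# recover y = 1 + 2x from noise-free data
x = np.linspace(0, 1, 10)
t = 1.0 + 2.0 * x
w = fit_least_squares(design_matrix(x, 2), t)
print(w)  # close to [1. 2.]
```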
RIDGE REGRESSION
$\min_w E(w)$
$E(w) = \frac{1}{2} \|\Phi w - t\|^2 + \frac{\lambda}{2} \|w\|^2$
※ $\|w\|^2$ is called the L2 regularization term
RIDGE REGRESSION
$E(w) = \frac{1}{2} \|\Phi w - t\|^2 + \frac{\lambda}{2} \|w\|^2$
$\frac{\partial E(w)}{\partial w} = \Phi^T (\Phi w - t) + \lambda w = 0$
$(\Phi^T \Phi + \lambda I) w = \Phi^T t$
$\therefore w = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T t$
implementation
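a minimal NumPy sketch of the ridge closed form (the random design matrix and weights are assumed toy data; at $\lambda = 0$ it reduces to ordinary least squares, and larger $\lambda$ shrinks the weights toward zero):

```python
import numpy as np

def fit_ridge(Phi, t, lam):
    """w = (Phi^T Phi + lambda I)^{-1} Phi^T t"""
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ t)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 5))
t = Phi @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=50)
print(fit_ridge(Phi, t, lam=1.0))  # slightly shrunk relative to least squares
```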
LASSO REGRESSION
$E(w) = \frac{1}{2} \|\Phi w - t\|^2 + \frac{\lambda}{2} \sum_{j=1}^{M-1} |w_j|$
※ $\sum_j |w_j|$ is called the L1 regularization term
LASSO REGRESSION
not be solved analytically (nondifferentiable)
solved by coordinate descent
perform variable selection(some of the parameters to 0)
implementation
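cyclic coordinate descent with soft-thresholding is one standard solver; a minimal sketch minimizing $\frac{1}{2}\|\Phi w - t\|^2 + \lambda \|w\|_1$ (the slide's $\lambda/2$ only rescales $\lambda$; the sparse toy data is an assumption):

```python
import numpy as np

def soft_threshold(rho, lam):
    """Proximal step for the L1 term."""
    return np.sign(rho) * np.maximum(np.abs(rho) - lam, 0.0)

def fit_lasso(Phi, t, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2)||Phi w - t||^2 + lam * ||w||_1."""
    N, M = Phi.shape
    w = np.zeros(M)
    col_sq = (Phi ** 2).sum(axis=0)  # Phi_j^T Phi_j for each column j
    for _ in range(n_iter):
        for j in range(M):
            # residual with coordinate j left out
            residual = t - Phi @ w + Phi[:, j] * w[j]
            rho = Phi[:, j] @ residual
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 10))
w_true = np.array([3.0, -2.0] + [0.0] * 8)  # sparse ground truth
t = Phi @ w_true + 0.01 * rng.normal(size=100)
w = fit_lasso(Phi, t, lam=5.0)
print(w)  # most coefficients driven to (exactly) 0
```

note how the soft-thresholding step sets small coordinates to exactly zero, which is the variable-selection behaviour mentioned above.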
GENERALIZATION
a generalization of ridge and lasso:
$E(w) = \frac{1}{2} \|\Phi w - t\|^2 + \frac{\lambda}{2} \sum_{j=1}^{M-1} |w_j|^q$
RE-EXPRESSION
$\min_w \left( \frac{1}{2} \|\Phi w - t\|^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|^q \right)$
is equal to
$\min_w \frac{1}{2} \|\Phi w - t\|^2$  s.t.  $\sum_{j=1}^{M} |w_j|^q \leq \eta$
※ $\eta$ is calculated from the Lagrange multiplier method
IMAGE
le : redge regression
right : lasso regression
PROOF
$\min_w \frac{1}{2} \sum_{i=1}^{N} (w^T \phi(x^{(i)}) - t^{(i)})^2$  s.t.  $\sum_{j=1}^{M} |w_j|^q \leq \eta$
by using the Lagrange multiplier method:
$L(w, \lambda) = \frac{1}{2} \sum_{i=1}^{N} (w^T \phi(x^{(i)}) - t^{(i)})^2 + \frac{\lambda}{2} \left( \sum_{j=1}^{M} |w_j|^q - \eta \right)$
by using the KKT conditions:
$\frac{\partial L(w, \lambda)}{\partial \lambda} = \frac{1}{2} \left( \sum_{j=1}^{M} |w_j|^q - \eta \right) = 0$
$\therefore \sum_{j=1}^{M} |w_j^*|^q = \eta$
MAXIMUM LIKELIHOOD AND SUM-OF-SQUARES ERROR
we assume $t$ is the sum of $y(x, w)$ and Gaussian noise:
$t = y(x, w) + \varepsilon$
MAXIMUM LIKELIHOOD AND SUM-OF-SQUARES ERROR
$p(t \mid x, w, \beta) = \mathcal{N}(t \mid y(x, w), \beta^{-1})$
$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left(-\frac{1}{2\sigma^2} (x - \mu)^2\right)$
LIKELIHOOD
$p(\mathbf{t} \mid x, w, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid w^T \phi(x_n), \beta^{-1})$
※ $\mathbf{t} = (t_1, t_2, \ldots, t_N)^T$
MAXIMUM LIKELIHOOD
$\ln p(\mathbf{t} \mid x, w, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}(t_n \mid w^T \phi(x_n), \beta^{-1})$
$\ln p(\mathbf{t} \mid x, w, \beta) = -\frac{\beta}{2} \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi)$
therefore, maximizing the likelihood is equal to minimizing the sum-of-squares error
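this equivalence can be checked numerically: the maximum-likelihood $w$ is the least-squares solution, and maximizing the log-likelihood over $\beta$ gives $1/\beta_{ML}$ equal to the mean squared residual. a minimal sketch with assumed synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
beta_true = 25.0                        # noise precision, i.e. sigma = 0.2
x = np.linspace(0, 1, 500)
Phi = np.vander(x, 2, increasing=True)  # phi(x) = (1, x)^T
t = Phi @ np.array([0.5, 1.0]) + rng.normal(scale=beta_true ** -0.5, size=x.size)

# maximizing ln p over w reproduces the least-squares solution
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
# maximizing ln p over beta gives 1/beta_ML = mean squared residual
beta_ml = 1.0 / np.mean((t - Phi @ w_ml) ** 2)
print(w_ml, beta_ml)  # close to [0.5, 1.0] and 25
```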
FORMULA DEFORMATION
$p(\mathbf{t} \mid X, w, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid w^T \phi(x_n), \beta^{-1})$
$p(\mathbf{t} \mid X, w, \beta) = \prod_{n=1}^{N} \left(\frac{\beta}{2\pi}\right)^{1/2} \exp\left(-\frac{\beta}{2} (w^T \phi(x_n) - t_n)^2\right)$
$\ln p(\mathbf{t} \mid X, w, \beta) = -\frac{\beta}{2} \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi)$
MAP ESTIMATION AND L2 REGULARIZATION
we add a prior distribution.
by using Bayes' theorem:
$p(w \mid x, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{t} \mid x, w, \beta)\, p(w \mid \alpha)$
PRIOR DISTRIBUTION
we use this prior distribution because the calculation is easy:
$p(w \mid \alpha) = \mathcal{N}(w \mid 0, \alpha^{-1} I) = \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2} \exp\left(-\frac{\alpha}{2} w^T w\right)$
MAP ESTIMATION
$\max_w p(w \mid x, \mathbf{t}, \alpha, \beta)$
is equal to
$\min_w \left( \frac{1}{2} \|\Phi w - t\|^2 + \frac{\lambda}{2} \|w\|^2 \right)$
$p(w \mid x, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{t} \mid x, w, \beta)\, p(w \mid \alpha)$
$p(\mathbf{t} \mid x, w, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid w^T \phi(x_n), \beta^{-1})$
$p(w \mid \alpha) = \mathcal{N}(w \mid 0, \alpha^{-1} I) = \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2} \exp\left(-\frac{\alpha}{2} w^T w\right)$
so, MAP estimation is equal to ridge regression (with $\lambda = \alpha / \beta$)
FORMULA DEFORMATION
$p(\mathbf{t} \mid x, w, \beta)\, p(w \mid \alpha) = \left( \prod_{n=1}^{N} \left(\frac{\beta}{2\pi}\right)^{1/2} \exp\left(-\frac{\beta}{2} (w^T \phi(x_n) - t_n)^2\right) \right) \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2} \exp\left(-\frac{\alpha}{2} w^T w\right)$
$\ln \left[ p(\mathbf{t} \mid x, w, \beta)\, p(w \mid \alpha) \right] = -\frac{\beta}{2} \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) + \frac{M+1}{2} \ln \alpha - \frac{M+1}{2} \ln(2\pi) - \frac{\alpha}{2} w^T w$
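the MAP/ridge correspondence can be sanity-checked numerically: gradient ascent on the log posterior should land on the ridge closed-form solution with $\lambda = \alpha/\beta$. a minimal sketch (the toy data and the values of $\alpha$, $\beta$ are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(40, 3))
t = Phi @ np.array([1.0, -1.0, 2.0]) + 0.1 * rng.normal(size=40)
alpha, beta = 2.0, 100.0  # assumed prior / noise precisions

# gradient ascent on the log posterior, scaled by 1/beta for a convenient step size:
# (1/beta) d/dw ln p(w|t) = Phi^T (t - Phi w) - (alpha/beta) w
w_map = np.zeros(3)
for _ in range(2000):
    w_map += 0.005 * (Phi.T @ (t - Phi @ w_map) - (alpha / beta) * w_map)

# ridge closed form with lambda = alpha / beta
w_ridge = np.linalg.solve(Phi.T @ Phi + (alpha / beta) * np.eye(3), Phi.T @ t)
print(w_map, w_ridge)  # the two should coincide
```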
SUMMARY OF LINEAR REGRESSION
I introduced some linear regression models
I showed that maximizing the likelihood is equal to minimizing the sum-of-squares error
I showed that MAP estimation is equal to ridge regression
LINEAR CLASSIFICATION
we consider K-class (K > 2) classification
so, we prepare K linear models:
$y_k(x) = w_k^T \phi(x)$
$y(x) = \tilde{W}^T \phi(x)$
$\tilde{W} = (w_1, w_2, \ldots, w_K)$
$y(x) = (y_1(x), y_2(x), \ldots, y_K(x))^T$
1-OF-K CODING
we prepare a vector $p$:
$p = (c_1, c_2, \ldots, c_K)^T$
$c_i = 1$ (if $x$ is class $i$)
$c_i = 0$ (if $x$ is not class $i$)
ex) $p = (0, 0, \ldots, 0, 1, 0, \ldots, 0)^T$
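the 1-of-K target vector above is one line of NumPy (a minimal sketch; the class index is 0-based here):

```python
import numpy as np

def one_of_k(k, K):
    """1-of-K target vector: c_i = 1 for class k (0-indexed here), else 0."""
    p = np.zeros(K)
    p[k] = 1.0
    return p

print(one_of_k(2, 5))  # [0. 0. 1. 0. 0.]
```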
LEAST-SQUARES METHOD
$E(\tilde{W}) = \frac{1}{2} \sum_{i=1}^{N} \|y(x^{(i)}) - p^{(i)}\|^2$
$= \frac{1}{2} \sum_{i=1}^{N} \|\tilde{W}^T \phi(x^{(i)}) - p^{(i)}\|^2$
$= \frac{1}{2} \sum_{i=1}^{N} (\tilde{W}^T \phi(x^{(i)}) - p^{(i)})^T (\tilde{W}^T \phi(x^{(i)}) - p^{(i)})$
$= \frac{1}{2} \sum_{i=1}^{N} \left( \phi(x^{(i)})^T \tilde{W} \tilde{W}^T \phi(x^{(i)}) - 2 \phi(x^{(i)})^T \tilde{W} p^{(i)} + \|p^{(i)}\|^2 \right)$
$\frac{\partial E(\tilde{W})}{\partial \tilde{W}} = \sum_{i=1}^{N} \left( \phi(x^{(i)}) \phi(x^{(i)})^T \tilde{W} - \phi(x^{(i)}) p^{(i)T} \right)$
LEAST-SQUARES METHOD
by $\frac{\partial E(\tilde{W})}{\partial \tilde{W}} = X^T X \tilde{W} - X^T P$
and setting $\frac{\partial E(\tilde{W})}{\partial \tilde{W}} = 0$:
$X^T X \tilde{W} = X^T P$
$\therefore \tilde{W} = (X^T X)^{-1} X^T P$
$X = (\phi(x^{(1)}), \phi(x^{(2)}), \ldots, \phi(x^{(N)}))^T$
$P = (p^{(1)}, p^{(2)}, \ldots, p^{(N)})^T$
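a minimal NumPy sketch of this least-squares classifier (the three well-separated Gaussian blobs are assumed toy data; prediction picks the class with the largest $y_k(x)$):

```python
import numpy as np

def fit_lsq_classifier(X, P):
    """W~ = (X^T X)^{-1} X^T P; rows of X are phi(x^(i))^T, rows of P are 1-of-K targets."""
    return np.linalg.solve(X.T @ X, X.T @ P)

def predict(W, X):
    """Assign each point to the class k with the largest y_k(x)."""
    return np.argmax(X @ W, axis=1)

# assumed toy data: three well-separated 2-D Gaussian blobs
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = np.repeat(np.arange(3), 50)
pts = means[labels] + 0.3 * rng.normal(size=(150, 2))
X = np.hstack([np.ones((150, 1)), pts])  # phi(x) = (1, x1, x2)^T
P = np.eye(3)[labels]                    # 1-of-K coding
W = fit_lsq_classifier(X, P)
acc = (predict(W, X) == labels).mean()
print(acc)
```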
WEAK TO OUTLIERS
red : least squares method
green : logistic regression
ASSUMING A NORMAL DISTRIBUTION
left : least squares method
right : logistic regression
DISADVANTAGES OF THE LEAST-SQUARES METHOD
does not handle the labels as probabilities
weak to outliers
assumes a normal distribution
if the data does not follow a normal distribution, it gives bad results
we should not use the least-squares method for classification problems
SUMMARY
I introduced linear models (regression, classification)
they are the basis for some other machine learning models
PRML is difficult for me, but I want to continue reading
thank you

Contenu connexe

Tendances

Integration
IntegrationIntegration
Integrationsuefee
 
Local linear approximation
Local linear approximationLocal linear approximation
Local linear approximationTarun Gehlot
 
Common derivatives integrals
Common derivatives integralsCommon derivatives integrals
Common derivatives integralsolziich
 
numericai matmatic matlab uygulamalar ali abdullah
numericai matmatic  matlab  uygulamalar ali abdullahnumericai matmatic  matlab  uygulamalar ali abdullah
numericai matmatic matlab uygulamalar ali abdullahAli Abdullah
 
Interpolation and Extrapolation
Interpolation and ExtrapolationInterpolation and Extrapolation
Interpolation and ExtrapolationVNRacademy
 
Tensor Train decomposition in machine learning
Tensor Train decomposition in machine learningTensor Train decomposition in machine learning
Tensor Train decomposition in machine learningAlexander Novikov
 
Solovay Kitaev theorem
Solovay Kitaev theoremSolovay Kitaev theorem
Solovay Kitaev theoremJamesMa54
 
Fast and efficient exact synthesis of single qubit unitaries generated by cli...
Fast and efficient exact synthesis of single qubit unitaries generated by cli...Fast and efficient exact synthesis of single qubit unitaries generated by cli...
Fast and efficient exact synthesis of single qubit unitaries generated by cli...JamesMa54
 
Integral Calculus
Integral CalculusIntegral Calculus
Integral Calculusitutor
 
Lecture 11 systems of nonlinear equations
Lecture 11 systems of nonlinear equationsLecture 11 systems of nonlinear equations
Lecture 11 systems of nonlinear equationsHazel Joy Chong
 
NUMERICAL METHODS -Iterative methods(indirect method)
NUMERICAL METHODS -Iterative methods(indirect method)NUMERICAL METHODS -Iterative methods(indirect method)
NUMERICAL METHODS -Iterative methods(indirect method)krishnapriya R
 
Inner Product Space
Inner Product SpaceInner Product Space
Inner Product SpacePatel Raj
 

Tendances (20)

11365.integral 2
11365.integral 211365.integral 2
11365.integral 2
 
exponen dan logaritma
exponen dan logaritmaexponen dan logaritma
exponen dan logaritma
 
Integration
IntegrationIntegration
Integration
 
Local linear approximation
Local linear approximationLocal linear approximation
Local linear approximation
 
Common derivatives integrals
Common derivatives integralsCommon derivatives integrals
Common derivatives integrals
 
Reduction forumla
Reduction forumlaReduction forumla
Reduction forumla
 
Interpolation
InterpolationInterpolation
Interpolation
 
numericai matmatic matlab uygulamalar ali abdullah
numericai matmatic  matlab  uygulamalar ali abdullahnumericai matmatic  matlab  uygulamalar ali abdullah
numericai matmatic matlab uygulamalar ali abdullah
 
Interpolation and Extrapolation
Interpolation and ExtrapolationInterpolation and Extrapolation
Interpolation and Extrapolation
 
Tensor Train decomposition in machine learning
Tensor Train decomposition in machine learningTensor Train decomposition in machine learning
Tensor Train decomposition in machine learning
 
Solovay Kitaev theorem
Solovay Kitaev theoremSolovay Kitaev theorem
Solovay Kitaev theorem
 
Jacobi method
Jacobi methodJacobi method
Jacobi method
 
HERMITE SERIES
HERMITE SERIESHERMITE SERIES
HERMITE SERIES
 
Fast and efficient exact synthesis of single qubit unitaries generated by cli...
Fast and efficient exact synthesis of single qubit unitaries generated by cli...Fast and efficient exact synthesis of single qubit unitaries generated by cli...
Fast and efficient exact synthesis of single qubit unitaries generated by cli...
 
NUMERICAL METHODS
NUMERICAL METHODSNUMERICAL METHODS
NUMERICAL METHODS
 
Integral Calculus
Integral CalculusIntegral Calculus
Integral Calculus
 
Lecture 11 systems of nonlinear equations
Lecture 11 systems of nonlinear equationsLecture 11 systems of nonlinear equations
Lecture 11 systems of nonlinear equations
 
Numerical Methods Solving Linear Equations
Numerical Methods Solving Linear EquationsNumerical Methods Solving Linear Equations
Numerical Methods Solving Linear Equations
 
NUMERICAL METHODS -Iterative methods(indirect method)
NUMERICAL METHODS -Iterative methods(indirect method)NUMERICAL METHODS -Iterative methods(indirect method)
NUMERICAL METHODS -Iterative methods(indirect method)
 
Inner Product Space
Inner Product SpaceInner Product Space
Inner Product Space
 

En vedette

Data Visualization at codetalks 2016
Data Visualization at codetalks 2016Data Visualization at codetalks 2016
Data Visualization at codetalks 2016Stefan Kühn
 
線形識別モデル
線形識別モデル線形識別モデル
線形識別モデル貴之 八木
 
Visualizing Data Using t-SNE
Visualizing Data Using t-SNEVisualizing Data Using t-SNE
Visualizing Data Using t-SNEDavid Khosid
 
混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム貴之 八木
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEKai-Wen Zhao
 
トピックモデル
トピックモデルトピックモデル
トピックモデル貴之 八木
 
自然言語処理システムに想像力を与える試み
自然言語処理システムに想像力を与える試み自然言語処理システムに想像力を与える試み
自然言語処理システムに想像力を与える試みtm_2648
 
11 ak45b5 5
11 ak45b5 511 ak45b5 5
11 ak45b5 5crom68
 
自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭Yuya Unno
 
word2vec - From theory to practice
word2vec - From theory to practiceword2vec - From theory to practice
word2vec - From theory to practicehen_drik
 
Step by Stepで学ぶ自然言語処理における深層学習の勘所
Step by Stepで学ぶ自然言語処理における深層学習の勘所Step by Stepで学ぶ自然言語処理における深層学習の勘所
Step by Stepで学ぶ自然言語処理における深層学習の勘所Ogushi Masaya
 
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)STAIR Lab, Chiba Institute of Technology
 
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料tmprcd12345
 
fastTextの実装を見てみた
fastTextの実装を見てみたfastTextの実装を見てみた
fastTextの実装を見てみたYoshihiko Shiraki
 
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心にtmprcd12345
 
Visualizing Data Using t-SNE
Visualizing Data Using t-SNEVisualizing Data Using t-SNE
Visualizing Data Using t-SNETomoki Hayashi
 
Chainerの使い方と 自然言語処理への応用
Chainerの使い方と自然言語処理への応用Chainerの使い方と自然言語処理への応用
Chainerの使い方と 自然言語処理への応用Yuya Unno
 

En vedette (20)

Data Visualization at codetalks 2016
Data Visualization at codetalks 2016Data Visualization at codetalks 2016
Data Visualization at codetalks 2016
 
最適腕識別
最適腕識別最適腕識別
最適腕識別
 
線形識別モデル
線形識別モデル線形識別モデル
線形識別モデル
 
Visualizing Data Using t-SNE
Visualizing Data Using t-SNEVisualizing Data Using t-SNE
Visualizing Data Using t-SNE
 
混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNE
 
主成分分析
主成分分析主成分分析
主成分分析
 
トピックモデル
トピックモデルトピックモデル
トピックモデル
 
自然言語処理システムに想像力を与える試み
自然言語処理システムに想像力を与える試み自然言語処理システムに想像力を与える試み
自然言語処理システムに想像力を与える試み
 
t-SNE
t-SNEt-SNE
t-SNE
 
11 ak45b5 5
11 ak45b5 511 ak45b5 5
11 ak45b5 5
 
自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭
 
word2vec - From theory to practice
word2vec - From theory to practiceword2vec - From theory to practice
word2vec - From theory to practice
 
Step by Stepで学ぶ自然言語処理における深層学習の勘所
Step by Stepで学ぶ自然言語処理における深層学習の勘所Step by Stepで学ぶ自然言語処理における深層学習の勘所
Step by Stepで学ぶ自然言語処理における深層学習の勘所
 
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)
 
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料
 
fastTextの実装を見てみた
fastTextの実装を見てみたfastTextの実装を見てみた
fastTextの実装を見てみた
 
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に
 
Visualizing Data Using t-SNE
Visualizing Data Using t-SNEVisualizing Data Using t-SNE
Visualizing Data Using t-SNE
 
Chainerの使い方と 自然言語処理への応用
Chainerの使い方と自然言語処理への応用Chainerの使い方と自然言語処理への応用
Chainerの使い方と 自然言語処理への応用
 

Similaire à 線形回帰モデル

Unit 1 Operation on signals
Unit 1  Operation on signalsUnit 1  Operation on signals
Unit 1 Operation on signalsDr.SHANTHI K.G
 
University of manchester mathematical formula tables
University of manchester mathematical formula tablesUniversity of manchester mathematical formula tables
University of manchester mathematical formula tablesGaurav Vasani
 
Hermite integrators and 2-parameter subgroup of Riordan group
Hermite integrators and 2-parameter subgroup of Riordan groupHermite integrators and 2-parameter subgroup of Riordan group
Hermite integrators and 2-parameter subgroup of Riordan groupKeigo Nitadori
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfAlexander Litvinenko
 
An Efficient Boundary Integral Method for Stiff Fluid Interface Problems
An Efficient Boundary Integral Method for Stiff Fluid Interface ProblemsAn Efficient Boundary Integral Method for Stiff Fluid Interface Problems
An Efficient Boundary Integral Method for Stiff Fluid Interface ProblemsAlex (Oleksiy) Varfolomiyev
 
Admissions in India 2015
Admissions in India 2015Admissions in India 2015
Admissions in India 2015Edhole.com
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Alexander Litvinenko
 
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Gabriel Peyré
 
Linear Regression
Linear RegressionLinear Regression
Linear RegressionVARUN KUMAR
 
Mathematical formula tables
Mathematical formula tablesMathematical formula tables
Mathematical formula tablesSaravana Selvan
 
Discrete Signal Processing
Discrete Signal ProcessingDiscrete Signal Processing
Discrete Signal Processingmargretrosy
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchAhmed BESBES
 

Similaire à 線形回帰モデル (20)

5.n nmodels i
5.n nmodels i5.n nmodels i
5.n nmodels i
 
Unit 1 Operation on signals
Unit 1  Operation on signalsUnit 1  Operation on signals
Unit 1 Operation on signals
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
University of manchester mathematical formula tables
University of manchester mathematical formula tablesUniversity of manchester mathematical formula tables
University of manchester mathematical formula tables
 
Hermite integrators and 2-parameter subgroup of Riordan group
Hermite integrators and 2-parameter subgroup of Riordan groupHermite integrators and 2-parameter subgroup of Riordan group
Hermite integrators and 2-parameter subgroup of Riordan group
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdf
 
Adaline and Madaline.ppt
Adaline and Madaline.pptAdaline and Madaline.ppt
Adaline and Madaline.ppt
 
An Efficient Boundary Integral Method for Stiff Fluid Interface Problems
An Efficient Boundary Integral Method for Stiff Fluid Interface ProblemsAn Efficient Boundary Integral Method for Stiff Fluid Interface Problems
An Efficient Boundary Integral Method for Stiff Fluid Interface Problems
 
NODDEA2012_VANKOVA
NODDEA2012_VANKOVANODDEA2012_VANKOVA
NODDEA2012_VANKOVA
 
Admissions in India 2015
Admissions in India 2015Admissions in India 2015
Admissions in India 2015
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics
 
Section4 stochastic
Section4 stochasticSection4 stochastic
Section4 stochastic
 
Randomized algorithms ver 1.0
Randomized algorithms ver 1.0Randomized algorithms ver 1.0
Randomized algorithms ver 1.0
 
Matrix calculus
Matrix calculusMatrix calculus
Matrix calculus
 
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Quadratic Function.pptx
Quadratic Function.pptxQuadratic Function.pptx
Quadratic Function.pptx
 
Mathematical formula tables
Mathematical formula tablesMathematical formula tables
Mathematical formula tables
 
Discrete Signal Processing
Discrete Signal ProcessingDiscrete Signal Processing
Discrete Signal Processing
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from Scratch
 

Dernier

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

Dernier (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

線形回帰モデル

  • 2. REFERENCE Pattern Recognition and Machine Learning (PRML) chapter3,4
  • 3. TABLE OF CONTENTS 1. linear regression 1.1. what is regression 1.2. linear regression 1.3. ridge regression 1.4. lasso regression 1.5. generalization 1.6. maximum likelihood estimation 1.7. MAP estimation 2. linear classification 2.1. multi-class classification 2.2. disadvandages of least squares method
  • 4. we want to know a line that most fitting WHAT IS REGRESSION there are some of data
  • 5. we want to know such a line in any situation. -> regression analysis WHAT IS REGRESSION is best fittingy = 0.6147 + 1.0562x1
  • 6. NOTATION : scalarx, w, t : vectorx, w, t : matrixX, W, T
  • 7. LINEAR MODEL the simplest model is y(x, w) = + + ⋯ +w0 w1 x1 wD xD : weight parameteswi : variablesxi : the number of variablesD
  • 8. FEATURE linear with respect to linear with respect to model is too simple (poor expressive power) w x
  • 9. EXTEND THE MODEL add a linear combination of non-linear functions: y(x, w) = w_0 + Σ_{j=1}^{M−1} w_j φ_j(x), where M−1 is the number of basis functions, φ_j(x) is called a basis function, and w_0 is called the bias parameter
  • 10. LINEAR MODEL if we add a dummy basis function φ_0(x) = 1, then y(x, w) = Σ_{j=0}^{M−1} w_j φ_j(x) = wᵀφ(x), where w = (w_0, …, w_{M−1})ᵀ and φ(x) = (φ_0(x), …, φ_{M−1}(x))ᵀ
  • 11. BASIS FUNCTION there are various choices for the basis function: the polynomial basis, the Gaussian basis, and the logistic sigmoid basis
  • 13. GAUSSIAN BASIS φ_j(x) = exp(−(x − μ_j)² / (2s²))
  • 14. LOGISTIC SIGMOID BASIS φ_j(x) = σ((x − μ_j) / s), where σ(a) = 1 / (1 + exp(−a))
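The basis functions above are typically collected into a design matrix Φ. A minimal NumPy sketch (NumPy and the helper name `gaussian_design_matrix` are assumptions, not from the slides) that builds Φ from the Gaussian basis plus the dummy bias φ_0(x) = 1:

```python
import numpy as np

def gaussian_design_matrix(x, centers, s):
    """Build Phi with a dummy bias column phi_0(x) = 1 followed by
    Gaussian basis functions phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)         # shape (N, 1)
    mu = np.asarray(centers, dtype=float).reshape(1, -1)  # shape (1, M-1)
    phi = np.exp(-(x - mu) ** 2 / (2.0 * s ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])     # shape (N, M)

x = np.linspace(0.0, 1.0, 5)
Phi = gaussian_design_matrix(x, centers=np.linspace(0.0, 1.0, 3), s=0.2)
```

Swapping in the polynomial or logistic sigmoid basis only changes the line that fills `phi`.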
  • 15. LINEAR MODEL y(x, w) = wᵀφ(x)
  • 16. FEATURE linear with respect to w; non-linear with respect to x; we can choose whatever basis we like
  • 17. LINEAR REGRESSION y(x, w) = wᵀφ(x); we want to find the best w. y = 0.6147 + 1.0562x_1 is the best fit
  • 19. THE MAIN IDEA OF REGRESSION minimization of the error function E(w) = (1/2) Σ_{i=1}^{N} (wᵀφ(x^(i)) − t^(i))², where N is the number of data points, x^(i) is the i-th data point, and t^(i) is the target value for the i-th data point. ※ E(w) is called the sum-of-squares error; min_w E(w) is called the least-squares method
  • 20. LINEAR REGRESSION we want to minimize the error function: min_w E(w), E(w) = (1/2) Σ_{i=1}^{N} (wᵀφ(x^(i)) − t^(i))² = (1/2) ||Φw − t||², where Φ = (φ(x^(1)), φ(x^(2)), …, φ(x^(N)))ᵀ and t = (t^(1), t^(2), …, t^(N))ᵀ
  • 21. LINEAR REGRESSION E(w) = (1/2) ||Φw − t||². The partial derivative with respect to w is ∂E(w)/∂w = Φᵀ(Φw − t). Setting ∂E(w)/∂w = 0 gives ΦᵀΦw = Φᵀt, ∴ w = (ΦᵀΦ)⁻¹Φᵀt. implementation
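The closed-form solution w = (ΦᵀΦ)⁻¹Φᵀt takes only a few lines of NumPy. This is a hypothetical sketch on synthetic data (the 0.6/1.05 coefficients loosely echo the fitted line shown earlier and are purely illustrative), not the implementation linked from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy data roughly following y = 0.6 + 1.05 * x plus small Gaussian noise
x = rng.uniform(0.0, 10.0, size=50)
t = 0.6 + 1.05 * x + rng.normal(0.0, 0.1, size=50)

# design matrix with a bias column (identity basis phi(x) = (1, x)^T)
Phi = np.column_stack([np.ones_like(x), x])

# w = (Phi^T Phi)^{-1} Phi^T t, solved as a linear system
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
```

`np.linalg.solve` is numerically preferable to forming the inverse (ΦᵀΦ)⁻¹ explicitly.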
  • 22. RIDGE REGRESSION min_w E(w), E(w) = (1/2) ||Φw − t||² + (λ/2) ||w||². ※ ||w||² is called the L2 regularization term
  • 23. RIDGE REGRESSION E(w) = (1/2) ||Φw − t||² + (λ/2) ||w||². ∂E(w)/∂w = Φᵀ(Φw − t) + λw = 0, so (ΦᵀΦ + λI)w = Φᵀt, ∴ w = (ΦᵀΦ + λI)⁻¹Φᵀt. implementation
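A minimal NumPy sketch of the ridge closed form w = (ΦᵀΦ + λI)⁻¹Φᵀt, on made-up data; the helper name `ridge_fit` is an assumption. Larger λ shrinks the weights toward zero:

```python
import numpy as np

def ridge_fit(Phi, t, lam):
    """Ridge closed form: w = (Phi^T Phi + lambda I)^{-1} Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ t)

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=30)
t = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=30)
Phi = np.column_stack([np.ones_like(x), x])

w_small = ridge_fit(Phi, t, lam=1e-6)   # close to ordinary least squares
w_large = ridge_fit(Phi, t, lam=1e3)    # heavily shrunk toward zero
```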
  • 24. LASSO REGRESSION E(w) = (1/2) ||Φw − t||² + (λ/2) Σ_{j=1}^{M−1} |w_j|. ※ Σ|w_j| is called the L1 regularization term
  • 25. LASSO REGRESSION it cannot be solved analytically (the L1 term is non-differentiable); it is solved by coordinate descent; it performs variable selection (driving some of the parameters to 0). implementation
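Coordinate descent for the lasso cycles through the weights, solving each one-dimensional subproblem exactly via soft-thresholding. The sketch below is a simplified assumption (it penalises every coordinate including the bias, uses a fixed sweep count, and skips convergence checks), not the implementation linked from the slides:

```python
import numpy as np

def soft_threshold(rho, lam):
    """One-dimensional lasso solution: shrink rho toward 0 by lam."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(Phi, t, lam, n_sweeps=200):
    """Coordinate descent for (1/2)||Phi w - t||^2 + lam * sum_j |w_j|."""
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_sweeps):
        for j in range(M):
            # residual with coordinate j's own contribution removed
            r = t - Phi @ w + Phi[:, j] * w[j]
            rho = Phi[:, j] @ r
            z = Phi[:, j] @ Phi[:, j]
            w[j] = soft_threshold(rho, lam) / z
    return w

rng = np.random.default_rng(2)
Phi = rng.normal(size=(200, 5))
w_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])    # sparse ground truth
t = Phi @ w_true + rng.normal(0.0, 0.1, size=200)
w_hat = lasso_cd(Phi, t, lam=20.0)
```

The exact zeros in `w_hat` illustrate the variable selection mentioned above.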
  • 26. GENERALIZATION a generalization of ridge and lasso: E(w) = (1/2) ||Φw − t||² + (λ/2) Σ_{j=1}^{M−1} |w_j|^q
  • 27. RE-EXPRESSION min_w ((1/2) ||Φw − t||² + (λ/2) Σ_{j=1}^{M} |w_j|^q) is equal to min_w (1/2) ||Φw − t||² s.t. Σ_{j=1}^{M} |w_j|^q ≤ η. ※ η is calculated by the Lagrange multiplier method
  • 28. IMAGE left: ridge regression, right: lasso regression
  • 29. PROOF min_w (1/2) Σ_{i=1}^{N} (wᵀφ(x^(i)) − t^(i))² s.t. Σ_{j=1}^{M} |w_j|^q ≤ η. By the Lagrange multiplier method, L(w, λ) = (1/2) Σ_{i=1}^{N} (wᵀφ(x^(i)) − t^(i))² + (λ/2)(Σ_{j=1}^{M} |w_j|^q − η). By the KKT conditions, ∂L(w, λ)/∂λ = Σ_{j=1}^{M} |w_j|^q − η = 0, ∴ Σ_{j=1}^{M} |w*_j|^q = η
  • 30. MAXIMUM LIKELIHOOD AND SUM-OF-SQUARES ERROR we assume t is the sum of y(x, w) and Gaussian noise ε: t = y(x, w) + ε
  • 31. MAXIMUM LIKELIHOOD AND SUM-OF-SQUARES ERROR p(t | x, w, β) = N(t | y(x, w), β⁻¹), where N(x | μ, σ²) = (1 / (2πσ²)^{1/2}) exp(−(x − μ)² / (2σ²))
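The noise model can be illustrated by sampling synthetic targets. A small sketch under assumed values (β = 25 and the weights below are made up for illustration): the noise standard deviation is √(1/β) = 0.2.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = 25.0                       # noise precision; the variance is 1/beta
w = np.array([0.6, 1.05])         # illustrative weights, not fitted values
x = np.linspace(0.0, 10.0, 1000)
y = w[0] + w[1] * x               # deterministic part y(x, w)
eps = rng.normal(0.0, np.sqrt(1.0 / beta), size=x.shape)
t = y + eps                       # t = y(x, w) + epsilon
```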
  • 32. LIKELIHOOD p(t | x, w, β) = Π_{n=1}^{N} N(t_n | wᵀφ(x_n), β⁻¹). ※ t = (t_1, t_2, …, t_N)ᵀ
  • 33. MAXIMUM LIKELIHOOD ln p(t | x, w, β) = Σ_{n=1}^{N} ln N(t_n | wᵀφ(x_n), β⁻¹) = −(β/2) Σ_{n=1}^{N} (t_n − wᵀφ(x_n))² + (N/2) ln β − (N/2) ln(2π). Therefore, maximizing the likelihood is equal to minimizing the sum-of-squares error
  • 34. DERIVATION p(t | X, w, β) = Π_{n=1}^{N} N(t_n | wᵀφ(x_n), β⁻¹) = Π_{n=1}^{N} (β/2π)^{1/2} exp(−(β/2)(wᵀφ(x_n) − t_n)²), so ln p(t | X, w, β) = −(β/2) Σ_{n=1}^{N} (t_n − wᵀφ(x_n))² + (N/2) ln β − (N/2) ln(2π)
  • 35. MAP ESTIMATION AND L2 REGULARIZATION we add a prior distribution. By Bayes' theorem, p(w | x, t, α, β) ∝ p(t | x, w, β) p(w | α)
  • 36. PRIOR DISTRIBUTION we choose this prior distribution because it makes the calculation easy: p(w | α) = N(w | 0, α⁻¹I) = (α/2π)^{(M+1)/2} exp(−(α/2) wᵀw)
  • 37. MAP ESTIMATION max_w p(w | x, t, α, β) is equal to min_w ((1/2) ||Φw − t||² + (λ/2) ||w||²) with λ = α/β, since p(w | x, t, α, β) ∝ p(t | x, w, β) p(w | α), p(t | x, w, β) = Π_{n=1}^{N} N(t_n | wᵀφ(x_n), β⁻¹), and p(w | α) = N(w | 0, α⁻¹I) = (α/2π)^{(M+1)/2} exp(−(α/2) wᵀw). So MAP estimation is equal to ridge regression
  • 38. DERIVATION p(t | x, w, β) p(w | α) = (Π_{n=1}^{N} (β/2π)^{1/2} exp(−(β/2)(wᵀφ(x_n) − t_n)²)) (α/2π)^{(M+1)/2} exp(−(α/2) wᵀw), so ln(p(t | x, w, β) p(w | α)) = −(β/2) Σ_{n=1}^{N} (t_n − wᵀφ(x_n))² + (N/2) ln β − (N/2) ln(2π) + ((M+1)/2) ln α − ((M+1)/2) ln(2π) − (α/2) wᵀw
  • 39. SUMMARY OF LINEAR REGRESSION I introduced some linear regression models; I showed that maximizing the likelihood is equal to minimizing the sum-of-squares error; I showed that MAP estimation is equal to ridge regression
  • 40. LINEAR CLASSIFICATION we consider K-class (K > 2) classification, so we prepare K linear models: y_k(x) = w_kᵀφ(x), y(x) = W̃ᵀx, where W̃ = (w_1, w_2, …, w_K) and y(x) = (y_1(x), y_2(x), …, y_K(x))ᵀ
  • 41. 1-OF-K CODING we prepare the vector p = (c_1, c_2, …, c_K)ᵀ with c_i = 1 if x belongs to class i and c_i = 0 otherwise. ex) p = (0, 0, …, 0, 1, 0, …, 0)ᵀ
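1-of-K coding is straightforward to vectorise. A minimal NumPy sketch (the helper name `one_of_k` is an assumption):

```python
import numpy as np

def one_of_k(labels, K):
    """Return the N x K matrix whose i-th row is the 1-of-K code of labels[i]."""
    P = np.zeros((len(labels), K))
    P[np.arange(len(labels)), labels] = 1.0   # set c_i = 1 for the true class
    return P

P = one_of_k(np.array([0, 2, 1]), K=3)
```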
  • 42. LEAST-SQUARES METHOD E(W̃) = (1/2) Σ_{i=1}^{N} ||y(x^(i)) − p^(i)||² = (1/2) Σ_{i=1}^{N} ||W̃ᵀφ(x^(i)) − p^(i)||² = (1/2) Σ_{i=1}^{N} (W̃ᵀφ(x^(i)) − p^(i))ᵀ(W̃ᵀφ(x^(i)) − p^(i)) = (1/2) Σ_{i=1}^{N} (φ(x^(i))ᵀW̃W̃ᵀφ(x^(i)) − 2φ(x^(i))ᵀW̃p^(i) + ||p^(i)||²), so ∂E(W̃)/∂W̃ = Σ_{i=1}^{N} (φ(x^(i))φ(x^(i))ᵀW̃ − φ(x^(i))p^(i)ᵀ)
  • 43. LEAST-SQUARES METHOD ∂E(W̃)/∂W̃ = XᵀXW̃ − XᵀP, where X = (φ(x^(1)), φ(x^(2)), …, φ(x^(N)))ᵀ and P = (p^(1), p^(2), …, p^(N))ᵀ. Setting ∂E(W̃)/∂W̃ = 0 gives XᵀXW̃ = XᵀP, ∴ W̃ = (XᵀX)⁻¹XᵀP
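Putting slides 40 to 43 together: fit W̃ = (XᵀX)⁻¹XᵀP on 1-of-K targets and classify by the largest y_k(x). The three Gaussian blobs below are made-up data, assumed only for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
K, n = 3, 50
centers = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
pts = np.vstack([rng.normal(c, 0.5, size=(n, 2)) for c in centers])
labels = np.repeat(np.arange(K), n)

X = np.column_stack([np.ones(len(pts)), pts])   # design matrix with bias column
P = np.zeros((len(pts), K))
P[np.arange(len(pts)), labels] = 1.0            # 1-of-K target matrix

W = np.linalg.solve(X.T @ X, X.T @ P)           # W~ = (X^T X)^{-1} X^T P
pred = np.argmax(X @ W, axis=1)                 # the largest y_k(x) wins
accuracy = np.mean(pred == labels)
```

With well-separated blobs like these the least-squares classifier does fine; the next slides show where it breaks down.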
  • 44. WEAK TO OUTLIERS red: least-squares method, green: logistic regression
  • 45. ASSUMING A NORMAL DISTRIBUTION left: least-squares method, right: logistic regression
  • 46. DISADVANTAGES OF THE LEAST-SQUARES METHOD it does not handle the labels as probabilities; it is weak to outliers; it assumes a normal distribution, so if the data does not follow a normal distribution it gives bad results; we should not use the least-squares method for classification problems
  • 47. SUMMARY I introduced linear models (regression and classification); they are the basis for some other machine learning models; PRML is difficult for me, but I want to continue reading