Pattern Recognition and Machine Learning: Section 3.3
Yusuke Oda
Software Engineer at Google
5 Jun 2013
Slides used in a reading-group session on "Pattern Recognition and Machine Learning" (PRML).
1.
Reading: "Pattern Recognition and Machine Learning" §3.3 (Bayesian Linear Regression), Christopher M. Bishop
Introduced by: Yusuke Oda (NAIST) @odashi_t
2013/6/5, © 2013 Yusuke Oda, AHC-Lab, IS, NAIST
2.
Agenda
3.3 Bayesian Linear Regression
– 3.3.1 Parameter distribution
– 3.3.2 Predictive distribution
– 3.3.3 Equivalent kernel
3.
Agenda
3.3 Bayesian Linear Regression
– 3.3.1 Parameter distribution
– 3.3.2 Predictive distribution
– 3.3.3 Equivalent kernel
4.
Bayesian Linear Regression
Maximum Likelihood (ML)
– The appropriate number of basis functions (≃ model complexity) depends on the size of the data set.
– Adding a regularization term controls model complexity.
– How should we determine the coefficient of the regularization term?
5.
Bayesian Linear Regression
Maximum Likelihood (ML)
– Using ML to determine the coefficient of the regularization term is a bad choice:
  • it always leads to excessively complex models (= over-fitting).
– Using independent hold-out data to determine model complexity (see §1.3) is
  computationally expensive and wasteful of valuable data.
Note: in the case of the previous slide, λ always becomes 0 when ML is used to determine λ.
6.
Bayesian Linear Regression
Bayesian treatment of linear regression
– Avoids the over-fitting problem of ML.
– Leads to automatic methods of determining model complexity using the training data alone.
What do we do?
– Introduce the prior distribution p(w) and the likelihood p(t|w).
  • Treat the model parameter w as a random variable.
– Calculate the posterior distribution using Bayes' theorem:
  p(w|t) ∝ p(t|w) p(w)
7.
Agenda
3.3 Bayesian Linear Regression
– 3.3.1 Parameter distribution
– 3.3.2 Predictive distribution
– 3.3.3 Equivalent kernel
8.
Note: Marginal / Conditional Gaussians
Given a marginal Gaussian distribution for x and a conditional Gaussian distribution for y given x:
  p(x) = N(x | μ, Λ⁻¹)
  p(y|x) = N(y | Ax + b, L⁻¹)
then the marginal distribution of y and the conditional distribution of x given y are:
  p(y) = N(y | Aμ + b, L⁻¹ + A Λ⁻¹ Aᵀ)        (2.115)
  p(x|y) = N(x | Σ{Aᵀ L (y − b) + Λ μ}, Σ)     (2.116)
where
  Σ = (Λ + Aᵀ L A)⁻¹                           (2.117)
9.
Parameter Distribution
Remember the likelihood function given by §3.1.1:
  p(t|w) = ∏ₙ N(tₙ | wᵀ φ(xₙ), β⁻¹)
– This is the exponential of a quadratic function of w.
The corresponding conjugate prior is given by a Gaussian distribution:
  p(w) = N(w | m_0, S_0)
(β is treated as a known parameter.)
10.
Parameter Distribution
Now, given the likelihood and the Gaussian prior above, the posterior distribution is obtained by using (2.116):
  p(w|t) = N(w | m_N, S_N)
where
  m_N = S_N (S_0⁻¹ m_0 + β Φᵀ t)      (3.50)
  S_N⁻¹ = S_0⁻¹ + β Φᵀ Φ              (3.51)
11.
Online Learning: Parameter Distribution
If data points arrive sequentially, the design matrix has only one row: Φ = φ(xₙ)ᵀ.
Assuming (xₙ, tₙ) is the n-th data point, we obtain the formula for online learning:
  m_{n+1} = S_{n+1} (Sₙ⁻¹ mₙ + β φ(x_{n+1}) t_{n+1})
  S_{n+1}⁻¹ = Sₙ⁻¹ + β φ(x_{n+1}) φ(x_{n+1})ᵀ
In addition, the posterior after n points acts as the prior for the (n+1)-th point.
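The sequential update above can be sketched numerically: processing the points one at a time, with each posterior serving as the next prior, should reproduce the batch posterior exactly. The data, basis φ(x) = (1, x), and precisions α, β below are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, beta = 1.0, 10.0                       # assumed prior / noise precision
x = rng.uniform(-1, 1, size=10)
t = rng.normal(size=10)
phis = np.column_stack([np.ones_like(x), x])  # phi(x) = (1, x)

# Sequential updates, starting from the prior N(0, alpha^{-1} I).
S_inv = alpha * np.eye(2)
m = np.zeros(2)
for p, tn in zip(phis, t):
    S_inv_new = S_inv + beta * np.outer(p, p)                  # S_{n+1}^{-1}
    m = np.linalg.solve(S_inv_new, S_inv @ m + beta * p * tn)  # m_{n+1}
    S_inv = S_inv_new

# Batch posterior (3.50)-(3.51) with m_0 = 0, S_0 = alpha^{-1} I, for comparison.
S_N_inv = alpha * np.eye(2) + beta * phis.T @ phis
m_N = beta * np.linalg.solve(S_N_inv, phis.T @ t)
```

After the loop, (m, S_inv) and the batch result (m_N, S_N_inv) agree, which is the point of the sequential formulation.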
12.
Simple Gaussian Prior: Parameter Distribution
If the prior distribution is a zero-mean isotropic Gaussian governed by a single precision parameter α:
  p(w|α) = N(w | 0, α⁻¹ I)      (3.52)
the corresponding posterior distribution is also Gaussian:
  p(w|t) = N(w | m_N, S_N)
where
  m_N = β S_N Φᵀ t              (3.53)
  S_N⁻¹ = α I + β Φᵀ Φ          (3.54)
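A minimal numerical sketch of (3.53)–(3.54) for straight-line data: all concrete values (α, β, the true line, the sample size) are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 2.0, 25.0                  # assumed prior / noise precision
x = rng.uniform(-1, 1, size=20)
# Noisy targets from an assumed true line t = -0.3 + 0.5 x.
t = -0.3 + 0.5 * x + rng.normal(0, beta ** -0.5, size=x.shape)

Phi = np.column_stack([np.ones_like(x), x])    # basis functions phi(x) = (1, x)
S_N_inv = alpha * np.eye(2) + beta * Phi.T @ Phi   # (3.54)
S_N = np.linalg.inv(S_N_inv)
m_N = beta * S_N @ Phi.T @ t                       # (3.53), posterior mean
```

With enough data the posterior mean concentrates near the true coefficients, while S_N stays symmetric positive definite (it is a covariance matrix).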
13.
Relationship with MSSE: Parameter Distribution
The log of the posterior distribution is given by:
  ln p(w|t) = −(β/2) Σₙ {tₙ − wᵀ φ(xₙ)}² − (α/2) wᵀ w + const      (3.55)
If the prior distribution is given by (3.52), the following two are equivalent:
– Maximization of (3.55) with respect to w
– Minimization of the sum-of-squares error (MSSE) function with the addition of a quadratic
  regularization term, with coefficient λ = α/β
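The equivalence above can be checked numerically: the posterior mean m_N coincides with the ridge-regression solution w = (λI + ΦᵀΦ)⁻¹ Φᵀ t for λ = α/β. The design matrix and precisions below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 0.5, 10.0                  # assumed precisions
Phi = rng.normal(size=(30, 4))           # arbitrary design matrix
t = rng.normal(size=30)

# Posterior mean (3.53)-(3.54).
m_N = beta * np.linalg.solve(alpha * np.eye(4) + beta * Phi.T @ Phi, Phi.T @ t)

# Regularized least-squares solution with lambda = alpha / beta.
lam = alpha / beta
w_ridge = np.linalg.solve(lam * np.eye(4) + Phi.T @ Phi, Phi.T @ t)
```

Dividing the normal equations by β shows the two expressions are algebraically identical, so the vectors agree to machine precision.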
14.
Example: Parameter Distribution
Straight-line fitting
– Model function: y(x, w) = w_0 + w_1 x
– True function: f(x, a) = a_0 + a_1 x
– Error: additive Gaussian noise
– Goal: to recover the values of a_0, a_1 from such data
– Prior distribution: zero-mean isotropic Gaussian
15.
Generalized Gaussian Prior: Parameter Distribution
We can generalize the Gaussian prior with respect to the exponent q:
  p(w|α) ∝ exp( −(α/2) Σⱼ |wⱼ|^q )      (3.56)
q = 2 corresponds to the Gaussian, and only in the case q = 2 is the prior conjugate to the likelihood (3.10).
16.
Agenda
3.3 Bayesian Linear Regression
– 3.3.1 Parameter distribution
– 3.3.2 Predictive distribution
– 3.3.3 Equivalent kernel
17.
Predictive Distribution
Let's consider making predictions of t directly for new values of x.
In order to do so, we need to evaluate the predictive distribution:
  p(t | t, α, β) = ∫ p(t | w, β) p(w | t, α, β) dw      (3.57)
This formula is typically described as marginalization over w (summing out w).
18.
Predictive Distribution
The conditional distribution of the target variable is given by:
  p(t | x, w, β) = N(t | y(x, w), β⁻¹)
and the posterior weight distribution is given by:
  p(w | t) = N(w | m_N, S_N)
Accordingly, the result of (3.57) follows by using (2.115):
  p(t | x, t, α, β) = N(t | m_Nᵀ φ(x), σ_N²(x))      (3.58)
where
  σ_N²(x) = 1/β + φ(x)ᵀ S_N φ(x)                     (3.59)
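A numerical sketch of (3.58)–(3.59), using Gaussian basis functions as in the sine-curve example that follows; the basis width, precisions, and data are illustrative assumptions.

```python
import numpy as np

def phi(x, centers, s=0.2):
    """Gaussian basis functions centered at `centers` with assumed width s."""
    return np.exp(-0.5 * ((x - centers) / s) ** 2)

rng = np.random.default_rng(2)
alpha, beta = 2.0, 25.0
centers = np.linspace(0, 1, 9)                 # 9 Gaussian basis functions
x = rng.uniform(0, 1, size=25)
t = np.sin(2 * np.pi * x) + rng.normal(0, beta ** -0.5, size=x.shape)

Phi = np.stack([phi(xi, centers) for xi in x])
S_N = np.linalg.inv(alpha * np.eye(len(centers)) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

# Predictive mean and variance at a new input x*.
x_star = 0.5
p = phi(x_star, centers)
mean = m_N @ p                                 # m_N^T phi(x*)
var = 1 / beta + p @ S_N @ p                   # (3.59)
```

Note that `var` always exceeds the noise floor 1/β, since the second term is a positive-definite quadratic form in φ(x*).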
19.
Predictive Distribution
Now we discuss the variance (3.59) of the predictive distribution:
– The first term is the additive noise governed by the parameter β.
– The second term depends on the mapping vector φ(x) of each data point.
– As additional data points are observed, the posterior distribution becomes narrower:
  σ_{N+1}²(x) ≤ σ_N²(x)
– The second term of (3.59) goes to zero in the limit N → ∞, leaving only the noise term 1/β.
20.
Predictive Distribution
(figure only)
21.
Example: Predictive Distribution
Gaussian regression with a sine curve
– Basis functions: 9 Gaussian curves
(Figure: mean and standard deviation of the predictive distribution)
22.
Example: Predictive Distribution
Gaussian regression with a sine curve (figure)
23.
Example: Predictive Distribution
Gaussian regression with a sine curve (figure)
24.
Problem of Localized Basis: Predictive Distribution
Polynomial regression vs. Gaussian regression: which is better?
(Figures comparing the two.)
25.
Problem of Localized Basis: Predictive Distribution
If we use localized basis functions such as Gaussians, then in regions away from the basis
function centers the contribution of the second term in (3.59) goes to zero.
Accordingly, the predictive variance becomes only the noise contribution 1/β.
This is not a good result: the model becomes most confident exactly where it has seen no data.
(Near the centers: large contribution; far from the centers: small contribution.)
26.
Problem of Localized Basis: Predictive Distribution
This problem (arising from choosing localized basis functions) can be avoided by adopting an
alternative Bayesian approach to regression known as a Gaussian process.
– See §6.4.
27.
Case of Unknown Precision: Predictive Distribution
If both w and β are treated as unknown, then we can introduce a conjugate prior distribution and
corresponding posterior distribution in the form of a Gaussian-gamma distribution:
  p(w, β) = N(w | m_0, β⁻¹ S_0) Gam(β | a_0, b_0)
and the predictive distribution is then a Student's t-distribution.
28.
Agenda
3.3 Bayesian Linear Regression
– 3.3.1 Parameter distribution
– 3.3.2 Predictive distribution
– 3.3.3 Equivalent kernel
29.
Equivalent Kernel
If we substitute the posterior mean solution (3.53) into the expression (3.3), the predictive mean
can be written:
  y(x, m_N) = m_Nᵀ φ(x) = β φ(x)ᵀ S_N Φᵀ t
This formula can be viewed as a linear combination of the target values tₙ:
  y(x, m_N) = Σₙ k(x, xₙ) tₙ
30.
Equivalent Kernel
The coefficients of each tₙ are given by:
  k(x, x′) = β φ(x)ᵀ S_N φ(x′)      (3.62)
This function is called the smoother matrix or the equivalent kernel.
Regression functions which make predictions by taking linear combinations of the training-set
target values are known as linear smoothers.
We can also predict t for a new input vector x using the equivalent kernel directly, instead of
calculating the parameters of the basis functions.
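The linear-smoother view above can be checked numerically: the prediction built from the equivalent kernel weights k(x*, xₙ) equals the prediction from the posterior mean m_N. The polynomial basis, data, and precisions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta = 1.0, 20.0

def phi(x):
    """Polynomial basis up to degree 3 (assumed for illustration)."""
    return np.array([x ** j for j in range(4)], dtype=float)

x = rng.uniform(-1, 1, size=15)
t = rng.normal(size=15)
Phi = np.stack([phi(xi) for xi in x])
S_N = np.linalg.inv(alpha * np.eye(4) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

x_star = 0.3
k = beta * phi(x_star) @ S_N @ Phi.T     # k(x*, x_n) for every training point, (3.62)
y_smoother = k @ t                       # prediction as a linear smoother
y_direct = m_N @ phi(x_star)             # prediction from the posterior mean
```

Both routes give the same number, which is exactly the substitution performed on the previous slide.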
31.
Example 1: Equivalent Kernel
Equivalent kernel with Gaussian regression.
The equivalent kernel depends on the set of basis functions and on the data set.
32.
Equivalent Kernel
The equivalent kernel expresses the contribution of each data point to the predictive mean
(nearby points: large contribution; distant points: small contribution).
The covariance between y(x) and y(x′) can be expressed via the equivalent kernel:
  cov[y(x), y(x′)] = β⁻¹ k(x, x′)
33.
Properties of Equivalent Kernel
The equivalent kernel has a localization property even if the basis functions themselves are not
localized (e.g. polynomial, sigmoid).
The sum of the equivalent kernel over the data points equals 1 for all x:
  Σₙ k(x, xₙ) = 1
34.
Example 2: Equivalent Kernel
Equivalent kernel with polynomial regression
– Moving parameter (figure)
35.
Example 2: Equivalent Kernel
Equivalent kernel with polynomial regression
– Moving parameter (figure)
36.
Example 2: Equivalent Kernel
Equivalent kernel with polynomial regression
– Moving parameter (figure)
37.
Properties of Equivalent Kernel
The equivalent kernel satisfies an important property shared by kernel functions in general:
– A kernel function can be expressed in the form of an inner product with respect to a vector
  ψ(x) of nonlinear functions:
  k(x, z) = ψ(x)ᵀ ψ(z)
– In the case of the equivalent kernel, ψ is given by:
  ψ(x) = β^{1/2} S_N^{1/2} φ(x)
38.
Thank you!
zzz...