Introduction + Chapter 1
Reviewer : Sunwoo Kim
Christopher M. Bishop
Pattern Recognition and Machine Learning
Yonsei University
Department of Applied Statistics
Study Introduction
Study Objective
- Assuming that we are familiar with Mathematical Statistics 1 & 2, regression analysis, and basic Bayesian inference,
- Getting intuition for the various algorithms.
- Understanding the mathematical concepts behind the algorithms.
- Reviewing the algorithms from a statistical perspective.
Time
- Fixed time TBD
Method
- A week before each session, we choose its scope.
- I will prepare a summary of the scope.
- Every participant should study the scope and prepare some related questions!
Notation
$y(x, \mathbf{w})$ : Estimated value given the parameter $\mathbf{w}$ (this plays the role of $\hat{y}$)
$t_n$ : True target value (the observed $y$)
$\mathbf{t}$ : Vector of target values, $\mathbf{t} = (t_1, \dots, t_N)^T$
$E(\mathbf{w}) = L(y(x, \mathbf{w}), t)$ : Error function, which measures the misfit between the estimated value and the true value.
$\lVert\mathbf{w}\rVert = \sqrt{\mathbf{w}^T\mathbf{w}} = \sqrt{w_1^2 + w_2^2 + \cdots + w_n^2}$ : Euclidean norm (also called the $\ell_2$-norm)
$\boldsymbol{\mu}, \boldsymbol{\Sigma}, |\boldsymbol{\Sigma}|$ : Mean, covariance matrix, and its determinant
$\beta = \Sigma^{-1}$ (for the univariate case, $\beta = \frac{1}{\sigma^2}$) : Precision parameter (inverse of the covariance)
$\mu_{ML}$ : Mean estimated by maximum likelihood estimation
Chapter 1.1. Polynomial Curve Fitting
We have already covered most of the sections in Chapter 1 in our undergraduate classes.
Thus, I would like to cover only the concepts that are unfamiliar to us.
Most of our regression experience focuses on linear regression, where the coefficients are estimated as
$\hat{\beta} = (X^T X)^{-1} X^T Y$
The estimate above is obtained from the normal equation.
However, how can we handle data that follow a nonlinear pattern, like the curve shown in the slide's figure?
Here we construct the model using polynomial features!
We can still apply the squared error!
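As a quick illustration (my own sketch, not taken from the book), here is a minimal Python example that fits a polynomial by solving the normal equation on polynomial features; the toy data, degree, and noise level are all arbitrary choices for demonstration.

import numpy as np

# Toy data: a noisy sine curve, assumed to mimic PRML's running example.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

M = 3  # polynomial degree (a hypothetical choice)
X = np.vander(x, M + 1, increasing=True)  # design matrix: columns 1, x, x^2, x^3

# Normal equation: solve (X^T X) w = X^T t for the weight vector w.
w_hat = np.linalg.solve(X.T @ X, X.T @ t)

y_fit = X @ w_hat               # fitted values y(x, w)
sse = np.sum((y_fit - t) ** 2)  # squared-error loss E(w)
print(w_hat, sse)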
Chapter 1.2.6 Bayesian Curve Fitting
As we all know, we need to assume a distribution over the parameters.
Furthermore, we have to marginalize them out in order to make predictions!
This process can be expressed by the predictive distribution
$p(t \mid x, \mathbf{x}, \mathbf{t}) = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})\, d\mathbf{w}$
This entire process will be covered in detail in chapter 3!
Chapter 1.5. Decision Theory
Our goal : obtaining $p(x, t)$, but in most cases that is extremely hard.
In practice, we estimate the posterior,
$p(t \mid x) = p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{p(x)}$
For cancer diagnosis, we have some belief,
a prior knowledge held before taking the X-ray.
Suppose we are trying to build a decision rule.
For binary classification, we divide the input space into regions $\mathcal{R}_1$ and $\mathcal{R}_2$.
What we do in ML is "minimizing the misclassification rate".
Here, let the decision boundary be $\widehat{x}$.
The optimal boundary will be $\widehat{x} = x_0$.
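For reference, the misclassification probability being minimized here is (PRML eq. 1.78):
$p(\text{mistake}) = p(x \in \mathcal{R}_1, C_2) + p(x \in \mathcal{R}_2, C_1) = \int_{\mathcal{R}_1} p(x, C_2)\, dx + \int_{\mathcal{R}_2} p(x, C_1)\, dx$
This is minimized by assigning each $x$ to the class with the larger posterior $p(C_k \mid x)$; the optimal boundary $x_0$ is where the joint densities $p(x, C_1)$ and $p(x, C_2)$ cross.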
Chapter 1.5. Decision Theory
We need a generalization of the concept of "loss".
Here we define the "loss function".
$L_{kj}$ : Element of a loss matrix — the loss incurred when the true class is $C_k$ but we assign class $C_j$.
We are minimizing the average (expected) loss.
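Written out, the expected loss the slide refers to is (PRML eqs. 1.80–1.81):
$\mathbb{E}[L] = \sum_k \sum_j \int_{\mathcal{R}_j} L_{kj}\, p(x, C_k)\, dx$
which is minimized by assigning each $x$ to the class $j$ that minimizes $\sum_k L_{kj}\, p(C_k \mid x)$.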
In practice, estimating the mere probability is not enough.
We need to assign a specific label! / That is, we need to decide a cut-off.
This thresholding issue is connected to the "reject option": when the largest posterior falls below a threshold, we may refuse to classify.
Chapter 1.5. Decision Theory
Ways of classification

(A) Generative model
- Estimates $p(x \mid C_k)$ and $p(C_k)$ (equivalently, the joint distribution).
- We are modeling the distribution of the input & output.
- It is possible to generate synthetic data.

(B) Discriminative model
- Estimates the posterior $p(C_k \mid x)$ only.
- We calculate just the probability of our interest.

(C) Direct classification
- Learns a discriminant function $C_k = f(x)$.
- We do not calculate any probability; it directly yields the class label.
Chapter 1.6. Information Theory
We are interested in 'how much information is received when we observe a specific event?'
This is closely connected to the idea of uncertainty!
Let $h(\cdot)$ be a function giving the information gained by observing a specific event.
If two events $x$ and $y$ are independent, $h(x, y) = h(x) + h(y)$ should hold.
Meanwhile, independent events' probabilities satisfy $p(x, y) = p(x)\,p(y)$.
It is therefore natural to use $h(x) = -\log_2 p(x)$.
What is the average amount of information received? It is the entropy, which can be written as
$H[x] = -\sum_x p(x) \log_2 p(x)$
Check how the entropy value changes as the probabilities change.
Chapter 1.6. Information Theory
Ideation.
Consider a random variable that may take 8 possible states.
1st case : all states have the same probability.
2nd case : probabilities $\left(\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{64}, \frac{1}{64}, \frac{1}{64}, \frac{1}{64}\right)$.
See how the information gain changes (worked out below).
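Working the two cases out explicitly (this is PRML's own example):
$H = -8 \times \tfrac{1}{8}\log_2\tfrac{1}{8} = 3 \text{ bits (uniform case)}$
$H = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{1}{8}\log_2\tfrac{1}{8} - \tfrac{1}{16}\log_2\tfrac{1}{16} - \tfrac{4}{64}\log_2\tfrac{1}{64} = 2 \text{ bits (non-uniform case)}$
The non-uniform distribution has lower entropy: it is less uncertain, so observing it conveys less information on average.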
At the same time, we can define entropy as ‘average amount of information needed to specify the state of a random variable.’
Now, consider the multinomial setting: the number of ways of allocating $N$ objects with counts $n_i$ is the multiplicity
$W = \frac{N!}{\prod_i n_i!}$
This can be interpreted as the multi-class version of the binomial coefficient $\binom{n}{k}$.
Similarly, we are assigning each object to one of the different boxes.
Chapter 1.6. Information Theory
Let's take a deeper look at this equation. We are interested in how much information we need to specify a certain state.
Thus, we again turn it into an entropy, scaled by $N$: $H = \frac{1}{N}\ln W$.
By applying Stirling's approximation (see the reconstruction below)…
Then, when is this entropy maximized?
We can find the optimum by solving a constrained maximization.
Here, the maximizing value is $p(x_i) = \frac{1}{M}$.
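A reconstruction of the steps the slide omits (PRML eqs. 1.95–1.97): with Stirling's approximation $\ln N! \simeq N\ln N - N$,
$H = \frac{1}{N}\ln W = \frac{1}{N}\Big(\ln N! - \sum_i \ln n_i!\Big) \simeq -\sum_i \frac{n_i}{N}\ln\frac{n_i}{N} = -\sum_i p_i \ln p_i$
where $p_i = n_i/N$. Maximizing subject to $\sum_i p_i = 1$ with a Lagrange multiplier gives the uniform distribution $p(x_i) = 1/M$, with maximum entropy $H = \ln M$.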
Chapter 1.6. Information Theory
Let's extend this idea to continuous variables, by using the mean value theorem.
** From wiki, the mean value theorem (for integrals) says: if the function $f(x)$ is continuous on the closed interval $[a, b]$,
then there exists a value $c$ between $a$ and $b$ such that the following holds. (Similar in spirit to a Riemann sum.)
$\int_a^b f(x)\, dx = f(c)\,(b - a)$
We may simply extend it bin by bin: $\int_{i\Delta}^{(i+1)\Delta} p(x)\, dx = p(x_i)\,\Delta$.
Obviously, the interval $\Delta$ should be as small as possible to increase the accuracy of the approximation, so we consider $\Delta \to 0$.
We can thus express a continuous variable's entropy starting from the discrete form.
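The limiting argument the slide points to runs as follows (PRML eqs. 1.102–1.104):
$H_\Delta = -\sum_i p(x_i)\Delta \ln\big(p(x_i)\Delta\big) = -\sum_i p(x_i)\Delta \ln p(x_i) - \ln\Delta$
The $-\ln\Delta$ term diverges as $\Delta \to 0$, so it is dropped, and the remaining sum converges to the differential entropy
$H[x] = -\int p(x)\ln p(x)\, dx$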
Chapter 1.6. Information Theory
Let's again maximize this equation, now using Lagrange multipliers.
The constraints are the basic requirements of a probability distribution (normalization), together with a fixed mean and variance.
We are maximizing the resulting Lagrangian…
We can set its (functional) derivative to zero, and get the form of the solution.
And again, by doing some math (solving for the Lagrange multipliers), we get the answer.
Oh… This is amazing…
The probability distribution that gives the maximum entropy is the Gaussian distribution!!
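A reconstruction of that derivation (PRML eqs. 1.105–1.109): maximize $-\int p(x)\ln p(x)\,dx$ subject to $\int p(x)\,dx = 1$, $\int x\,p(x)\,dx = \mu$, and $\int (x-\mu)^2 p(x)\,dx = \sigma^2$. Setting the functional derivative of the Lagrangian to zero gives
$p(x) = \exp\big(-1 + \lambda_1 + \lambda_2 x + \lambda_3 (x-\mu)^2\big)$
and substituting back into the constraints yields
$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
with maximum entropy $H[x] = \frac{1}{2}\big(1 + \ln(2\pi\sigma^2)\big)$.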
Chapter 1.6. Information Theory
Kullback–Leibler divergence (KL divergence)
We have all heard of the KL divergence many times. But what exactly does it indicate??
Let's think of a variable $x$ with probability distribution $p(x)$. We are trying to model this by using $q(x)$.
It is the average additional amount of information required to specify the value of $x$ as a result of using $q(x)$ instead of the true
distribution $p(x)$.
In short, it indicates 'How much more information do we need?'
$\mathrm{KL}(p\,\|\,q) = \underbrace{-\int p(x)\ln q(x)\,dx}_{\text{new estimated entropy}} - \underbrace{\left(-\int p(x)\ln p(x)\,dx\right)}_{\text{original entropy}}$
We have covered the KL divergence's inequality in Mathematical Statistics I, by using Jensen's inequality.
Note that $\mathrm{KL}(p\,\|\,q) \ge 0$, with equality if and only if $p(x) = q(x)$, and that it is not symmetric: $\mathrm{KL}(p\,\|\,q) \ne \mathrm{KL}(q\,\|\,p)$.
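For reference, the Jensen argument goes as follows (PRML eq. 1.115): since $-\ln$ is convex,
$\mathrm{KL}(p\,\|\,q) = -\int p(x)\ln\frac{q(x)}{p(x)}\,dx \;\ge\; -\ln\int p(x)\frac{q(x)}{p(x)}\,dx = -\ln\int q(x)\,dx = 0$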
Chapter 1.6. Information Theory
KL divergence in ML
Recall that the KL divergence is an expectation under $p$:
$\mathrm{KL}(p\,\|\,q) = -\mathbb{E}_{x}\!\left[\ln\frac{q(x)}{p(x)}\right]$
We will use the fact that such an expectation can be approximated by an average over samples drawn from $p(x)$: $\mathbb{E}[f] \simeq \frac{1}{N}\sum_{n=1}^{N} f(x_n)$.
Now, let's think of data $x$ which have an unknown distribution $p(x)$.
We are trying to model $p(x)$ by using a parametric distribution $q(x \mid \theta)$.
If $q(x \mid \theta)$ is similar to $p(x)$, then the KL divergence is relatively small.
Applying the sample approximation to the KL divergence (written out below), $\ln p(x_n)$ does not depend on $\theta$; it is already fixed. Thus, we don't need the second term, and we only need
$\sum_{n=1}^{N}\{-\ln q(x_n \mid \theta)\}$, which is exactly the negative log likelihood!
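In full, the approximation the slide builds on is (PRML eq. 1.119):
$\mathrm{KL}(p\,\|\,q) \simeq \frac{1}{N}\sum_{n=1}^{N}\big\{-\ln q(x_n \mid \theta) + \ln p(x_n)\big\}$
so minimizing the KL divergence over $\theta$ is equivalent to maximizing the likelihood $\sum_{n=1}^{N} \ln q(x_n \mid \theta)$.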