Dr azimifar pattern recognition lect4

Introduction
Generative Learning vs Discriminative Learning
Linear Discriminant Analysis
Quadratic Discriminant Analysis
GLAD and QDA, Another point of view
Naive Bayes
Lecture Summary
Statistical Pattern Recognition
Lecture4
Bayesian Learning
Dr Zohreh Azimifar
School of Electrical and Computer Engineering
Shiraz University
Fall2014
Dr Zohreh Azimifar, 2014 Statistical Pattern Recognition Lecture4 Bayesian Learning 1 / 22

Introduction
Naive Bayes
Lecture Summary
Table of contents
1 Introduction
2 Generative Learning vs Discriminative Learning
Generative Learning and Discriminative Learning
3 Linear Discriminant Analysis
Gaussian Linear Discriminant Analysis
Boundary Decision for GLDA
Analysis of GLDA
4 Quadratic Discriminant Analysis
Analysis of QDA
5 GLAD and QDA, Another point of view
6 Naive Bayes
Naive Bayes
Naive Bayes: An Example
7 Lecture Summary
Summary

Introduction
Naive Bayes
Lecture Summary
Introduction
Classification based on the theory Bayesian Learning
P(y = 0|X) ≷y=0
y=1 P(y = 1|X)
Classification involves determining P(y|X), from different
perspectives.

Introduction
Naive Bayes
Lecture Summary
1 Discriminative Learning:
Direct learning of P(y|X).
Modelling of decision boundary, to which side a new sample is
assigned.
Logistic and softmax regression are called discriminative learners.
2 Generative Learning
Explicit modelling of each class separately.
Compare new sample with each class probability, based on Bayesian
rule:
P(y|X) =
Likelihood
z }| {
P(X|y)
Prior
z }| {
P(y)
P(X)
| {z }
normalizing factor

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
Model P(y) and P(X|y) for each class y:
P(y) = φ
1{y=1}
1 φ
1{y=2}
2 · · · φ1{y=c}
c
P(X|y = i) =
1
(
√
2π)n|Σ|
1
2
exp(
−1
2
(X − µi )T
Σ−1
(X − µi ))
Parameter set: θ = {φ1, φ2, ..., φc , µ1, µ2, ..., µc , Σ}

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
Estimate parameters: θ = {φ1, φ2, ..., φc , µ1, µ2, ..., µc , Σ}
l(θ) = log
m
Y
j=1
P(X(j)
, y(j)
) = log
m
Y
j=1
P(X(j)
|P(y(j)
))P(y(j)
)

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
l(θ) = log
m
Y
j=1
P(X(j)
, y(j)
) = log
m
Y
j=1
P(X(j)
|P(y(j)
))P(y(j)
)
Take partial derivative in terms of each individual parameter:
φMLE
i =
Pm
j=1 1{y(j)
= i}
m

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
l(θ) = log
m
Y
j=1
P(X(j)
, y(j)
) = log
m
Y
j=1
P(X(j)
|P(y(j)
))P(y(j)
)
φMLE
i =
Pm
j=1 1{y(j)
= i}
m
µMLE
i =
Pm
j=1 1{y(j)
= i}X(j)
Pm
j=1 1{y(j) = i}

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
l(θ) = log
m
Y
j=1
P(X(j)
, y(j)
) = log
m
Y
j=1
P(X(j)
|P(y(j)
))P(y(j)
)
φMLE
i =
Pm
j=1 1{y(j)
= i}
m
µMLE
i =
Pm
j=1 1{y(j)
= i}X(j)
Pm
j=1 1{y(j) = i}
ΣMLE
=
1
m
m
X
j=1
(X(j)
− µy(j) )(X(j)
− µy(j) )T

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
Determine class label of a new sample Xnew
:
ynew
= argmaxy P(y|X)
= argmaxy
P(X|y)P(y)
P(X)
= argmaxy P(X|y)P(y)
Note that P(X|y) is a class dependent density.

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
Decision boundary is a line, a plane, or a hyper-plane. Why?
P(y = i|X) = P(y = j|X)
P(X|y = i)P(y = i)
P(X)
=
P(X|y = j)P(y = j)
P(X)

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
Decision boundary cont’ed:
exp(
−1
2
(X − µi )T
Σ−1
(X − µi ))P(y = i) = exp(
−1
2
(X − µj )T
Σ−1
(X − µj ))P(y = j)

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
exp(
−1
2
(X − µi )T
Σ−1
(X − µi ))P(y = i) = exp(
−1
2
(X − µj )T
Σ−1
(X − µj ))P(y = j)
−1
2
(X − µi )T
Σ−1
(X − µi ) + log P(y = i) =
−1
2
(X − µj )T
Σ−1
(X − µj ) + log P(y = j)

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
exp(
−1
2
(X − µi )T
Σ−1
(X − µi ))P(y = i) = exp(
−1
2
(X − µj )T
Σ−1
(X − µj ))P(y = j)
−1
2
(X − µi )T
Σ−1
(X − µi ) + log P(y = i) =
−1
2
(X − µj )T
Σ−1
(X − µj ) + log P(y = j)
⇒ log
P(y = i)
P(y = j)
−
1
2
[XT
Σ−1
X − 2XT
Σ−1
µi + µT
i Σ−1
µi ]
+
1
2
[XT
Σ−1
X − 2XT
Σ−1
µj + µT
j Σ−1
µj ] = 0

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
exp(
−1
2
(X − µi )T
Σ−1
(X − µi ))P(y = i) = exp(
−1
2
(X − µj )T
Σ−1
(X − µj ))P(y = j)
−1
2
(X − µi )T
Σ−1
(X − µi ) + log P(y = i) =
−1
2
(X − µj )T
Σ−1
(X − µj ) + log P(y = j)
⇒ log
P(y = i)
P(y = j)
−
1
2
[XT
Σ−1
X − 2XT
Σ−1
µi + µT
i Σ−1
µi ]
+
1
2
[XT
Σ−1
X − 2XT
Σ−1
µj + µT
j Σ−1
µj ] = 0
⇒ XT
Σ−1
(µi − µj )
| {z }
aX
+
1
2
µT
i Σ−1
µi −
1
2
µT
j Σ−1
µj + log
P(y = i)
P(y = j)
| {z }
b
= 0

Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
Analysis of GLDA when Σ = σ2
I
Classes are of identical distribution, but different means.
Cross-section of classes distribution is spherical.
Decision boundary is linear.
Called classifier with nearest Euclidean distance to the class mean,
when?
Σ =

1 0
0 1


Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
Analysis of GLDA when Σ is not identity
Cross-section of classes distribution is ellipsoidal.
Decision boundary is linear.
Σ =

0.5 0
0 1.5


Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
Analysis of GLDA with arbitrary Σ
Cross-section of classes distribution is ellipsoidal. Linear decision
boundary.
Classes are aligned with direction of covariance eigenvectors.
Σ =

0.5 0.2
0.2 1.5


Introduction
Naive Bayes
Lecture Summary
Analysis of GLDA
Analysis of GLDA with arbitrary Σ
Cross-section of classes distribution is ellipsoidal. Linear decision
boundary.
Classes are aligned with direction of covariance eigenvectors.
Σ =

0.5 0.2
0.2 1.5

Called classifier with nearest Mahalanobis distance to the class
mean, when? Dist(X, µi) = (X − µi)T
Σ−1
(X − µi)

Introduction
Naive Bayes
Lecture Summary
Analysis of QDA
Another generative learning model; a Bayesian classifier
Here, classes are of multinomial distribution, and likelihoods are
multivariate Gaussian with separate covariance Σi .
Decision boundary becomes non-linear, Why?
P(X|y = i) =
1
(
√
2π)n|Σi |
1
2
exp(
−1
2
(X − µi )T
Σ−1
i (X − µi ))

Introduction
Naive Bayes
Lecture Summary
Analysis of QDA
1
|Σi |
1
2
exp(
−1
2
(X − µi )T
Σ−1
i (X − µi ))P(y = i)
=
1
|Σj |
1
2
exp(
−1
2
(X − µj )T
Σ−1
j (X − µj ))P(y = j)

Introduction
Naive Bayes
Lecture Summary
Analysis of QDA
1
|Σi |
1
2
exp(
−1
2
(X − µi )T
Σ−1
i (X − µi ))P(y = i)
=
1
|Σj |
1
2
exp(
−1
2
(X − µj )T
Σ−1
j (X − µj ))P(y = j)
−1
2
log|Σi | −
1
2
(X − µi )T
Σ−1
i (X − µi ) + logP(y = i)
=
−1
2
log|Σj | −
1
2
(X − µj )T
Σ−1
j (X − µj ) + logP(y = j)

Introduction
Naive Bayes
Lecture Summary
Analysis of QDA
1
|Σi |
1
2
exp(
−1
2
(X − µi )T
Σ−1
i (X − µi ))P(y = i)
=
1
|Σj |
1
2
exp(
−1
2
(X − µj )T
Σ−1
j (X − µj ))P(y = j)
−1
2
log|Σi | −
1
2
(X − µi )T
Σ−1
i (X − µi ) + logP(y = i)
=
−1
2
log|Σj | −
1
2
(X − µj )T
Σ−1
j (X − µj ) + logP(y = j)
⇒ log
P(y = i)
P(y = j)
−
1
2
log
|Σi |
|Σj |
−
1
2
[XT
Σ−1
i X + µT
i Σ−1
i µi −
2XT
Σ−1
i µi − XT
Σ−1
j X − µT
j Σ−1
j µj + 2XT
Σ−1
j µj ] = 0

Introduction
Naive Bayes
Lecture Summary
Analysis of QDA
log
P(y = i)
P(y = j)
−
1
2
log

Dr azimifar pattern recognition lect4

Recommended

Recommended

More Related Content

What's hot

What's hot (10)

Similar to Dr azimifar pattern recognition lect4

Similar to Dr azimifar pattern recognition lect4 (19)

More from Zahra Amini

More from Zahra Amini (12)

Recently uploaded

Recently uploaded (20)

Dr azimifar pattern recognition lect4