3. Markov Model
• Given 3 weather states:
– {S1, S2, S3} = {rain, cloudy, sunny}
• State transition probabilities:
           Rain   Cloudy  Sunny
  Rain     0.4    0.3     0.3
  Cloudy   0.2    0.6     0.2
  Sunny    0.1    0.1     0.8
• What is the probability that the next 7 days
  will be {sun, sun, rain, rain, sun, cloud,
  sun}?
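The question above reduces to a product of one-step transition probabilities. A minimal sketch (the assumption that "today" is sunny is mine, not stated on the slide):

```python
# States of the weather Markov chain from the slide.
RAIN, CLOUDY, SUNNY = 0, 1, 2

# Transition matrix A: A[i][j] = P(tomorrow = j | today = i)
A = [
    [0.4, 0.3, 0.3],  # from rain
    [0.2, 0.6, 0.2],  # from cloudy
    [0.1, 0.1, 0.8],  # from sunny
]

def sequence_prob(today, days, A):
    """Probability of observing `days` given the current state `today`:
    a product of one-step transition probabilities."""
    p, prev = 1.0, today
    for s in days:
        p *= A[prev][s]
        prev = s
    return p

week = [SUNNY, SUNNY, RAIN, RAIN, SUNNY, CLOUDY, SUNNY]
print(sequence_prob(SUNNY, week, A))  # 0.8*0.8*0.1*0.4*0.3*0.1*0.2 = 1.536e-4
```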
4. Hidden Markov Model
• The states
  – cannot be observed directly: they are hidden!
  – but they can be observed indirectly
• Example
  – North Pole or Equator (model), Hot/Cold (state),
    1/2/3 ice creams (observation)
5. Hidden Markov Model
• The observation is a probabilistic function of
  the state, which is not directly observable
  [Figure: hidden states generating the observations]
6. HMM Elements
• N, the number of states in the model
• M, the number of distinct observation
symbols
• A, the state transition probability distribution
• B, the observation symbol probability
distribution in states
• π, the initial state distribution
• λ = (A, B, π): the complete model
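These elements can be written down concretely for the ice-cream example; the probability values below are illustrative placeholders, not numbers from the slides:

```python
# Hypothetical λ = (A, B, π) for the Hot/Cold ice-cream HMM.
states  = ["Hot", "Cold"]   # N = 2 hidden states
symbols = [1, 2, 3]         # M = 3 observation symbols (# of ice creams)

A  = [[0.7, 0.3],             # A[i][j] = P(next state j | current state i)
      [0.4, 0.6]]
B  = [[0.2, 0.4, 0.4],        # B[i][k] = P(observing symbols[k] | state i)
      [0.5, 0.4, 0.1]]
pi = [0.8, 0.2]               # initial state distribution

# Every row of A and B, and π itself, must be a probability distribution.
for row in A + B + [pi]:
    assert abs(sum(row) - 1.0) < 1e-9
```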
15. Forward Algorithm
• Initialization:
  α_1(i) = π_i b_i(O_1),   1 ≤ i ≤ N
• Induction:
  α_{t+1}(j) = [ ∑_{i=1}^{N} α_t(i) a_{ij} ] b_j(O_{t+1}),   1 ≤ t ≤ T−1,  1 ≤ j ≤ N
• Termination:
  P(O | λ) = ∑_{i=1}^{N} α_T(i)
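The three steps above can be sketched directly; the Hot/Cold model numbers here are illustrative, not from the slides:

```python
# Forward algorithm for P(O | λ), using a hypothetical two-state model.
A  = [[0.7, 0.3], [0.4, 0.6]]              # transitions (Hot, Cold)
B  = [[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]]    # emissions for symbols 1/2/3
pi = [0.8, 0.2]

def forward(obs, A, B, pi):
    """Return P(O | λ) and the α trellis; obs holds symbol indices."""
    N = len(pi)
    # Initialization: α_1(i) = π_i b_i(O_1)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    # Induction: α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(O_{t+1})
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    # Termination: P(O | λ) = Σ_i α_T(i)
    return sum(alpha[-1]), alpha

p, _ = forward([2, 0, 2], A, B, pi)        # observing 3, 1, 3 ice creams
print(p)
```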
16. Backward Algorithm
• Forward Algorithm
  α_t(i) = P(O_1, O_2, ..., O_t, q_t = S_i | λ)
• Backward Algorithm
  – given state S_i at time t, the probability of the
    backward partial observation sequence O_{t+1}, O_{t+2}, ..., O_T
  β_t(i) = P(O_{t+1}, O_{t+2}, ..., O_T | q_t = S_i, λ)
17. Backward Algorithm
• Initialization
  β_T(i) = 1,   1 ≤ i ≤ N
• Induction
  β_t(i) = ∑_{j=1}^{N} a_{ij} b_j(O_{t+1}) β_{t+1}(j),   t = T−1, T−2, ..., 1,  1 ≤ i ≤ N
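A sketch of the backward pass, again with illustrative model numbers; as a sanity check, P(O | λ) can also be read off the first β column and must match the forward result:

```python
# Backward algorithm, using the same hypothetical two-state model.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]]
pi = [0.8, 0.2]

def backward(obs, A, B):
    """Return the β trellis: β_t(i) = P(O_{t+1..T} | q_t = S_i, λ)."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N]                     # initialization: β_T(i) = 1
    for t in range(T - 2, -1, -1):         # induction, t = T-1, ..., 1
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j]
                            for j in range(N)) for i in range(N)])
    return beta

obs = [2, 0, 2]
beta = backward(obs, A, B)
# P(O | λ) = Σ_i π_i b_i(O_1) β_1(i), same value the forward pass gives.
p = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
print(p)
```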
18. Backward Algorithm
[Figure: three-state trellis over times t = 1, 2, 3; each state S1, S2, S3 can
emit observation R1 or R2; the case O_T = R1 is highlighted]

  β_{T−1}(1) = ∑_{j=1}^{N} a_{1j} b_j(O_T) β_T(j)
             = a_{11} b_1(O_T) + a_{12} b_2(O_T) + a_{13} b_3(O_T)

  (using β_T(j) = 1)
20. Solution 2
• e.g., choose the states q_t that are individually
  most likely
  – γ_t(i): the probability of being in state S_i at
    time t, given the observation sequence O,
    and the model λ

  γ_t(i) = P(q_t = S_i | O, λ) = α_t(i) β_t(i) / P(O | λ)
         = α_t(i) β_t(i) / ∑_{i=1}^{N} α_t(i) β_t(i)

  q_t = argmax_{1≤i≤N} γ_t(i),   1 ≤ t ≤ T
21. Viterbi algorithm
• The most widely used criterion is to find
  the “single best state sequence”:
  maximize P(Q | O, λ), which is equivalent to
  maximizing P(Q, O | λ)
• A formal technique exists, based on
  dynamic programming methods, and is
  called the Viterbi algorithm
22. Viterbi algorithm
• To find the single best state sequence, Q =
{q1, q2, …, qT}, for the given observation
sequence O = {O1, O2, …, OT}
• δ_t(i): the best score (highest prob.) along a
  single path, at time t, which accounts for the
  first t observations and ends in state S_i

  δ_t(i) = max_{q_1, q_2, ..., q_{t−1}} P(q_1 q_2 ... q_t = S_i, O_1 O_2 ... O_t | λ)
23. Viterbi algorithm
• Initialization – δ_1(i)
  – When t = 1 there is no earlier path into a
    state
  – so we use the probability of being in each
    state at t = 1 and emitting the first
    observation O_1
  δ_1(i) = π_i b_i(O_1),   1 ≤ i ≤ N
  ψ_1(i) = 0
24. Viterbi algorithm
• Calculating δ_t(i) when t > 1
  – δ_t(X): the probability of the most probable
    path to state X at time t
  – this path to X must pass through one
    of the states A, B or C at time t−1
  Most probable path to X via A:  δ_{t−1}(A) a_{AX} b_X(O_t)
25. Viterbi algorithm
• Recursion
  δ_t(j) = max_{1≤i≤N} [ δ_{t−1}(i) a_{ij} ] b_j(O_t),   2 ≤ t ≤ T,  1 ≤ j ≤ N
  ψ_t(j) = argmax_{1≤i≤N} [ δ_{t−1}(i) a_{ij} ],   2 ≤ t ≤ T,  1 ≤ j ≤ N
• Termination
  P* = max_{1≤i≤N} δ_T(i)
  q_T* = argmax_{1≤i≤N} δ_T(i)
26. Viterbi algorithm
• Path (state sequence) backtracking
  q_t* = ψ_{t+1}(q_{t+1}*),   t = T−1, T−2, ..., 1
  q_{T−1}* = ψ_T(q_T*) = argmax_{1≤i≤N} δ_{T−1}(i) a_{i q_T*}
  ...
  q_1* = ψ_2(q_2*)
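The recursion, termination and backtracking steps fit in one short sketch; the Hot/Cold model numbers are illustrative, not from the slides:

```python
# Viterbi algorithm: single best state sequence for an observation sequence.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]]
pi = [0.8, 0.2]

def viterbi(obs, A, B, pi):
    """Return (P*, best state sequence) for the symbol indices in obs."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]   # δ_1(i)
    psi = []                                           # backpointers, t = 2..T
    for o in obs[1:]:
        new_delta, back = [], []
        for j in range(N):
            best = max(range(N), key=lambda i: delta[i] * A[i][j])
            back.append(best)                          # ψ_t(j)
            new_delta.append(delta[best] * A[best][j] * B[j][o])
        delta = new_delta
        psi.append(back)
    # Termination: q_T* = argmax_i δ_T(i); then backtrack through ψ.
    q = [max(range(N), key=lambda i: delta[i])]
    for back in reversed(psi):
        q.insert(0, back[q[0]])                        # q_t* = ψ_{t+1}(q_{t+1}*)
    return max(delta), q

p_star, q_star = viterbi([2, 0, 2], A, B, pi)
print(p_star, q_star)
```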
27. Solution 3
• Which model λ = (A, B, π) is most likely to
  have produced the observed sequence?
  i.e., which model maximizes P(O | λ)?
• There is no known analytic solution. We
  can choose λ = (A, B, π) such that P(O | λ)
  is locally maximized using an iterative
  procedure
28. Baum-Welch Method
• Define ξt(i, j) = P(qt=Si , qt+1=Sj|O, λ)
– The probability of being in state Si at time t,
and state Sj at time t+1
  ξ_t(i, j) = α_t(i) a_{ij} b_j(O_{t+1}) β_{t+1}(j) / P(O | λ)
            = α_t(i) a_{ij} b_j(O_{t+1}) β_{t+1}(j)
              / ∑_{i=1}^{N} ∑_{j=1}^{N} α_t(i) a_{ij} b_j(O_{t+1}) β_{t+1}(j)
29. Baum-Welch Method
• γt(i) : the probability of being in state Si at time
t, given the observation sequence O, and the
model λ
  γ_t(i) = α_t(i) β_t(i) / P(O | λ)
         = α_t(i) β_t(i) / ∑_{i=1}^{N} α_t(i) β_t(i)

• Relate γ_t(i) to ξ_t(i, j):
  γ_t(i) = ∑_{j=1}^{N} ξ_t(i, j)
30. Baum-Welch Method
• The expected number of transitions out of state S_i:
  ∑_{t=1}^{T−1} γ_t(i) = expected number of transitions from S_i
• Similarly, the expected number of transitions
  from state S_i to state S_j:
  ∑_{t=1}^{T−1} ξ_t(i, j) = expected number of transitions from S_i to S_j
31. Baum-Welch Method
• Re-estimation formulas for π, A and B

  π̄_i = γ_1(i)

  ā_{ij} = ∑_{t=1}^{T−1} ξ_t(i, j) / ∑_{t=1}^{T−1} γ_t(i)
         = expected number of transitions from state S_i to S_j
           / expected number of transitions from state S_i

  b̄_j(k) = ∑_{t=1, s.t. O_t = v_k}^{T} γ_t(j) / ∑_{t=1}^{T} γ_t(j)
          = expected number of times in state S_j observing symbol v_k
            / expected number of times in state S_j
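One re-estimation pass puts all three formulas together with the forward and backward trellises; the model numbers below are illustrative placeholders:

```python
# One Baum-Welch re-estimation step for a hypothetical two-state model.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]]
pi = [0.8, 0.2]

def baum_welch_step(obs, A, B, pi):
    """Return the re-estimated (A, B, π) after one EM pass over obs."""
    N, M, T = len(pi), len(B[0]), len(obs)
    # Forward and backward trellises
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    beta = [[1.0] * N]
    for t in range(T - 2, -1, -1):
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j]
                            for j in range(N)) for i in range(N)])
    p_obs = sum(alpha[-1])
    # γ_t(i) and ξ_t(i, j)
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # Re-estimation formulas for π, A and B
    pi_new = gamma[0][:]
    A_new = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    B_new = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return A_new, B_new, pi_new
```

Each re-estimated row is still a probability distribution, so the updated λ̄ can be fed straight back into the next iteration.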
32. Baum-Welch Method
• P(O|λ̄) > P(O|λ), where λ̄ is the re-estimated
  model (unless λ is already a critical point)
• By iteratively using λ̄ in place of λ and
  repeating the re-estimation, we can improve
  P(O|λ) until some limiting point is reached