This document provides an introduction to hidden Markov models (HMMs). It defines HMMs as an extension of Markov models that allows observations to be probabilistic functions of hidden states. The core problems of HMMs are finding the probability of an observed sequence and determining the most probable hidden-state sequence that produced it. HMMs have applications in areas such as speech recognition, where the forward and Viterbi algorithms are used to find the most likely string of words given acoustic input.
2. Agenda
Introduction
Markov Model
Hidden Markov Model
Problems in HMM
Applications
HMM in speech recognition
References
3. Introduction
Stochastic process (random process):
A system that changes over time in an uncertain manner.
It is a collection of random variables representing the evolution of some system of random values over time. This is the probabilistic counterpart to a deterministic process. Instead of describing a process that can evolve in only one way, a stochastic (random) process has some indeterminacy: even if the initial condition (or starting point) is known, there are several (often infinitely many) directions in which the process may evolve.
6. Introduction (Cont.)
Techniques to model a stochastic process:
Branching process
Gaussian process
Hidden Markov model
Markov process
7. Introduction (Cont.)
In 1906, Andrey Markov introduced Markov chains. He produced the first theoretical results for stochastic processes, using the term "chain" for the first time.
A Markov chain is required to possess a property usually characterized as "memoryless": the probability distribution of the next state depends only on the current state and not on the sequence of events that preceded it (also called the Markov property).
9. Markov Model
A Markov model is a stochastic model used to model randomly changing systems, under the assumption that future states depend only on the present state and not on the sequence of events that preceded it.
10. Markov Model (Cont.)
Example 1:
Let's talk about the weather. Here in Cairo we assume that there are three types of weather: sunny, rainy, and cloudy. Let's assume for the moment that the weather lasts all day, i.e. it doesn't change from rainy to sunny in the middle of the day.
By carefully examining the weather for a long time, we found the following weather-change pattern.
12. Markov Model (Cont.)
Question:
What is the probability that the weather for the next 6 days will be "cloudy-rainy-rainy-sunny-cloudy-sunny", given that today is sunny, under our weather Markov model?
13. Markov Model (Cont.)
Definitions:
Observable states: {1, 2, …, N}
Observed sequence: {q1, q2, …, qT}
State transition matrix:
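In the usual notation, the entries of the state transition matrix are the one-step transition probabilities between states (this is the standard definition, consistent with the sequence calculation on the following slides):

$$A = [a_{ij}], \qquad a_{ij} = P(q_t = j \mid q_{t-1} = i), \qquad \sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i, j \le N.$$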
14. Markov Model (Cont.)
Definitions:
Initial state probability:
Markov assumption (Markov property):
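In the same notation, these two definitions can be written as:

$$\pi_i = P(q_1 = i), \qquad 1 \le i \le N,$$
$$P(q_t \mid q_{t-1}, q_{t-2}, \ldots, q_1) = P(q_t \mid q_{t-1}).$$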
15. Markov Model (Cont.)
Definitions:
Sequence probability of a Markov model:
Remember the Markov assumption.
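Combining the chain rule with the Markov assumption, the probability of any state sequence factorizes into the initial probability times one-step transition probabilities:

$$P(q_1, q_2, \ldots, q_T) = P(q_1)\prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1}\prod_{t=2}^{T} a_{q_{t-1}\,q_t}.$$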
P(sunny-cloudy-rainy-rainy-sunny-cloudy-sunny) = ?
16. Markov Model (Cont.)
The answer:
O = {"cloudy-rainy-rainy-sunny-cloudy-sunny"}, when today is sunny.
Assume that S1 : rainy, S2 : cloudy, S3 : sunny.
P(O | model) = P(sunny-cloudy-rainy-rainy-sunny-cloudy-sunny | model)
= P(S3, S2, S1, S1, S3, S2, S3 | model)
= P(S3) . P(S2|S3) . P(S1|S2) . P(S1|S1) . P(S3|S1) . P(S2|S3) . P(S3|S2)
= 1 . (0.1) . (0.3) . (0.4) . (0.3) . (0.1) . (0.2)
= 0.000072
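Below is a minimal Python sketch of this chain-rule calculation; the transition probabilities listed are exactly the ones used in the product above, and entries of the weather transition matrix not needed here are omitted.

```python
# Minimal sketch: chain-rule probability of the weather sequence above.
# Only the transition probabilities that appear in the product are listed.
transition = {
    ("sunny", "cloudy"): 0.1,
    ("cloudy", "rainy"): 0.3,
    ("rainy", "rainy"): 0.4,
    ("rainy", "sunny"): 0.3,
    ("cloudy", "sunny"): 0.2,
}

# Today is known to be sunny, so P(q1 = sunny) = 1.
sequence = ["sunny", "cloudy", "rainy", "rainy", "sunny", "cloudy", "sunny"]

prob = 1.0
for prev, curr in zip(sequence, sequence[1:]):
    prob *= transition[(prev, curr)]

print(f"{prob:.6f}")  # 0.000072
```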
18. Hidden Markov Model
So far we have considered Markov models in which each state corresponds to an observable (physical) event. This model is too restrictive to be applicable to many problems of interest, so we extend the concept of Markov models to include the case where the observation is a probabilistic function of the state.
The adjective "hidden" refers to the state sequence through which the model passes, not to the parameters of the model.
19. Hidden Markov Model
Notation: λ = (A, B, π)
(1) N: number of states.
(2) M: number of symbols observable in states.
(3) A: state transition probability distribution.
(4) B: observation symbol probability distribution.
(5) π: initial state distribution.
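For concreteness, a minimal sketch of how λ = (A, B, π) might be stored in code, as plain NumPy arrays whose shapes follow directly from this notation; the names and sizes are illustrative, not part of the original slides.

```python
import numpy as np

N, M = 2, 3  # illustrative sizes: N hidden states, M observable symbols

A = np.zeros((N, N))    # A[i, j] = P(state j at time t | state i at time t-1)
B = np.zeros((N, M))    # B[j, k] = P(observing symbol k | state j)
pi = np.zeros(N)        # pi[i]   = P(starting in state i)

# For a valid model, each row of A, each row of B, and pi itself must sum to 1.
```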
21. HMM Core Problems (cont.)
Problem 1:
Finding the probability of an observed sequence. What is P(O | λ)?
Solution:
Sum over all possible state-sequence paths that could generate the given observation sequence, using the forward algorithm.
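Concretely, the forward algorithm computes P(O | λ) in three steps (initialization, induction, termination):

$$\alpha_1(i) = \pi_i\, b_i(O_1), \qquad 1 \le i \le N,$$
$$\alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(O_{t+1}), \qquad 1 \le t \le T-1,$$
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i).$$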
23. HMM Core Problems (cont.)
Example:
What is the probability of the sequence of observations O = {shopping, cleaning, walking, cleaning} given the HMM model?
24. HMM Core Problems (cont.)
Solution:
[Forward-algorithm trellis over the states R (rainy) and S (sunny) for the observations shopping, cleaning, walking, cleaning on Days 1-4.]
Step 1 (initialization):
α1(1) = 0.6 * 0.4 = 0.24
α1(2) = 0.4 * 0.3 = 0.12
Step 2 (induction, repeated to the end):
α2(1) = 0.108, α2(2) = 0.0114
α3(1) = 0.008, α3(2) = 0.386
α4(1) = 0.08, α4(2) = 0.023
25. HMM Core Problems (cont.)
Solution:
[The same forward trellis over R and S for shopping, cleaning, walking, cleaning on Days 1-4, with α1(1) = 0.24, α1(2) = 0.12, α2(1) = 0.108, α2(2) = 0.0114, α3(1) = 0.008, α3(2) = 0.386, α4(1) = 0.08, α4(2) = 0.023.]
Step 3 (termination): sum the forward variables of the last day, P(O | λ) = α4(1) + α4(2).
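A runnable sketch of this forward computation in Python: the initial probabilities and the emission probabilities for "shopping" are the ones visible in Step 1 (0.6 * 0.4 and 0.4 * 0.3), while the remaining transition and emission values are assumptions taken from the widely used rainy/sunny example, so the later α values need not reproduce the slide figures exactly.

```python
import numpy as np

states = ["R", "S"]                                  # rainy, sunny
symbols = ["walking", "shopping", "cleaning"]
observations = ["shopping", "cleaning", "walking", "cleaning"]

pi = np.array([0.6, 0.4])                            # from Step 1: 0.6*0.4, 0.4*0.3
A = np.array([[0.7, 0.3],                            # assumed R->R, R->S
              [0.4, 0.6]])                           # assumed S->R, S->S
B = np.array([[0.1, 0.4, 0.5],                       # R: walking, shopping, cleaning
              [0.6, 0.3, 0.1]])                      # S: walking, shopping, cleaning

def forward(obs):
    """Forward variables alpha[t, i] and the total probability P(O | lambda)."""
    alpha = np.zeros((len(obs), len(states)))
    alpha[0] = pi * B[:, symbols.index(obs[0])]      # Step 1: initialization
    for t in range(1, len(obs)):                     # Step 2: induction
        alpha[t] = (alpha[t - 1] @ A) * B[:, symbols.index(obs[t])]
    return alpha, alpha[-1].sum()                    # Step 3: termination

alpha, prob = forward(observations)
print(alpha[0])   # [0.24 0.12], matching Step 1 above
print(prob)       # P(O | lambda) under the assumed parameters
```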
26. HMM Core Problems (cont.)
Problem 2:
Given an observation sequence, what is the most probable transition sequence of hidden states?
Solution:
We can find the most probable state sequence using the Viterbi algorithm.
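In symbols, the Viterbi algorithm replaces the forward algorithm's sum with a maximization and keeps back-pointers ψ so the best path can be recovered:

$$\delta_1(i) = \pi_i\, b_i(O_1),$$
$$\delta_t(j) = \Big[\max_{1 \le i \le N} \delta_{t-1}(i)\, a_{ij}\Big]\, b_j(O_t), \qquad \psi_t(j) = \arg\max_{1 \le i \le N} \delta_{t-1}(i)\, a_{ij},$$
$$q_T^{*} = \arg\max_{i} \delta_T(i), \qquad q_t^{*} = \psi_{t+1}(q_{t+1}^{*}).$$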
27. HMM Core Problems (cont.)
Example:
Given the sequence of observations O = {shopping, cleaning, walking, cleaning}, what is the most probable transition sequence of hidden states?
28. HMM Core Problems (cont.)
Solution:
[Viterbi trellis over the states R and S for the observations shopping, cleaning, walking, cleaning on Days 1-4.]
Step 1 (initialization):
δ1(1) = 0.6 * 0.4 = 0.24
δ1(2) = 0.4 * 0.3 = 0.12
29. HMM Core Problems (cont.)
Solution:
[The same Viterbi trellis, Days 1-4.]
Step 2 (recursion), Day 2 (cleaning):
P(R) = δ1(1) * a11 * b1(O2) = 0.084
P(R) = δ1(2) * a21 * b1(O2) = 0.024
δ2(1) = max(0.084, 0.024) = 0.084
δ2(2) = 0.0072
30. HMM Core Problems (cont.)
Solution:
[The Viterbi trellis continued for Days 3-4 (walking, cleaning), alongside the Day 1-2 values δ1(1) = 0.24, δ1(2) = 0.12, δ2(1) = 0.084, δ2(2) = 0.0072.]
Step 2 (continued):
δ3(1) = 0.018, δ3(2) = 0.0194
δ4(1) = 0.023, δ4(2) = 0.077
(Intermediate candidate products shown on the slide: 0.001, 0.0041, 0.002, 0.0002.)
31. HMM Core Problems (cont.)
Solution:
[The complete Viterbi trellis for Days 1-4: δ1(1) = 0.24, δ1(2) = 0.12; δ2(1) = 0.084, δ2(2) = 0.0072; δ3(1) = 0.018, δ3(2) = 0.0194; δ4(1) = 0.023, δ4(2) = 0.077. The most probable state sequence is read off by back-tracking from the largest final δ value.]
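A matching Viterbi sketch in Python, reusing the same partly assumed rainy/sunny parameters as the forward sketch above; it reproduces the Step 1 and Step 2 values (0.24, 0.12, 0.084, 0.0072) and illustrates the max-and-backtrack structure of the algorithm rather than the slide's exact later numbers.

```python
import numpy as np

# Same partly assumed rainy/sunny parameters as in the forward sketch above.
states = ["R", "S"]
symbols = ["walking", "shopping", "cleaning"]
observations = ["shopping", "cleaning", "walking", "cleaning"]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.3, 0.1]])

def viterbi(obs):
    """Most probable hidden-state path for obs, plus its probability."""
    T, N = len(obs), len(states)
    delta = np.zeros((T, N))                 # best path probability ending in state j at t
    psi = np.zeros((T, N), dtype=int)        # back-pointers
    delta[0] = pi * B[:, symbols.index(obs[0])]           # Step 1: (0.24, 0.12)
    for t in range(1, T):                                  # Step 2: max instead of sum
        scores = delta[t - 1][:, None] * A                 # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, symbols.index(obs[t])]
    path = [int(delta[-1].argmax())]                       # Step 3: back-track
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return [states[i] for i in path], float(delta[-1].max())

path, prob = viterbi(observations)
print(path, prob)   # ['R', 'R', 'S', 'R'] 0.003024 under the assumed parameters
```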
32. Applications
Speech recognition
• Recognizing spoken words and phrases
Text processing
• Parsing raw records into structured records
Bioinformatics
• Protein sequence prediction
Financial
• Stock market forecasts (price pattern prediction)
• Comparison shopping services
33. HMM in speech recognition
The basic idea is to find the most likely string of
words given some acoustic (voiced) input.
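Formally, this is the usual noisy-channel formulation: given an acoustic observation sequence O, the recognizer searches over word strings W for

$$\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} P(O \mid W)\, P(W),$$

where P(O | W) is the acoustic model (built from HMMs) and P(W) is the language model.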
34. HMM in speech recognition (Cont.)
The units (levels) of speech recognition systems