1. Tertiary structure prediction
R. Janani (19PBI003)
IPG Bioinformatics
2. HMM HISTORY
HMMs were developed and published in the 1960s and 1970s but did not become widespread until the late 1980s.
The theory was published in mathematical journals, with insufficient tutorial material for readers to understand and apply the concepts.
Andrey Andreyevich Markov was a Russian mathematician known for his work on stochastic processes.
His primary subject of research later became known as Markov chains and Markov processes.
3. HIDDEN MARKOV MODEL
It’s a statistical analysis of sequences, especially for signal
models in which the system is being modeled is assumed to
be a Markov process with hidden states.
It states that the evolution of observable events depend on
internal factors, which are not directly observable.
It offer a mathematical description of current state of system
whose internal state is not known, only its output.
It is one among the various signal processing models and
algorithms have been used in biological sequence analysis.
It considers the real world problems structure dealing
classifying raw observations
They are sequential and cannot see the event producing the
output.
4. The observed event is called a 'symbol' and the invisible factor underlying it a 'state'.
An HMM consists of two stochastic processes:
1. An invisible process of hidden states.
2. A visible process of observable symbols.
The hidden states form a Markov chain, and the probability distribution of the observed symbols depends on the underlying states.
The HMM is therefore also called a doubly embedded stochastic process.
HMMs are well known for their effectiveness in modeling correlations between adjacent symbols, domains or events, and are used in various fields.
5. An HMM consists of a finite set of states, an alphabet of output symbols, a set of transition probabilities and a set of emission probabilities.
The emission probabilities specify the distribution of output symbols that may be emitted from each state.
There are two stochastic processes: the process of moving between states and the process of emitting an output sequence.
The sequence of state transitions is a hidden process and is observed only through the sequence of emitted symbols.
6. Example: two states, 'rain' and 'dry'.
Transition probabilities: P(rain|rain) = 0.3, P(dry|rain) = 0.7, P(rain|dry) = 0.2, P(dry|dry) = 0.8
Initial probabilities: P(rain) = 0.4, P(dry) = 0.6
Suppose we want to calculate the probability of the state sequence {dry, dry, rain, rain} in our example:
P({dry, dry, rain, rain}) = P(rain|rain) P(rain|dry) P(dry|dry) P(dry)
= 0.3 × 0.2 × 0.8 × 0.6 = 0.0288
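This calculation can be sketched in a few lines of Python (the dictionary layout is just one convenient choice; the probability values are those of the example above):

# Transition probabilities P(next | current) from the example
A = {'rain': {'rain': 0.3, 'dry': 0.7},
     'dry':  {'rain': 0.2, 'dry': 0.8}}
# Initial probabilities
pi = {'rain': 0.4, 'dry': 0.6}

def sequence_probability(states):
    """Probability of a state sequence under a first-order Markov chain."""
    prob = pi[states[0]]
    for prev, curr in zip(states, states[1:]):
        prob *= A[prev][curr]
    return prob

print(sequence_probability(['dry', 'dry', 'rain', 'rain']))  # 0.0288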
7. Forward-backward procedure
Forward algorithm: intuition
Our goal is to determine the probability of a sequence of observations (X1, X2, …, Xn) given the model λ.
In the forward algorithm approach, we divide the sequence X into sub-sequences, compute their probabilities, and store them in a table for later use.
The probability of a larger sequence is obtained by combining the probabilities of these smaller sequences.
Specifically, we compute the joint probability of a sub-sequence starting at time t = 1 and ending in a state y:
compute P(X_1:t, Y_t | λ),
then compute P(X_1:n | λ) by marginalizing over Y.
8. Forward algorithm
Goal: compute P(Y_k, X_1:k), assuming the model parameters are known.
Approach: with known emission and transition probabilities, factorize the joint distribution P(Y_k, X_1:k) in terms of the known parameters and solve. For an efficient implementation we use dynamic programming, in which a large problem is solved by solving overlapping sub-problems and combining their solutions. To do this we set up a recursion.
We can write: X_1:k = X_1, X_2, …, X_{k-1}, X_k
From the sum rule we know: P(X = x_i) = Σ_j P(X = x_i, Y = y_j)
P(Y_k, X_1:k) = Σ_{y_{k-1} = 1..m} P(Y_k, Y_{k-1}, X_1:k)
P(Y_k, X_1:k) = Σ_{y_{k-1} = 1..m} P(X_1:k-1, Y_{k-1}, Y_k, X_k)
From the product rule the above factorizes to:
Σ_{y_{k-1}} P(X_1:k-1) P(Y_{k-1} | X_1:k-1) P(Y_k | Y_{k-1}, X_1:k-1) P(X_k | Y_k, Y_{k-1}, X_1:k-1)
= Σ_{y_{k-1}} P(X_1:k-1) P(Y_{k-1} | X_1:k-1) P(Y_k | Y_{k-1}) P(X_k | Y_k)
Writing α_k(Y_k) = P(Y_k, X_1:k), we obtain the recursion:
α_k(Y_k) = Σ_{y_{k-1}} P(Y_k | Y_{k-1}) P(X_k | Y_k) α_{k-1}(Y_{k-1})
Initialization: α_1(Y_1) = P(Y_1, X_1) = P(Y_1) P(X_1 | Y_1)
We can now compute the different α values.
9. Forward algorithm: implementation

def forward(self, obs):
    # self.states: hidden states; self.pi: initial probabilities;
    # self.A: transition probabilities; self.B: emission probabilities
    self.fwd = [{}]
    for y in self.states:
        # Initialize base cases (t == 0)
        self.fwd[0][y] = self.pi[y] * self.B[y][obs[0]]
    for t in range(1, len(obs)):
        self.fwd.append({})
        for y in self.states:
            # alpha_t(y): sum over all previous states y0
            self.fwd[t][y] = sum(self.fwd[t - 1][y0] * self.A[y0][y] * self.B[y][obs[t]]
                                 for y0 in self.states)
    # Marginalize over the final state to get P(obs | model)
    prob = sum(self.fwd[len(obs) - 1][s] for s in self.states)
    return prob
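The method above refers to self.states, self.pi, self.A and self.B, but the surrounding class is not shown on the slides. A minimal sketch of such a container, together with a toy weather HMM whose umbrella/no-umbrella emission probabilities are invented purely for illustration (the class name and attribute layout are assumptions):

class HMM:
    def __init__(self, states, pi, A, B):
        self.states = states  # list of hidden states
        self.pi = pi          # initial state probabilities
        self.A = A            # transition probabilities, A[prev][next]
        self.B = B            # emission probabilities, B[state][symbol]

HMM.forward = forward  # attach the function from the slide above as a method

# Hidden weather states; the emission values below are illustrative only
model = HMM(states=['rain', 'dry'],
            pi={'rain': 0.4, 'dry': 0.6},
            A={'rain': {'rain': 0.3, 'dry': 0.7},
               'dry':  {'rain': 0.2, 'dry': 0.8}},
            B={'rain': {'umbrella': 0.9, 'no umbrella': 0.1},
               'dry':  {'umbrella': 0.2, 'no umbrella': 0.8}})

print(model.forward(['umbrella', 'umbrella', 'no umbrella']))  # P(observations | model)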
10. Backward algorithm: intuition
Our goal is to determine the probability of the remaining observations, P(X_k+1, X_k+2, …, X_n | Y_k, λ).
Given that the HMM has seen k observations and ended up in a state Y_k = y, we compute the probability of the remaining part X_k+1, X_k+2, …, X_n.
We form the sub-sequences starting from the last observation X_n and proceed backward to the first.
Specifically, we compute the conditional probability of a sub-sequence starting at k+1 and ending at n, where the state at time k is given.
We can then compute P(X_1:n | λ) by marginalizing over Y. The probability of an observation sequence computed by the backward algorithm is equal to that computed with the forward algorithm.
11. Backward algorithm: implementation

def backward(self, obs):
    self.bwk = [{} for t in range(len(obs))]
    T = len(obs)
    for y in self.states:
        # Initialize base cases (t == T - 1)
        self.bwk[T - 1][y] = 1
    for t in reversed(range(T - 1)):
        for y in self.states:
            # beta_t(y): sum over all next states y1
            self.bwk[t][y] = sum(self.bwk[t + 1][y1] * self.A[y][y1] * self.B[y1][obs[t + 1]]
                                 for y1 in self.states)
    # Total probability of the observation sequence
    prob = sum(self.pi[y] * self.B[y][obs[0]] * self.bwk[0][y] for y in self.states)
    return prob
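As a quick consistency check, reusing the toy model sketched after the forward implementation (an illustrative assumption, not part of the slides), the two procedures give the same sequence probability:

HMM.backward = backward  # attach the function above as a method

obs = ['umbrella', 'umbrella', 'no umbrella']
# The two marginal likelihoods agree up to floating-point rounding
assert abs(model.forward(obs) - model.backward(obs)) < 1e-12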
12. Viterbi algorithm: intuition
Our goal is to determine the most probable state sequence for a given sequence of observations (X1, X2, …, Xn) given λ.
This is a decoding process, where we discover the hidden state sequence by looking at the observations.
Specifically, we need argmax_Y P(Y_1:t | X_1:t, λ). This is equivalent to finding argmax_Y P(X_1:t, Y_1:t | λ).
In the forward algorithm approach, we computed the probabilities along each path that led to a given state and summed those probabilities to get the probability of reaching that state regardless of the path taken.
In Viterbi we are interested only in the specific path that maximizes the probability of reaching the required state. The states along this path (the one that yields the maximum probability) form the sequence of hidden states we are interested in.
13. Viterbi implementation

def viterbi(self, obs):
    vit = [{}]
    path = {}
    for y in self.states:
        # Initialize base cases (t == 0)
        vit[0][y] = self.pi[y] * self.B[y][obs[0]]
        path[y] = [y]
    for t in range(1, len(obs)):
        vit.append({})
        newpath = {}
        for y in self.states:
            # Best previous state y0 on the most probable path ending in y at time t
            (prob, state) = max((vit[t - 1][y0] * self.A[y0][y] * self.B[y][obs[t]], y0)
                                for y0 in self.states)
            vit[t][y] = prob
            newpath[y] = path[state] + [y]
        path = newpath
    # Most probable final state and its path
    (prob, state) = max((vit[len(obs) - 1][y], y) for y in self.states)
    return (prob, path[state])
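Reusing the toy weather model sketched earlier (again an illustrative assumption), decoding recovers the most likely hidden weather sequence behind the observed umbrella use:

HMM.viterbi = viterbi  # attach the function above as a method

prob, states = model.viterbi(['umbrella', 'umbrella', 'no umbrella'])
print(prob)    # probability of the single best state path
print(states)  # ['rain', 'rain', 'dry'] for the parameter values chosen above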
14. Applications of HMMs to specific problems
Constructing genetic linkage maps
Identifying non-coding DNA
Identifying protein-binding sites on DNA
Modelling helical caps
Protein secondary structure prediction
Protein domain classification
The problem of gene finding
Given a DNA sequence, the problem is to determine the location of genes.
Input: a DNA sequence X = (x1, …, xn), xi ∈ Σ, where Σ = {A, G, C, T}.
Output: the elements of X correctly labelled as belonging to coding, non-coding or intergenic regions.
Available tools include Genie, GeneID and HMMGene.
These work by matching a given DNA sequence against a set of known genes.
15. HMMs and multiple sequence alignment
An HMM can automatically create a multiple alignment from a group of unaligned sequences.
This is useful for predicting evolutionary history.
One major advantage is that an HMM can be estimated from sequences without aligning them first.
The sequences used to estimate or train the model are called training sequences.
Estimation is done with the iterative forward-backward algorithm, also known as the Baum-Welch algorithm.
It maximizes the likelihood of the training sequences.
Protein secondary structure prediction using HMMs
HMMs are used to analyse the amino-acid sequences of proteins, to study secondary structures (helix, sheet and turn) and to predict the secondary structure of a sequence.
The sequence is assigned the secondary structure whose HMM shows the highest probability.
Profile HMMs
Profile HMMs are built by analysing the distribution of amino acids in a training set of related proteins.
A profile HMM is a statistical model of a protein family.
A state shown as a diamond-shaped box models insertions of random letters between two alignment positions.
A state shown as a circle models deletions, corresponding to gaps in an alignment.
States of neighbouring positions are connected by lines, and each line carries a transition probability.
A repository of protein-profile HMMs is found in the Pfam database (http://www.pfam.wustl.edu), a protein family database.
16. HMM software
HMMER
A package of nine programs that use HMMs for sequence database searching.
Freely distributed.
An implementation of the profile HMM method for sensitive database searches using multiple sequence alignment (MSA) queries.
It takes an MSA as input and builds a statistical model.
17. SAM (Sequence Alignment and Modeling)
A collection of flexible software tools for creating, refining and using linear HMMs for biological sequence analysis.
The model states can be viewed as representing the sequence of columns in an MSA, with arbitrary position-dependent insertions and deletions in each sequence.
Models are trained on a family of protein or nucleic-acid sequences using an expectation-maximization algorithm and a variety of algorithmic heuristics.
18. Advantages:
HMMs can handle sequences of variable length.
This contrasts with many machine-learning techniques used in biological data analysis that require fixed-length input, such as neural networks or support vector machines.
They allow position-dependent gap penalties: HMMs treat insertions and deletions in a statistical manner that depends on position.
Limitations of HMMs
They are linear models, unable to capture higher-order correlations among amino acids.
They are also subject to the standard problems of machine learning, such as the need for sufficient training data and the risk of overfitting.
20. Structure prediction by neural network models
Neural networks:
Also called artificial neural networks; they are parallel, distributed information-processing structures.
Feed-forward network or multi-layer perceptron (MLP)
Accurate
Building the initial random net involves:
Random selection of the type of each node
Random selection of the parameters of each node
Random selection of the number of inputs
Connecting the inputs and outputs until the net is large enough
Running the training set over the net
Selecting the proper output
Removal of all nodes that do not contribute to the output
23. A method of computing based on the interaction of multiple connected processing elements.
Can solve many problems.
Has the ability to learn from experience in order to improve performance.
Has the ability to deal with incomplete information.
A biological approach to AI, first developed in 1943.
Comprises one or more layers of neurons.
There are several types, including feed-forward and feedback networks.
25. Classification based on learning
Supervised learning
Each training pattern: input + desired output
The weights are adapted.
After many epochs the network converges to a local minimum.
Unsupervised learning
No help from the outside: no training data, no information available on the desired output.
Learning by doing; used to pick out structure in the input:
Clustering
Reduction of dimensionality
E.g. Kohonen's learning laws
Reinforcement learning
A teacher scores the performance on the training examples.
The performance score is used to shuffle the weights "randomly".
Learning is relatively slow due to this "randomness".
26. Training the network
Alter the parameters.
Add/delete connections.
Add/delete nodes together with their connections.
Post-processing steps:
Removal of edges or nodes unused by the training set
To obtain better results, several nets are combined.
Each training pair is of the form
Pattern: **LSADQISTVASFDK
Target: H
The pattern is a short protein chain (window) centred on the residue whose structure is to be predicted.
The special symbol * is used for windows that overlap the N- or C-terminus of the chain.
27. 3 target classes
Helix, strand and coil, defined by collapsing the eight structural classes given in DSSP (Definition of Secondary Structure of Proteins):
DSSP classes → prediction class
H, G → helix
E → strand
B, I, S, T, e, g, h → coil
Both residues and target classes are encoded in unary (one-hot) format, e.g.
Alanine → 100000000000000
Helix → 100
Every amino acid and secondary structure type is given equal weight.
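A minimal sketch of this unary encoding for one training pair (the alphabet ordering, window width and the all-zeros treatment of the * padding symbol are assumptions for illustration):

AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'   # assumed ordering of the 20 residues
CLASSES = ['helix', 'strand', 'coil']

def one_hot_residue(aa):
    # '*' padding (window overlapping the N- or C-terminus) encodes as all zeros
    return [1 if aa == a else 0 for a in AMINO_ACIDS]

def one_hot_class(label):
    return [1 if label == c else 0 for c in CLASSES]

def encode_window(window):
    """Concatenate the unary codes of every residue in the window."""
    return [bit for aa in window for bit in one_hot_residue(aa)]

x = encode_window('**LSADQISTVASFDK')  # input pattern from the slide above
y = one_hot_class('helix')             # target H -> 1 0 0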
28. Evaluating prediction efficiency
Jack-knife test
The standard way to obtain an unbiased measure of neural network performance is jack-knife testing or n-fold cross-validation.
The network is tuned using a training database.
For a training set containing P protein chains, the network is trained P times, each time leaving out a different chain for testing.
The computational cost is high.
With a fold size of one chain, n-fold cross-validation is identical to jack-knife testing.
29. Percentage of correctly classified residues
A popular statistical measure of performance, known as Q3.
A typical score is Q3 = 62%.
This measure fails to penalize the network for over-prediction (non-helix residues predicted as helix) and under-prediction (helix residues predicted as non-helix).
Correlation coefficient for each target class
A more rigorous measure involves calculating a correlation coefficient for each target class, e.g. for helix:
Ch = (p·n − σ·u) / √((p + σ)(p + u)(n + σ)(n + u))
where
p = patterns correctly assigned to helix
n = patterns correctly assigned to non-helix
σ = patterns incorrectly assigned to helix
u = patterns incorrectly assigned to non-helix
The correlation coefficients for helix (Ch), strand (Ce) and coil (Cc) range from +1 (totally correlated) to −1 (totally anti-correlated).
A measure that does take the location of predicted segments into account is the percentage of overlapping segments.
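Both Q3 and the per-class correlation coefficient are straightforward to compute once predicted and observed classes are available per residue; a minimal sketch (the function names and three-class labels are assumptions for illustration):

from math import sqrt

def q3(predicted, observed):
    """Percentage of residues whose predicted class matches the observed class."""
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)

def class_correlation(predicted, observed, target='helix'):
    """Matthews-style correlation coefficient for one target class."""
    p = sum(1 for pr, ob in zip(predicted, observed) if pr == target and ob == target)
    n = sum(1 for pr, ob in zip(predicted, observed) if pr != target and ob != target)
    s = sum(1 for pr, ob in zip(predicted, observed) if pr == target and ob != target)
    u = sum(1 for pr, ob in zip(predicted, observed) if pr != target and ob == target)
    denom = sqrt((p + s) * (p + u) * (n + s) * (n + u))
    return (p * n - s * u) / denom if denom else 0.0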
30. Reliability index (RI)
A measure proposed by Rost and Sander (1993).
The RI of a given residue is calculated from the highest and second-highest network output values:
RI = integer[(highest output − second-highest output) × 10]
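A minimal sketch of this calculation, assuming the per-class network outputs are available as a list of floats:

def reliability_index(outputs):
    # RI = int((highest output - second-highest output) * 10)
    ordered = sorted(outputs, reverse=True)
    return int((ordered[0] - ordered[1]) * 10)

print(reliability_index([0.82, 0.11, 0.07]))  # 7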
Drawbacks
Predictions are based on limited local context; non-local factors are not taken into account.
Predictions are based on a limited amount of biological information; the principles underlying protein structure are not considered.
The predictions are uncorrelated.
Predictions are based on the performance of a single network, with its inherent bias/noise.
31. Applications
Pattern recognition
Investment analysis
Control systems and monitoring
Mobile computing
Marketing and financial applications
Forecasting: sales, market research, meteorology
32. Advantages
Can perform tasks that a linear program cannot.
When an element of the neural network fails, the network can continue without problem.
It learns and does not need to be reprogrammed.
It can be implemented in any application without any problem.
Disadvantages
Needs training to operate.
Its architecture is different from that of microprocessors.
Requires high processing time for large networks.