2. A protein sequence from species A
◦ Which species has the most similar protein?
◦ Where did it originate?
◦ What is its putative function?
◦ Does it contain a conserved motif, etc.?
3. Blast (Basic Local Alignment Search Tool)
◦ NCBI Blast
◦ Wu-Blast
◦ PSI-Blast
Fasta
SSearch
4. Heuristic (Educated guess)
Does not compare the sequences in their entirety.
Quickly locates short matches (seeds)
Word size
Seeds are extended in both directions
Threshold is defined
◦ > Threshold -> keep the alignment
◦ < Threshold -> discard the alignment
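The seed-and-extend idea on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the actual BLAST implementation: it assumes exact-match seeds, ungapped extension, and made-up scoring values (`match`, `mismatch`, `max_drop`, and `threshold` are all hypothetical parameters).

```python
def find_seeds(query, target, word_size):
    """Return (query_pos, target_pos) pairs where a word matches exactly."""
    index = {}
    for t in range(len(target) - word_size + 1):
        index.setdefault(target[t:t + word_size], []).append(t)
    seeds = []
    for q in range(len(query) - word_size + 1):
        for t in index.get(query[q:q + word_size], []):
            seeds.append((q, t))
    return seeds


def extend_seed(query, target, qi, ti, word_size,
                match=1, mismatch=-1, max_drop=2):
    """Extend a seed without gaps in both directions.

    Extension in one direction stops once the running score has fallen
    more than max_drop below the best score seen so far.
    Returns (best_score, query_start, query_end) with a half-open range.
    """
    best = run = word_size * match
    best_end = qi + word_size
    q, t = qi + word_size, ti + word_size
    while q < len(query) and t < len(target) and best - run <= max_drop:
        run += match if query[q] == target[t] else mismatch
        q += 1
        t += 1
        if run > best:
            best, best_end = run, q

    run = best
    best_start = qi
    q, t = qi - 1, ti - 1
    while q >= 0 and t >= 0 and best - run <= max_drop:
        run += match if query[q] == target[t] else mismatch
        if run > best:
            best, best_start = run, q
        q -= 1
        t -= 1
    return best, best_start, best_end


def search(query, target, word_size=3, threshold=5):
    """Keep extended seeds whose score is above the threshold."""
    hits = set()
    for qi, ti in find_seeds(query, target, word_size):
        score, start, end = extend_seed(query, target, qi, ti, word_size)
        if score > threshold:          # > threshold -> keep
            hits.add((score, query[start:end]))
    return sorted(hits, reverse=True)  # everything else is discarded


print(search("ACGTACGT", "TTACGTACGTTT"))
```

Several seeds on the same diagonal extend to the same segment, which is why the hits are collected in a set.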
6. A Query sequence:
◦ Nucleotide
◦ Protein
A Target Database
◦ Nucleotide
◦ Protein
Blast Program
◦ Blastn
◦ Blastp
◦ tBlastx (slowest: translated nucleotide query against a translated nucleotide database)
◦ tBlastn (protein query against a translated nucleotide database)
◦ Blastx (translated nucleotide query against a protein database)
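The choice of program follows from the query type and the database type. A minimal sketch of that decision, where the function name `choose_program` and its `translated` flag are hypothetical helpers for illustration and not part of any BLAST package:

```python
def choose_program(query_type, db_type, translated=False):
    """Pick a BLAST program name from the query and database types.

    query_type and db_type are "nucleotide" or "protein".
    translated only matters for nucleotide vs nucleotide searches,
    where it selects tblastx (both sides translated) over blastn.
    """
    if query_type == "nucleotide" and db_type == "nucleotide":
        return "tblastx" if translated else "blastn"
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "protein" and db_type == "nucleotide":
        return "tblastn"   # database is translated
    if query_type == "nucleotide" and db_type == "protein":
        return "blastx"    # query is translated
    raise ValueError("types must be 'nucleotide' or 'protein'")


print(choose_program("nucleotide", "protein"))
```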
7. E value -> Expected number of hits with an equal or
better score that would occur by chance
Score -> Similarity score.
◦ Analogy: rain occurring by chance with probability 0.001
◦ Passing by chance, etc.
◦ The lower the E value, the more significant the
alignment.
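As a rough illustration, the E value can be computed from a raw score with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths. The default `lam` and `k` below are placeholder values chosen for the example. Real values depend on the scoring matrix and gap penalties in use.

```python
import math


def e_value(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Expected number of chance hits scoring at least raw_score.

    lam and k are illustrative placeholders, not values to rely on.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_value(e):
    """Probability of seeing at least one such hit by chance."""
    return 1.0 - math.exp(-e)


for score in (40, 80, 120):
    e = e_value(score, query_len=300, db_len=1_000_000)
    print(score, e, p_value(e))
```

For small E values the probability and the E value are nearly equal, which is why the two are often used interchangeably.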
8. Remove Low Complexity regions
Generate all the k-mers.
List all possible matching words.
- Blast keeps only high-scoring pairs
- Fasta stores all pairs irrespective of their
scores.
Extend the matches into high-scoring
pairs (HSPs)
Evaluate results depending on thresholds set.
Extend HSPs and join them together.
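The first two steps (generating k-mers and listing matching words) can be sketched as below. This is a simplified illustration: it scores words with a plain match/mismatch scheme on a DNA alphabet, whereas protein searches would score candidate words with a substitution matrix. The names `kmers`, `word_score`, and `neighborhood` and the scoring values are assumptions made for the example.

```python
from itertools import product


def kmers(seq, k):
    """All overlapping words of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def word_score(a, b, match=2, mismatch=-1):
    """Score two equal-length words position by position."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def neighborhood(word, alphabet, threshold):
    """All words over alphabet that score at least threshold against word."""
    candidates = ("".join(c) for c in product(alphabet, repeat=len(word)))
    return [w for w in candidates if word_score(word, w) >= threshold]


print(kmers("ACGTA", 3))
print(neighborhood("ACG", "ACGT", threshold=3))
```

With these values, a 3-letter word keeps itself (score 6) and every word with exactly one mismatch (score 3).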
14. Substitution Matrices
Insertions and deletions are less likely than
substitutions
An insertion or deletion in a DNA sequence can lead to a
frameshift.
PAM Matrices (Point Accepted Mutation Matrices)
Margaret Dayhoff 1978
PAM1 -> Expected rates of substitution if 1% of the
amino acids have changed
BLOSUM : Blocks Substitution Matrix (% of identity)
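Higher-numbered PAM matrices are extrapolated from PAM1 by repeated multiplication of the mutation probability matrix. The sketch below shows the idea on a toy two-letter alphabet. `PAM1_TOY` is an invented matrix for illustration, not Dayhoff's data, and the published PAM scoring matrices are log-odds scores derived from such probabilities, not the probabilities themselves.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Toy PAM1: each letter has a 1% chance of changing in one PAM unit.
PAM1_TOY = [[0.99, 0.01],
            [0.01, 0.99]]

pam250_toy = mat_pow(PAM1_TOY, 250)
print(pam250_toy)
```

After 250 steps the toy matrix is close to 0.5 everywhere, which shows how the signal of the original letter fades at large evolutionary distances.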
15. PAM matrices are based on a
simple evolutionary model
MATLFC MLTLCC
M(A/L)TL(F/C)C Two changes
Ancestral sequence?
• Only mutations are allowed
• Sites evolve independently
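The comparison on this slide can be reproduced with a short helper that treats each site independently. The function name `compare_sites` is a hypothetical choice for this sketch, and it assumes two ungapped sequences of equal length.

```python
def compare_sites(seq_a, seq_b):
    """Compare two equal-length sequences site by site.

    Returns a pattern such as M(A/L)TL(F/C)C and the number of
    sites that differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    pattern = []
    changes = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            pattern.append(a)
        else:
            pattern.append(f"({a}/{b})")
            changes += 1
    return "".join(pattern), changes


print(compare_sites("MATLFC", "MLTLCC"))
```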
16. Guidelines for using matrices
Protein Query Length   Matrix     Open Gap   Extend Gap
>300                   BLOSUM50   -10        -2
85-300                 BLOSUM62   -7         -1
50-85                  BLOSUM80   -16        -4
>300                   PAM250     -10        -2
85-300                 PAM120     -16        -4
35-85                  MDM40      -12        -2
<=35                   MDM20      -22        -4
<=10                   MDM10      -23        -4
PAM100 ==> Blosum90
PAM120 ==> Blosum80
PAM160 ==> Blosum60
PAM200 ==> Blosum52
PAM250 ==> Blosum45
17. Scoring Matrices
S = [sij] gives score of aligning character i
with character j for every pair i, j.
STPP
CTCA
0 + 3 + (-3) + 1 = 1
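The worked example above can be checked with a small lookup. Only the four pair scores shown on the slide are included here. The slide does not name the matrix they come from, so `PAIR_SCORES` is a partial, illustrative table, not a full scoring matrix.

```python
# Pair scores taken from the slide's example (S/C, T/T, P/C, P/A).
PAIR_SCORES = {
    ("S", "C"): 0,
    ("T", "T"): 3,
    ("P", "C"): -3,
    ("P", "A"): 1,
}


def pair_score(a, b):
    """Look up s_ij, treating the matrix as symmetric."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]


def alignment_score(seq_a, seq_b):
    """Sum the pair scores of an ungapped alignment, column by column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


print(alignment_score("STPP", "CTCA"))
```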
Editor's notes
A series of methods that rely on pairwise alignments