2. Used to find the local similarity or alignment
shared by two sequences.
The method used to find the similarity is called
alignment. It can be of two types:
Global alignment – aligns the entire sequences,
using as many characters as possible.
Local alignment – focuses on regions of
similarity in parts of the sequences only.
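The difference between the two alignment types can be sketched with their dynamic programming scores. This is a minimal illustration, assuming a simple scheme of +1 for a match and -1 for a mismatch or gap (these values and the function names are illustrative assumptions, not taken from the slides).

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style score: every character of both sequences is aligned."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman style score: best pair of substrings; cells never drop below 0."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best
```

For the toy pair "XXACGTYY" and "ZZACGTWW", the local score is 4 (the shared ACGT region), while the global score is 0 because the unrelated flanking characters must also be aligned.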
3.
4. Alignment of two sequences is performed by
following methods:
Dot matrix analysis
Dynamic programming
Word or k-tuple method (FASTA & BLAST
programs)
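As a rough illustration of the first method, a dot matrix marks every position where a character of one sequence equals a character of the other; similar regions then show up as diagonal runs. The sketch below is a minimal version with hypothetical helper names and no window or stringency filtering.

```python
def dot_matrix(a, b):
    """Return a grid with True wherever a[i] == b[j]."""
    return [[x == y for y in b] for x in a]


def render(a, b):
    """Draw the dot matrix as text: '*' for a match, '.' otherwise."""
    rows = ["  " + " ".join(b)]
    for ch, row in zip(a, dot_matrix(a, b)):
        rows.append(ch + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(rows)


if __name__ == "__main__":
    print(render("ACGTAC", "TTACGT"))
```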
5. The word method aligns two sequences very quickly, first by
searching for identical short stretches of
sequence called words or k-tuples,
then by joining these words into an alignment by
the dynamic programming method.
BLAST and FASTA methods are heuristic.
6.
7. Basic Local Alignment Search Tool (BLAST) is a
popular, user-friendly tool for searching all the
major sequence databases.
It is used to find sequence homologs in order to predict the
identity, function, and 3D structure of the query
sequence.
It shows better results for protein sequences than
nucleotide sequences.
8. Local alignment: BLAST tries to find patches of
regional similarity, rather than trying for global
fit between the query and the database sequence.
BLAST works under the assumption that high-
scoring alignments are likely to contain short
stretches of identical or near identical letters,
called words.
9. BLAST is extremely fast. The program can be run
locally, or queries can be e-mailed to the NCBI
server.
It is not guaranteed to find the best alignment
between query and database; it may miss
matches.
This is because its strategy is only expected to find most
matches: it sacrifices complete
sensitivity to gain speed.
10. BLAST searches in two phases.
First, it looks for short subsequences that are
likely to have significant matches.
Then it tries to extend these matched regions on
both sides in order to obtain maximum sequence
similarity.
11.
12. A substitution matrix is a scoring scheme used in aligning one
residue against another.
Margaret Dayhoff and her co-workers developed
the first substitution matrices, used to compare
protein sequences in evolutionary terms.
These are commonly called PAM
matrices.
In contrast to PAM, Steve Henikoff and his
co-workers developed the BLOSUM matrices.
13. Percent accepted mutation (PAM) matrix vs. BLOSUM
PAM matrices are based on global alignments of
closely related proteins.
The number accompanying PAM refers to
evolutionary distance. Larger numbers represent
greater evolutionary distance.
PAM 250 is widely used.
o BLOSUM matrices are based on local alignments.
o Smaller numbers correspond to more
evolutionarily distant sequences.
o BLOSUM 62 is widely used.
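To show how a substitution matrix is applied, the sketch below scores a gap-free alignment by summing one matrix entry per aligned residue pair. The table holds toy values chosen only for illustration (they are not real PAM or BLOSUM entries), and the function names are hypothetical.

```python
# Toy substitution scores for a few residue pairs (illustrative values only,
# not taken from a real PAM or BLOSUM table).
TOY_SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5, ("D", "D"): 6,
    ("L", "I"): 2,   # similar residues: mild positive score
    ("K", "D"): -1,  # dissimilar residues: negative score
    ("L", "K"): -2, ("L", "D"): -4, ("I", "K"): -3, ("I", "D"): -3,
}


def pair_score(x, y, table=TOY_SCORES):
    """Look up a symmetric substitution score for residues x and y."""
    return table[(x, y)] if (x, y) in table else table[(y, x)]


def ungapped_score(a, b, table=TOY_SCORES):
    """Sum the substitution scores of two equal-length, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(x, y, table) for x, y in zip(a, b))
```

With this toy table, aligning "LKD" against "IKD" scores 2 + 5 + 6 = 13; in practice the table would be replaced by a full matrix such as BLOSUM 62.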
14. Preprocessing of the query:
Quickly locate ungapped similarity between the query sequence
and sequences from the database.
All words of length W in the query are compared with
the database sequences.
Generation of hits:
A hit is made of one or several successive pairs of similar
words, and is characterised by its position in each of the two
sequences.
All the possible hits between query and database are
calculated.
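The word comparison and hit generation steps can be sketched as follows. This is a simplified illustration that matches identical words only (it ignores "similar" words scored with a substitution matrix); the function names and the default word length are assumptions made for the example.

```python
from collections import defaultdict


def words(seq, w):
    """Yield (position, word) for every overlapping word of length w."""
    for i in range(len(seq) - w + 1):
        yield i, seq[i:i + w]


def find_hits(query, subject, w=3):
    """Return (query_pos, subject_pos) pairs where identical words occur."""
    index = defaultdict(list)          # word -> positions in the query
    for i, word in words(query, w):
        index[word].append(i)
    hits = []
    for j, word in words(subject, w):  # scan the database sequence once
        for i in index.get(word, []):
            hits.append((i, j))
    return hits
```

Indexing the query once and scanning each database sequence against that index is what keeps this step fast: each hit is recorded by its position in both sequences.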
15. Extension of the hits:
Every hit is now extended, without gaps, in order
to determine whether it may be part of a larger
segment of similarity.
Every extended segment pair that scores the same as
or better than S (set as a parameter of the program) is kept
and called an HSP (high-scoring segment pair).
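A minimal sketch of ungapped extension is shown below. It assumes +1/-1 match/mismatch scoring and stops extending once the running score falls a fixed amount (`x_drop`) below the best seen so far; the scoring values, the drop-off rule, and all names are illustrative assumptions rather than details given in the slides.

```python
def extend_hit(query, subject, i, j, w, match=1, mismatch=-1, x_drop=2):
    """Extend a word hit at query[i:i+w] / subject[j:j+w] without gaps.

    Returns (score, query_start, subject_start, length) of the best segment.
    """
    def walk(qi, sj, step):
        # Walk outward from the seed; remember the best running score.
        best = run = 0
        best_len = n = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            run += match if query[qi] == subject[sj] else mismatch
            n += 1
            if run > best:
                best, best_len = run, n
            elif best - run >= x_drop:
                break
            qi += step
            sj += step
        return best, best_len

    seed = sum(match if query[i + k] == subject[j + k] else mismatch
               for k in range(w))
    right, r_len = walk(i + w, j + w, +1)
    left, l_len = walk(i - 1, j - 1, -1)
    return seed + left + right, i - l_len, j - l_len, l_len + w + r_len


def high_scoring_pairs(query, subject, hits, w, s):
    """Keep extended segments scoring at least the threshold s."""
    extended = {extend_hit(query, subject, i, j, w) for i, j in hits}
    return sorted(seg for seg in extended if seg[0] >= s)
```

Several hits on the same diagonal usually extend to the same segment, which is why the sketch collects the extended segments in a set before applying the threshold S.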
16.
17. Standard BLAST programs are of five types:
BLASTp
BLASTn
BLASTx
tBLASTn
tBLASTx
o Other classes include:
MegaBLAST
PSI BLAST
PHI BLAST
18. BLASTp – this program compares an amino acid
query sequence against a protein sequence database.
BLASTn – it compares a nucleotide query sequence
against a nucleotide sequence database.
BLASTx – it searches the six frame translation
products of a nucleotide sequence against a protein
database.
tBLASTn – it searches a protein query sequence against
the six frame translations of a nucleotide sequence database.
tBLASTx – it compares the six frame translations of
a nucleotide query sequence against six frame
translations of database.
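The six frame translation used by BLASTx, tBLASTn and tBLASTx can be sketched as below: three reading frames on the given strand and three on its reverse complement. The sketch assumes the standard genetic code and plain A/C/G/T input; the function names and frame labels are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For example, `six_frames("ATGGCCTAA")["+1"]` gives `"MA*"`; each of the six protein strings can then be compared against a protein database or against other translations.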
19.
20. MegaBLAST – it is a program optimized for
aligning long, highly similar sequences. It can only work with
DNA sequences.
PSI BLAST – it stands for position specific
iterated BLAST. It is useful for protein similarity
searches.
PHI BLAST – it stands for pattern hit initiated BLAST. It can
be used to search for a specific pattern or motif.
21. FASTA is a sequence analysis tool, similar to BLAST.
It was developed by W. R. Pearson and D. J. Lipman,
and the algorithm can be accessed from the EBI site.
FASTA gives better results for nucleotide
sequences than protein.
FASTP is for protein sequences.
22. FASTA finds regions of similarity by first breaking the
sequence into short subsequences (words), then searching
for the diagonals with the highest density of matching
words.
The alignment in these diagonals is then refined.
It is fast but is not guaranteed to find the best
alignment; it may miss matches.
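The diagonal search can be sketched by counting word matches per diagonal, where a diagonal is identified by the offset between the subject and query positions of a matching word. This is a simplified illustration with hypothetical names; it uses overlapping, identical words and does no rescoring or refinement.

```python
from collections import Counter, defaultdict


def diagonal_counts(query, subject, ktup=2):
    """Count identical ktup-word matches on each diagonal (subject_pos - query_pos)."""
    index = defaultdict(list)          # word -> positions in the query
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    counts = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], []):
            counts[j - i] += 1
    return counts


def best_diagonal(query, subject, ktup=2):
    """Return (diagonal, word_count) for the densest diagonal, or None."""
    counts = diagonal_counts(query, subject, ktup)
    return counts.most_common(1)[0] if counts else None
```

For the query "ACGTAC" and subject "TTACGTAC", the densest diagonal is offset 2 with five matching words, which corresponds to the query sitting two positions into the subject.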
23. First, FASTA prepares a list of words from the pair of
sequences to be matched. Words can be 3-6 nucleotides
or 1 or 2 amino acids.
It uses non-overlapping words; it matches the words and
counts the matches.
It creates the word diagonals and finds a high-scoring
match. The output is labeled init1.
Only if the score is sizable does it proceed to the second level.
In the second level, for every best hit of words, it looks
for neighboring approximate hits.
If the score is good, it prepares a larger dot
matrix diagonal.
24. The best score from this second-level scoring is
called initn.
The initn scores are saved for each comparison
of a query sequence with a database sequence.
25.
26. Different programs in FASTA include:
FASTP (protein sequence).
TFASTA (compares a query protein sequence to a
DNA sequence database).
FASTF (compares a set of ordered peptide fragments,
obtained from analysis of a protein by cleavage and
sequencing of protein bands resolved by
electrophoresis, against a protein database).
TFASTF (compares a set of ordered peptide
fragments against a DNA database).