Presentation On
Sequence Alignment
Zeeshan Akram Hanjra
15211506-050
Bot-309 Genetics-I
Bs Botany 6th (A)
University of Gujrat
● Procedure of comparing two or more sequences by
searching for a series of individual characters or
character patterns that are in the same order in the
sequences.
● Two sequences are aligned by writing them across a
page in two rows.
● Identical or similar characters are placed in the same
column, and non-identical characters can either be
placed in the same column as a mismatch or opposite a
gap in the other sequence.
What is sequence alignment?
Alignment
Alignment is the task of locating “equivalent”
regions of two or more sequences to maximize
their similarity.
● NIKESH NARAYANAN
● NIGESH NARAYAN--
(Mismatch: K opposite G; gaps: the two "-" characters)
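The example above can be scored the simplest way, by counting identical columns, mismatches and gaps. A minimal Python sketch, assuming both rows are already aligned to the same length, that "-" marks a gap, and that the space between the two names is removed; the function name `count_columns` is hypothetical:

```python
def count_columns(top, bottom):
    """Count identical, mismatched and gapped columns of a pairwise alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    counts = {"identities": 0, "mismatches": 0, "gaps": 0}
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            counts["gaps"] += 1          # residue opposite a gap
        elif x == y:
            counts["identities"] += 1    # identical characters in one column
        else:
            counts["mismatches"] += 1    # different characters in one column
    return counts


print(count_columns("NIKESHNARAYANAN", "NIGESHNARAYAN--"))
# {'identities': 12, 'mismatches': 1, 'gaps': 2}
```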
● A way of arranging the sequences of DNA, RNA or
protein to identify regions of similarity.
● Helps in inferring functional, structural or
evolutionary relationships between the sequences.
● Sequence alignment methods are used to find the
best-matching sequences.
● To compare the nucleotide sequences of DNA
(adenine, thymine, cytosine, guanine).
Why?
An algorithm is a sequence of instructions that one
must perform in order to solve a well-formulated
problem.
First you must identify exactly what the problem is.
A problem describes a class of computational tasks. A
problem instance is one particular input from that
task.
Algorithms
● An algorithm must stop after a finite number of steps.
● All steps must be precisely defined.
● Input to the algorithm must be specified.
● Output of the algorithm must be specified.
● It must be effective: each step must be basic enough to be carried out exactly.
Features of an Algorithm
● A genetic algorithm is used in artificial intelligence
and computing. It is used for finding optimized
solutions to search problems based on the theory of
natural selection and evolutionary biology.
Genetic Algorithm
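The slide only outlines the idea. Below is a toy Python sketch of a genetic algorithm, assuming a hypothetical task of evolving random DNA strings toward a fixed target string; selection of the fitter half, one-point crossover, per-letter mutation and elitism are common textbook choices, and every name and parameter here is illustrative rather than taken from the slide:

```python
import random

ALPHABET = "ACGT"


def fitness(candidate, target):
    """Number of positions that match the target (higher is fitter)."""
    return sum(c == t for c, t in zip(candidate, target))


def mutate(candidate, rate, rng):
    """Replace each letter with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in candidate)


def crossover(a, b, rng):
    """One-point crossover: prefix from one parent, suffix from the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def genetic_algorithm(target, pop_size=30, generations=200, rate=0.05, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in target) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, target), reverse=True)
        history.append(fitness(population[0], target))
        if population[0] == target:
            break
        parents = population[: pop_size // 2]  # selection: keep the fitter half
        children = [population[0]]             # elitism: the best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = children
    best = max(population, key=lambda s: fitness(s, target))
    return best, history


best, history = genetic_algorithm("GATTACAGATTACA")
print(best, history[0], history[-1])
```

Keeping the best candidate unchanged (elitism) means the best fitness can never drop between generations; the target, population size and mutation rate would need tuning for any real search problem.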
Global alignment
Attempts to align the entire sequence using as
many characters as possible, up to both ends of
each sequence.
Sequences that are quite similar and approximately
the same length are suitable candidates for global
alignment.
The Needleman-Wunsch algorithm is used to produce
global alignments between pairs of DNA or protein
sequences.
Types Of Alignment
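A minimal Python sketch of Needleman-Wunsch global alignment, assuming a simple linear scoring scheme (match +1, mismatch -1, gap -1) chosen here for illustration; practical tools usually use substitution matrices and affine gap penalties instead:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps

    def pair_score(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # a[i-1] opposite b[j-1]
                F[i - 1][j] + gap,                                 # a[i-1] opposite a gap
                F[i][j - 1] + gap,                                 # b[j-1] opposite a gap
            )

    # Traceback from the bottom-right corner recovers one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("NIKESHNARAYANAN", "NIGESHNARAYAN")
print(score)  # 9
print(top)
print(bottom)
```

The optimal score here is 9 (12 identities, 1 mismatch, 2 gaps), but several gap placements tie; this traceback prefers diagonal moves, so it may place the gaps somewhere other than the end of the second name.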
Local alignment
● Stretches of sequence with the highest density of
matches are aligned.
● Generates one or more islands of matches or
sub-alignments in the aligned sequences.
● Suitable for aligning sequences that are similar along
some of their lengths but dissimilar in others,
sequences that differ in length, or sequences that
share a conserved region or domain.
● Smith-Waterman algorithm is used to produce local
alignments.
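A minimal Smith-Waterman sketch in Python under the same illustrative scoring (match +1, mismatch -1, linear gap -1). The two differences from global alignment are the zero floor on every cell, which lets a local alignment start anywhere, and a traceback that starts at the highest-scoring cell and stops at the first zero:

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # 0 means "start a new local alignment here"
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until the score drops to zero
    top, bottom = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("AAAGATTACAAA", "TTGATTACATT"))  # (7, 'GATTACA', 'GATTACA')
```

Only the shared island GATTACA is reported; the dissimilar flanking regions are left out instead of being forced into the alignment.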
Fig: Distinction between Global and Local alignment of two sequences
● Function or activity of a new gene/protein.
● Structure or shape of a new protein.
● Location or preferred location of a protein.
● Stability of a gene or protein.
● Origin of a gene or protein.
● Origin or phylogeny of an organelle.
● Origin or phylogeny of an organism.
Goals/Importance Of Alignment
● Parametric sequence alignment refers to computer
methods that are used to find a range of
possible alignments in response to varying the
scoring system used for matches, mismatches and gaps.
● There is also an effort to use scores that give
consistent results for both global and local types
of sequence alignment.
Parametric sequence alignment
● The quality of an alignment can be measured in
terms of the number of gaps introduced and the
number of mismatches remaining in the
alignment.
● We could score the alignment by counting how
many positions match identically.
● Many gaps may then have to be placed at positions
that are not strictly identical.
Gaps
● In such cases, the possible positions of gaps in the
alignment become numerous and the alignment more
complex.
● If gaps are allowed freely, the algorithms produce
alignments containing a very large proportion of
matching letters and a large number of gaps.
Cont.
● Although this process achieves an optimum score
and is mathematically meaningful, the result would be
biologically meaningless, because insertions and
deletions of monomers occur relatively rarely in
evolution.
Mismatches
● Dynamic programming algorithms use gap
penalties to maximize the biological meaning.
● A simple score gives a positive additive
contribution of +1 for every matching pair of
letters in the alignment.
● A gap penalty is subtracted for each gap that
has been introduced. There are different kinds of gap
penalty: a constant penalty per gap, a penalty
proportional to gap length, and an affine penalty
made up of a gap-opening and a gap-extension penalty.
Cont.
● The total alignment score is then a function of
the identity between aligned residues and the gap
penalties incurred.
Cont.
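The three kinds of gap penalty can be compared on two small alignments with the same number of identities. A minimal Python sketch, assuming the simple score of +1 per identical column (mismatches score 0) and illustrative penalty values; the affine form here charges the opening cost for the first gapped position and the extension cost for each further one, which is one of several conventions in use:

```python
def gap_lengths(row):
    """Lengths of each run of consecutive '-' characters in an aligned row."""
    lengths, run = [], 0
    for ch in row:
        if ch == "-":
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def constant_penalty(length, cost=2):
    return cost  # same cost whatever the gap length


def proportional_penalty(length, per_position=1):
    return per_position * length  # cost grows with every gapped position


def affine_penalty(length, opening=3, extension=1):
    return opening + extension * (length - 1)  # open once, then extend


def alignment_score(top, bottom, penalty):
    """+1 per identical column, minus a penalty for every gap run in either row."""
    matches = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    gaps = gap_lengths(top) + gap_lengths(bottom)
    return matches - sum(penalty(g) for g in gaps)


for penalty in (constant_penalty, proportional_penalty, affine_penalty):
    print(penalty.__name__,
          alignment_score("ACGTACGT", "AC-TAC-T", penalty),  # two 1-residue gaps
          alignment_score("ACGTACGT", "ACGTAC--", penalty))  # one 2-residue gap
# constant_penalty 2 4
# proportional_penalty 4 4
# affine_penalty 0 2
```

Both alignments contain six identities, but the constant and affine schemes prefer one longer gap over two separate ones, which matches the idea that a single insertion or deletion event is more plausible than several.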
● A distance measure treats sequences as points in a metric space.
● It is a function that associates a numeric value with a pair
of sequences.
● The larger the distance, the smaller the similarity, and
vice versa.
● It satisfies the mathematical axioms of a metric.
● Distance and similarity measures are interchangeable.
Distance measure
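For reference, the standard metric axioms that such a distance function d must satisfy, for any sequences x, y and z, are:

```latex
\begin{aligned}
d(x, y) &\ge 0 && \text{(non-negativity)} \\
d(x, y) &= 0 \iff x = y && \text{(identity)} \\
d(x, y) &= d(y, x) && \text{(symmetry)} \\
d(x, z) &\le d(x, y) + d(y, z) && \text{(triangle inequality)}
\end{aligned}
```

Both the Levenshtein distance with unit costs and the Hamming distance on equal-length strings satisfy all four.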
● It can be measured in terms of the number of gaps
introduced and the number of mismatches
remaining.
● It is also known as edit distance.
● It is the minimum number of edit operations
required to change one string into the other.
● The edit operations are insertion, deletion
or substitution (alteration) of a single character.
Levenshtein distance
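The edit distance can be computed with a small dynamic programming table. A minimal Python sketch, assuming every insertion, deletion and substitution costs 1 (the function name is illustrative):

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("NIKESH", "NIGESH"))   # 1
print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the length of b rather than with the product of both lengths.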
● The Hamming distance between two sequences of equal
length is the number of positions with
mismatched characters.
● It is desirable to assign variable weights to
different edit operations, since certain changes
are more likely to occur naturally.
Hamming distance
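A minimal Python sketch of the plain Hamming distance, plus a weighted variant illustrating the point about variable weights. The weighting is an assumption chosen for illustration: transitions (A/G, C/T) are charged 0.5 and transversions 1.0, reflecting that transitions are generally observed more often in DNA; real weights would be estimated from data:

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def weighted_mismatch_cost(a, b, transition=0.5, transversion=1.0):
    """Hamming-style distance where each mismatch is weighted by its type."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    total = 0.0
    for x, y in zip(a, b):
        if x != y:
            total += transition if frozenset((x, y)) in TRANSITIONS else transversion
    return total


print(hamming("GATTACA", "GACTATA"))                 # 2
print(weighted_mismatch_cost("GATTACA", "GACTATA"))  # 1.0
```

Both mismatches in the example are C/T transitions, so the weighted cost is half the unweighted distance.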
● Strings are used to represent text in the Perl programming
language.
● They are usually surrounded by single or double quotation
marks.
● Given two character strings of equal length, the Hamming technique can be
used to measure the distance between them.
Strings
● Amino acid substitutions tend to be conservative:
the replacement of one amino acid by
another with similar size and
physicochemical properties is more likely to occur
than its replacement by another amino acid
with very different properties.
● Algorithms use different distance measures to
compute and score alignments.
High scoring matches
● Similar sequences give a high score.
● A high score by itself has only mathematical significance.
● Dissimilar sequences give a low score.
● An algorithm for optimal alignment can seek either
to minimize a dissimilarity measure or to maximize
a scoring function.
High scoring, low scoring
● Sequence comparison generally involves full-length sequences, and a
comprehensive alignment requires that many
residues be placed at positions that are
not strictly identical.
● For a biologically meaningful comparison, the
number and positioning of gaps and the number of
mismatches have to be balanced.
Sequence comparison
● To achieve the optimum score, gap penalties are
introduced to minimize the number of gaps, and
extension penalties are added when a gap is
extended.
● An important task of sequence scoring is to
distinguish between high-scoring and low-scoring
alignments.
Optimum score
It is useful to discover
● structural, functional and evolutionary
information.
Sequences that are similar
● often have the same function;
● may have a similar regulatory role, in the case of DNA molecules;
● may have similar biochemical function and 3-D structure,
in the case of proteins.
Uses of sequence alignment
It is important to obtain
● the best possible, or optimal, alignment.
If two sequences from two different organisms are
similar,
● there may have been a common ancestor sequence;
● the sequences are then said to be homologous.
Uses
● Alignment indicates the
changes that have occurred between two
homologous sequences and a common
ancestor sequence.
● It helps to identify the sequences in a database
that are potentially related to a particular
sequence.
Uses Cont.
Two scientists, Doolittle and Waterfield, independently made an early
landmark discovery by comparing sequences.
● They found that the viral oncogene v-sis is a
modified form of a normal cellular gene
which encodes platelet-derived growth factor.
● Dynamic programming algorithms find the best
alignment.
● The process can be very slow, because the work grows
with the product of the two sequence lengths.
Uses
Due to random mutations, nucleotides may be
● replaced,
● deleted,
● or inserted.
Loss of protein function is a disadvantage to the
organism.
A change will persist only if it does not have a deleterious
effect on the protein.
Scoring Mutations, Deletions and Substitutions
• If a change is deleterious, the organism may not
survive and the gene will not be passed on.
• Many substitution mutations are well tolerated in
proteins.
A substitution that does not affect protein properties
is called a conservative substitution.
• Protein-coding genes evolve relatively slowly.
• During evolution, proteins tend to accumulate
substitutions between amino acids with
similar properties.
Cont.
Protein sequences from the same evolutionary family
show
● substitutions between amino acids with similar
physicochemical properties.
● A substitution score matrix is used to give scores
for amino acid substitutions.
● When comparing proteins, we can increase
sensitivity to weak alignments by using a substitution
matrix.
Cont.
● In different species, amino acid substitutions
occur in proteins that perform the same function and
must remain compatible with the protein's structure and function.
● The substituted amino acids are usually chemically
similar, but other changes also occur.
● Knowing the kinds of changes that occur in proteins
can assist in predicting alignments.
● Similar protein sequences are easily
aligned.
Amino acid substitution matrix
● Evolutionary change can be predicted if ancestral relationships
among a group of proteins are assessed.
● Margaret Dayhoff pioneered this analysis.
● Symbol comparison tables are used for this purpose.
Mechanism:
● In these matrices, the amino acids are listed across the top and down the side.
● Each matrix position is filled with a score.
● The score shows how often one amino acid is paired with another.
Cont.
● The probability of changing amino acid A into
B is assumed to be the same as that of the reverse change.
● This is because the ancestral amino acid in the
phylogenetic tree is not known.
● The prediction of this model is that over
evolutionary time, amino acid frequencies will
not change.
● When calculating alignment scores, identical amino
acids should be given a higher value.
Cont.
• And among substitutions conservative substitutions
should be given greater value than non conservative
substitutions.
• Two popular matrices:
Dayhoff mutation data (MD)
BLOSUM
They have been devised to weight matches between
non-identical residues.
• The MD score is based on the concept of the point accepted
mutation (PAM).
Cont.
● A PAM matrix is a matrix where each column and row
represents one of the twenty standard amino acids.
● In bioinformatics, PAM matrices are regularly used as
substitution matrices to score sequence alignments for
proteins.
● Missense mutations that are accepted by natural selection
are classed as point accepted mutations.
Percent Accepted mutation matrix
● The genetic instructions of every replicating cell in a
living organism are contained within its DNA.
● Throughout the cell's lifetime, this information is
transcribed and replicated by cellular mechanisms
to produce proteins or to provide instructions for
daughter cells during cell division, and the DNA
may be altered during these processes. Such an
alteration is known as a mutation.
● At the molecular level, there are regulatory systems
that correct most but not all of these changes to the
DNA before it is replicated.
Biological background
● PAM matrices were introduced by Margaret Dayhoff
in 1978.
● The calculation of these matrices was based on
1572 observed mutations in the phylogenetic trees
of 71 families of closely related proteins.
● The proteins to be studied were selected on the
basis of having high similarity with their
predecessors.
● The protein alignments included were required to
display at least 85% identity.
Construction of PAM matrices
As a result, it is reasonable to assume that any aligned
mismatches were the result of a single mutation event,
rather than several at the same location.
● Each PAM matrix has twenty rows and twenty
columns, one representing each of the twenty amino
acids translated by the genetic code.
● The value in each cell of a PAM matrix is related to
the probability of a row amino acid before the
mutation being aligned with a column amino acid
afterwards.
Cont.
● For each branch in the phylogenetic trees of the protein
families, the number of mismatches observed was
recorded, along with the two amino acids involved.
● These counts were used as the entries below the main diagonal of
a matrix A; matrix A is assumed to be symmetric.
● The mutability of an amino acid is the ratio of the number
of mutations it undergoes to the number of times it occurs in an alignment.
● Cysteine and tryptophan were found to be the least mutable
amino acids.
● Cysteine's side chain contains sulfur, which participates in
disulfide bonds.
Collection of data from phylogenetic trees
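The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not Dayhoff's actual procedure: the observed (ancestor, descendant) pairs and occurrence counts are hypothetical toy data, and A to B and B to A changes are pooled into one symmetric count, as the symmetry assumption implies:

```python
from collections import Counter


def symmetric_counts(pairs):
    """Tally observed substitutions, treating A->B and B->A as the same event."""
    counts = Counter()
    for before, after in pairs:
        if before != after:  # only mismatches are recorded
            counts[tuple(sorted((before, after)))] += 1
    return counts


def mutability(aa, counts, occurrences):
    """Number of changes involving `aa` divided by how often `aa` occurs."""
    changes = sum(n for pair, n in counts.items() if aa in pair)
    return changes / occurrences[aa]


# Hypothetical branch data: (ancestral residue, descendant residue)
observed = [("A", "S"), ("S", "A"), ("A", "G"), ("A", "A"), ("C", "C")]
counts = symmetric_counts(observed)
print(counts)                                      # Counter({('A', 'S'): 2, ('A', 'G'): 1})
print(mutability("A", counts, {"A": 10, "C": 5}))  # 0.3
print(mutability("C", counts, {"A": 10, "C": 5}))  # 0.0
```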
● Relative mutabilities were evaluated by counting, in each
group of related sequences, the number of changes of
each amino acid and dividing this number by a factor
called the exposure to mutation of the amino acid.
● This factor is the product of the frequency of occurrence
of the amino acid in the group and the total number of
amino acid changes that occurred in that group per
100 sites.
● By these scores, Asn, Ser, Asp and Glu were the most
mutable amino acids, and Cys and Trp were the least
mutable.
Relative mutabilities
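Written as a formula, with symbols introduced here for illustration: let n_j be the number of changes of amino acid j in a group of related sequences, f_j its frequency of occurrence in that group, and C_100 the total number of amino acid changes in the group per 100 sites. The exposure to mutation E_j and the relative mutability m_j are then:

```latex
m_j = \frac{n_j}{E_j}, \qquad E_j = f_j \times C_{100}
```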
● The molecular clock hypothesis predicts that the rate
of amino acid substitution in a particular protein will
be approximately constant over time, though this rate
may vary between protein families.
● This suggests that the number of mutations per amino
acid in a protein increases approximately linearly with
time.
● Determining the time at which two proteins diverged
is an important task in phylogenetics.
Determining the time of divergence in
phylogenetic trees
● Fossil records are often used to establish the
position of events on the timeline of the Earth's
evolutionary history, but the application of this
source is limited.
● However, if the rate at which the molecular clock of a
protein family ticks (that is, the rate at which the
number of mutations per amino acid increases) is
known, then knowing this number of mutations would
allow the date of divergence to be found.
Cont.
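A worked version of this reasoning, under the simple molecular clock assumption and with symbols introduced here for illustration: if K is the number of substitutions per site separating two proteins and r is the substitution rate per site per year along each lineage, the factor 2 accounts for both lineages accumulating changes since they split. The numbers are hypothetical, and K is assumed small enough that repeated changes at the same site can be ignored:

```latex
T = \frac{K}{2r}
\qquad \text{e.g.} \qquad
T = \frac{0.2}{2 \times 10^{-9}\ \text{yr}^{-1}} = 10^{8}\ \text{yr}
```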
● PAM matrices are usually converted into
another form, called log-odds matrices.
● The odds score represents the ratio of the
likelihood of an amino acid substitution under two
different hypotheses.
● One, that the change actually represents an
authentic evolutionary variation at that site.
● The other, that the change occurred only by chance,
as in an alignment of unrelated sequences.
Log odds matrices
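In general form, with symbols introduced here for illustration, the log-odds score for aligning amino acids i and j compares q_ij, the probability of seeing the pair in related sequences, with p_i p_j, the probability of pairing them by chance given their background frequencies:

```latex
S_{ij} = \log \frac{q_{ij}}{p_i \, p_j}
```

The score is positive when the pair occurs more often in related sequences than expected by chance, zero when the two hypotheses are equally likely, and negative when the pair is rarer than chance.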
• PAM matrices are also used as a scoring matrix
when comparing DNA sequences or protein
sequences to judge the quality of the alignment.
• This form of scoring system is utilized by a wide
range of alignment software including BLAST.
Use in BLAST
● The BLOSUM substitution matrix is widely used for
scoring protein sequence alignments.
● The BLOSUM matrices are based on a different
type of sequence analysis and a much larger data set
than the PAM matrices.
What is BLOSUM?
● The matrix values are based on the observed
amino acid substitutions in a large set of more than
2000 conserved amino acid patterns, called blocks.
● These blocks have been found in a database of
protein sequences representing more than 500
families of related proteins, and they act as signatures
of these protein families.
What is BLOCK?
● The PROSITE catalog lists proteins that
belong to the same family because they have similar
biochemical functions.
● For each family, a pattern of amino acids that is
characteristic of that function is provided.
● The Henikoffs examined each PROSITE family for the
presence of ungapped amino acid patterns (blocks)
that could be used to identify members of that
family.
Cont.
● To locate these patterns, the sequences of each
protein family were searched for similar amino acid
patterns by the MOTIF program.
● These initial patterns were organized into larger
ungapped patterns (blocks) between 3 and 60 amino
acids long by the Henikoffs' PROTOMAT program.
● These blocks are present in all the sequences in each
family.
● They could be used to identify other members of the
family.
How to locate patterns?
● The blocks that characterize each family
provide a type of multiple sequence alignment for
that family.
● The amino acid changes in each column of the alignment
could be counted.
● The types of substitution were used to prepare a
scoring matrix, the BLOSUM matrix.
● The scores were given as log-odds scores: the logarithm
of the ratio of the observed frequency of an amino acid
pair to the frequency expected by chance.
Cont.
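A simplified Python sketch of how log-odds scores can be derived from one ungapped block, assuming the Henikoffs' pair-counting scheme: pair frequencies q_ij come from all residue pairs within each column, residue frequencies p_i are derived from q, the expected frequency is p_i² for identical pairs and 2·p_i·p_j otherwise, and scores are rounded to half-bit units. It omits the clustering of similar sequences at a percent-identity threshold, and the toy block is hypothetical:

```python
import math
from collections import Counter
from itertools import combinations


def blosum_like_scores(block):
    """Half-bit log-odds scores from an ungapped block of equal-length sequences."""
    # Count every unordered residue pair within each column of the block
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total_pairs = sum(pair_counts.values())
    q = {pair: n / total_pairs for pair, n in pair_counts.items()}

    # p_i: probability that residue i appears in a randomly chosen pair
    p = Counter()
    for (x, y), q_xy in q.items():
        if x == y:
            p[x] += q_xy
        else:
            p[x] += q_xy / 2
            p[y] += q_xy / 2

    scores = {}
    for (x, y), q_xy in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * math.log2(q_xy / expected))  # half-bit units
    return scores


print(blosum_like_scores(["AA", "AS", "AA"]))  # {('A', 'A'): 0, ('A', 'S'): 1}
```

With so little data the scores are not meaningful biologically; the point is only the path from column pair counts to observed and expected frequencies to rounded log-odds values.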
● Amino acid changes are counted within the blocks.
● Before the amino acid substitutions in an aligned block
are scored, closely similar sequences are grouped
together and counted as one.
● When sequences that were at least 60% identical were
grouped together, the resulting matrix was called
BLOSUM60.
● When those at least 80% alike were grouped, it was called BLOSUM80.
How to count amino acids?
● Like PAM, BLOSUM is based on the principle
of target frequencies of mutation.
● BLOSUM makes use of the BLOCKS database.
● Blocks contain local multiple alignments of
distantly related sequences.
● Unlike PAM, BLOSUM does not build an explicit
evolutionary model into its matrix formation.
Similarities and differences between BLOSUM & PAM
Thanks

 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 

Dernier (20)

❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 

Sequence alignment

  • 1. Presentation on Sequence Alignment. Zeeshan Akram Hanjra, 15211506-050, Bot-309 Genetics-I, BS Botany 6th (A), University of Gujrat
  • 2. What is sequence alignment? ● The procedure of comparing two or more sequences by searching for a series of individual characters or character patterns that occur in the same order in the sequences. ● Two sequences are aligned by writing them across a page in two rows. ● Identical or similar characters are placed in the same column; non-identical characters are either placed in the same column as a mismatch or placed opposite a gap in the other sequence.
  • 3. Alignment: Alignment is the task of locating "equivalent" regions of two or more sequences to maximize their similarity. ● NIKESH NARAYANAN ● NIGESH NARAYAN-- (mismatch: K/G in the third column; gaps: the two trailing dashes)
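To make the NIKESH/NIGESH example concrete, the short Python sketch below labels each column of an already-aligned pair as a match, a mismatch or a gap. The function name `classify_columns` and the use of `-` as the gap symbol are illustrative assumptions; the space between the two words is simply treated as another aligned character.

```python
def classify_columns(top, bottom, gap="-"):
    """Label each column of a pairwise alignment as 'match', 'mismatch' or 'gap'."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    labels = []
    for x, y in zip(top, bottom):
        if x == gap or y == gap:
            labels.append("gap")       # a residue opposite a gap
        elif x == y:
            labels.append("match")     # identical characters in one column
        else:
            labels.append("mismatch")  # different characters in one column
    return labels


labels = classify_columns("NIKESH NARAYANAN", "NIGESH NARAYAN--")
print(labels.count("match"), labels.count("mismatch"), labels.count("gap"))
```

Running it prints `13 1 2`: thirteen identical columns, one mismatch (K/G) and two gaps.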
  • 4. Why? ● Alignment is a way of arranging DNA, RNA or protein sequences to identify regions of similarity. ● It helps in inferring functional, structural or evolutionary relationships between the sequences. ● Sequence alignment methods are used to find the best-matching sequences. ● It helps to determine the nucleotide sequence of DNA (adenine, thymine, cytosine, guanine).
  • 5. Algorithms: An algorithm is a sequence of instructions that one must perform in order to solve a well-formulated problem. First, you must identify exactly what the problem is. A problem describes a class of computational tasks. A problem instance is one particular input from that task.
  • 6. Features of an algorithm: ● An algorithm must stop after a finite number of steps. ● All steps must be precisely defined. ● The input to the algorithm must be specified. ● The output of the algorithm must be specified. ● It must be effective: every step must be basic enough to be carried out exactly.
  • 7. Genetic algorithm: ● A genetic algorithm is used in artificial intelligence and computing to find optimized solutions to search problems. It is based on the theory of natural selection and evolutionary biology.
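A genetic algorithm can be sketched in a few lines of Python. The example below is a hypothetical toy: it evolves bit strings toward all 1s (the classic "OneMax" problem, not an alignment task) using tournament selection, one-point crossover, random mutation and elitism. All function names and parameter values are illustrative assumptions.

```python
import random


def one_max_ga(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Toy genetic algorithm: evolve bit strings toward all 1s (OneMax)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible

    def fitness(individual):
        return sum(individual)  # fitness = number of 1s

    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)

    def select():
        # Tournament selection: pick two individuals at random, keep the fitter one
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(population, key=fitness)
        next_gen = [elite[:]]  # elitism: the best individual survives unchanged
        while len(next_gen) < pop_size:
            parent1, parent2 = select(), select()
            cut = rng.randrange(1, n_bits)  # one-point crossover position
            child = parent1[:cut] + parent2[cut:]
            # Mutation: flip each bit with a small probability
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_gen.append(child)
        population = next_gen

    best = max(population, key=fitness)
    return initial_best, best


initial_best, best = one_max_ga()
print(initial_best, sum(best))
```

Because the best individual of each generation is copied unchanged into the next, the best fitness can never decrease from one generation to the next.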
  • 8. Types of alignment: ● Global alignment attempts to align the entire sequence, using as many characters as possible, up to both ends of each sequence. ● Sequences that are quite similar and of approximately the same length are suitable candidates for global alignment. ● The Needleman-Wunsch algorithm is used to produce global alignments between pairs of DNA or protein sequences.
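The Needleman-Wunsch idea can be sketched with dynamic programming: a table `F[i][j]` holds the best score for aligning the first `i` characters of one sequence with the first `j` characters of the other, and a traceback from the bottom-right cell recovers one optimal global alignment. The scoring values (+1 match, -1 mismatch, -1 per gap position) are assumptions chosen for illustration, and ties are broken arbitrarily, so other equally optimal alignments may exist.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b by dynamic programming (linear gap cost)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two residues
                          F[i - 1][j] + gap,                          # gap in b
                          F[i][j - 1] + gap)                          # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score)
print(top)
print(bottom)
```

For example, `needleman_wunsch("ACGT", "AGT")` returns a score of 2 with the alignment `ACGT` over `A-GT` (three matches and one gap).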
  • 9. Local alignment: ● Stretches of sequence with the highest density of matches are aligned. ● This generates one or more islands of matches, or subalignments, in the aligned sequences. ● Local alignment suits sequences that are similar along part of their length but dissimilar elsewhere, sequences that differ in length, or sequences that share a conserved region or domain. ● The Smith-Waterman algorithm is used to produce local alignments.
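A matching sketch of Smith-Waterman differs from global alignment in two ways: every cell is floored at 0, so a poor region can never drag the score below zero, and the traceback starts from the highest-scoring cell anywhere in the table and stops when it reaches a 0. The scores (+2 match, -1 mismatch, -1 gap) are assumptions for illustration only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of a and b (linear gap cost)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,  # start a fresh local alignment here
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("ACGT", "TTACGTTT"))
```

Here `smith_waterman("ACGT", "TTACGTTT")` finds the shared island `ACGT` with a score of 8, while two sequences with nothing in common give a score of 0 and empty strings.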
  • 10. Fig: Distinction between global and local alignment of two sequences
  • 11. Goals/importance of alignment: ● Function or activity of a new gene/protein. ● Structure or shape of a new protein. ● Location or preferred location of a protein. ● Stability of a gene or protein. ● Origin of a gene or protein. ● Origin or phylogeny of an organelle. ● Origin or phylogeny of an organism.
  • 12. Parametric sequence alignment: ● Parametric sequence alignment refers to computer methods that find the range of possible alignments obtained in response to varying the scoring system used for matches, mismatches and gaps. ● Scores are used to compare the alignments obtained with different parameters. ● Global and local types of sequence alignment give consistent results.
  • 13. Gaps: ● The quality of an alignment can be measured in terms of the number of gaps introduced and the number of mismatches remaining. ● We could score the alignment by counting how many positions match identically. ● In a comprehensive alignment, many residues may have to be placed at positions that are not strictly identical.
  • 14. Cont. ● In such cases, the possible positions for gaps become numerous and the alignment becomes more complex. ● If gaps are inserted freely, the algorithms produce alignments containing very large proportions of matching letters and large numbers of gaps.
  • 15. Mismatches: ● Although this process achieves an optimum score and is mathematically meaningful, the result would be biologically meaningless, because insertion and deletion of monomers is a relatively slow evolutionary process.
  • 16. Cont. ● Dynamic programming algorithms use gap penalties to maximize the biological meaning of an alignment. ● A simple score gives a positive additive contribution of 1 for every matching pair of letters in the alignment. ● A gap penalty is subtracted for each gap that has been introduced (there are different kinds of gap penalty, such as a constant penalty, a proportional penalty, and a penalty that combines a gap-opening and a gap-extension penalty).
  • 17. Cont. ● The total alignment score is then a function of the identity between aligned residues and the gap penalties incurred.
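The scoring scheme described on slides 16 and 17, +1 for each identical pair and a penalty for each gap, can be written down directly. The sketch below scores the NIKESH/NIGESH alignment from slide 3 under a constant, a proportional (linear) and an opening-plus-extension (affine) gap penalty. The penalty values, the hypothetical helper names, and the affine convention of charging the opening cost once and the extension cost for each additional gap position are assumptions (textbooks differ on this detail).

```python
def gap_runs(row, gap="-"):
    """Lengths of maximal runs of gap characters in one aligned row."""
    runs, current = [], 0
    for ch in row:
        if ch == gap:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def constant_penalty(c):
    return lambda length: c  # same cost for a gap of any length


def linear_penalty(g):
    return lambda length: g * length  # cost proportional to gap length


def affine_penalty(open_cost, extend_cost):
    # Open once, then pay the extension cost for each additional position
    return lambda length: open_cost + extend_cost * (length - 1)


def alignment_score(top, bottom, penalty, gap="-"):
    """+1 for each identical aligned pair, minus a penalty for each gap run."""
    matches = sum(1 for x, y in zip(top, bottom) if x == y and x != gap)
    total_penalty = sum(penalty(length)
                        for row in (top, bottom)
                        for length in gap_runs(row, gap))
    return matches - total_penalty


top, bottom = "NIKESH NARAYANAN", "NIGESH NARAYAN--"
for name, pen in [("constant", constant_penalty(3)),
                  ("linear", linear_penalty(1)),
                  ("affine", affine_penalty(3, 1))]:
    print(name, alignment_score(top, bottom, pen))
```

With these assumed values the single two-position gap costs 3 (constant), 2 (linear) or 4 (affine), so the same alignment scores 10, 11 and 9 respectively.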
  • 18. Distance measure: ● Distance measures treat sequences as points in a metric space. ● A distance is a function that associates a numeric value with a pair of sequences. ● The larger the distance, the smaller the similarity, and vice versa. ● It satisfies the mathematical axioms of a metric (non-negativity, zero distance only between identical sequences, symmetry and the triangle inequality). ● Distance and similarity can be converted into one another.
  • 19. Levenshtein distance: ● It can be measured in terms of the number of gaps introduced and the number of mismatches remaining. ● It is also known as edit distance. ● It is the minimum number of edit operations required to change one string into the other. ● An edit operation is the insertion, deletion or substitution of a single character.
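The edit distance can be computed with the classic dynamic-programming recurrence, keeping only the previous row of the table. This is a minimal Python sketch with unit cost for every insertion, deletion and substitution (an assumption; weighted costs are also possible).

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))
print(levenshtein("NIKESH NARAYANAN", "NIGESH NARAYAN"))
```

Both calls give 3: "kitten" to "sitting" needs two substitutions and one insertion, and the name pair needs one substitution (K to G) plus two deletions.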
  • 20. Hamming distance: ● The Hamming distance between two sequences of equal length is the number of positions with mismatched characters. ● It is desirable to assign variable weights to different edit operations, since certain changes are more likely to occur naturally.
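A minimal Python sketch of the Hamming distance follows, together with a weighted variant that reflects the second point. The weights used here (1 for a transition, A<->G or C<->T, and 2 for a transversion) are hypothetical; they only encode the general observation that transitions occur more often than transversions.

```python
# Transitions: purine <-> purine (A, G) or pyrimidine <-> pyrimidine (C, T)
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(s, t))


def weighted_hamming(s, t, transition=1.0, transversion=2.0):
    """Hamming distance with hypothetical weights: transitions cost less."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs sequences of equal length")
    total = 0.0
    for a, b in zip(s, t):
        if a == b:
            continue
        total += transition if frozenset((a, b)) in TRANSITIONS else transversion
    return total


print(hamming("ACGT", "GCGA"), weighted_hamming("ACGT", "GCGA"))
```

`hamming("ACGT", "GCGA")` is 2, while the weighted version gives 3.0 because one change (A/G) is a transition and the other (T/A) a transversion.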
  • 21. Strings: ● Strings are used to represent text in the Perl programming language. ● They are usually surrounded by single or double quotation marks. ● Given two character strings of equal length, the Hamming technique can be used to measure the distance between them.
  • 22. High-scoring matches: ● Amino acid substitutions tend to be conservative: the replacement of one amino acid by another with similar size and physicochemical properties is more likely to occur than its replacement by an amino acid with very different properties. ● Algorithms use different distance measures to compute and score alignments.
  • 23. High scoring, low scoring: ● Similar sequences give high scores. ● A high score by itself has only mathematical significance. ● Dissimilar sequences give low scores. ● An algorithm for optimal alignment can seek either to minimize a dissimilarity measure or to maximize a scoring function.
  • 24. Sequence comparison: ● It generally involves full-length sequences, and a comprehensive alignment requires that many residues be placed at positions that are not strictly identical. ● For a biologically meaningful comparison, the positioning of gaps and the numbers of identities and mismatches have to be balanced.
  • 25. Optimum score: ● To achieve the optimum score, penalties are introduced to minimize the number of gaps, and extension penalties are added when a gap is extended. ● An important task of sequence scoring is to distinguish between high-scoring and low-scoring alignments.
  • 26. Uses of sequence alignment: It is useful for discovering ● structural, functional and evolutionary information. Sequences that are similar often have ● the same function, ● a regulatory role (in the case of similar DNA molecules), ● similar biochemical function and 3-D structure (in the case of proteins).
  • 27. Uses: It is important to obtain ● the best possible, or optimal, alignment. If two sequences from two different organisms are similar, ● there may have been a common ancestor sequence, ● and the sequences are said to be homologous.
  • 28. Uses cont.: Alignment indicates ● the changes that have occurred between two homologous sequences and a common ancestor sequence. It also helps to search a database ● for sequences that are potentially related to a particular sequence.
  • 29. Uses: In a landmark example of sequence comparison, Doolittle and Waterfield (working independently) found that ● the viral oncogene v-sis is a modified form of a normal cellular gene that encodes platelet-derived growth factor. ● Dynamic programming algorithms find the best alignment, ● but the process is very slow for long sequences or large database searches.
  • 30. Scoring mutations, deletions and substitutions: Due to random mutations, nucleotides may be ● replaced, ● deleted ● or inserted. Loss of protein function is a disadvantage to the organism. A change will survive if it does not have a deleterious effect on the protein.
  • 31. Cont. ● If a change is deleterious, the organism may not survive and the gene will not be passed on. ● Many substitution mutations are well tolerated in proteins. A substitution that does not affect protein properties is called a conservative substitution. ● Protein-coding genes evolve much more slowly than non-coding DNA. ● As proteins evolve, they tend to undergo substitutions between amino acids with similar properties.
  • 32. Cont. ● Protein sequences from the same evolutionary family show substitutions between amino acids with similar physicochemical properties. ● A substitution score matrix is used to show the scores for amino acid substitutions. ● When comparing proteins, a substitution matrix can increase sensitivity to weak alignments.
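How a substitution score matrix is used can be shown with a deliberately simplified one. The Python sketch below builds a symmetric 20 x 20 table from hypothetical rules (identical residues +2, residues in the same made-up similarity group +1, anything else -1) and sums the scores along an ungapped alignment. These groups and values are illustrative assumptions, not PAM or BLOSUM entries; real work would load a published matrix.

```python
# Hypothetical similarity groups for illustration only (NOT from PAM or BLOSUM)
SIMILAR_GROUPS = [set("ILVM"), set("KR"), set("DE"), set("ST"), set("FYW")]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_score(a, b, identical=2, similar=1, other=-1):
    """Identical > conservative (same group) > non-conservative."""
    if a == b:
        return identical
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return similar
    return other


def build_matrix(alphabet=ALPHABET):
    # One score for every (row amino acid, column amino acid) pair
    return {(a, b): toy_score(a, b) for a in alphabet for b in alphabet}


def score_ungapped(s, t, matrix):
    """Sum the matrix scores along an ungapped alignment of s and t."""
    return sum(matrix[(a, b)] for a, b in zip(s, t))


M = build_matrix()
print(score_ungapped("ILKD", "VLRE", M))
```

Here `ILKD` against `VLRE` scores 5 even though only one position is identical, because the other three pairs are treated as conservative substitutions.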
  • 33. Amino acid substitution matrix: ● In different species, the amino acid substitutions that occur in a protein are those compatible with its structure and function. ● The substituted amino acids are usually chemically similar, but other changes also occur. ● Knowing which changes occur in proteins can assist in predicting alignments. ● If protein sequences are similar, they are easily aligned.
  • 34. Cont. ● Evolution can be predicted if the ancestral relationships among a group of proteins are assessed. ● Margaret Dayhoff pioneered this analysis. ● Symbol comparison tables are used for this purpose. Mechanism: ● The amino acids are listed across the top and down the side of the matrix. ● Each matrix position is filled with a score. ● The score shows how often one amino acid is paired with another.
  • 35. Cont. ● The probability of changing amino acid A into B is assumed to equal the probability of the reverse change. ● This is because the ancestral amino acid in the phylogenetic tree is not known. ● The model predicts that amino acid frequencies will not change over evolutionary time. ● When calculating alignment scores, identical amino acids should be given higher values.
  • 36. Cont. ● Among substitutions, conservative substitutions should be given greater value than non-conservative substitutions. ● Two popular matrices, the Dayhoff mutation data (MD) matrix and BLOSUM, have been devised to weight matches between non-identical residues. ● The MD score is based on the concept of the point accepted mutation.
  • 37. Percent accepted mutation (PAM) matrix: ● A PAM matrix is a matrix in which each column and each row represents one of the twenty standard amino acids. ● In bioinformatics, PAM matrices are regularly used as substitution matrices to score protein sequence alignments. ● Missense mutations that are accepted by natural selection may be classed as point accepted mutations.
  • 38. Biological background: ● The genetic instructions of every replicating cell in a living organism are contained within its DNA. ● Throughout the cell's lifetime, this information is transcribed and replicated by cellular mechanisms to produce proteins or to provide instructions for daughter cells during cell division, and the DNA may be altered during these processes. This is known as a mutation. ● At the molecular level, there are regulatory systems that correct most, but not all, of these changes to the DNA before it is replicated.
  • 39. Construction of PAM matrices: ● PAM matrices were introduced by Margaret Dayhoff in 1978. ● The calculation of these matrices was based on 1572 observed mutations in the phylogenetic trees of 71 families of closely related proteins. ● The proteins studied were selected for their high similarity to their predecessors. ● The protein alignments included were required to display at least 85% identity.
  • 40. Cont. ● As a result, it is reasonable to assume that any aligned mismatch was the result of a single mutation event rather than several at the same location. ● Each PAM matrix has twenty rows and twenty columns, one for each of the twenty amino acids translated by the genetic code. ● The value in each cell of a PAM matrix is related to the probability of a row amino acid before the mutation being aligned with a column amino acid afterwards.
  • 41. Collection of data from phylogenetic trees: ● For each branch in the phylogenetic trees of the protein families, the number of observed mismatches was recorded, along with the two amino acids involved. ● These counts were used as entries below the main diagonal of a matrix A, which is assumed to be symmetric. ● The mutability of an amino acid is the ratio of the number of mutations it undergoes to the number of times it occurs in an alignment. ● Cysteine and tryptophan were found to be the least mutable amino acids. ● Cysteine's side chain contains sulfur, which participates in disulfide bonds.
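The mutability ratio defined above is simple arithmetic. In this Python sketch the counts are invented purely for illustration (they are not Dayhoff's data); each amino acid's number of observed mutations is divided by its number of occurrences, and the results are ranked from most to least mutable.

```python
# HYPOTHETICAL counts for illustration only, not Dayhoff's data:
# amino acid -> (mutations observed on tree branches, occurrences in alignments)
observed = {
    "Asn": (40, 300),
    "Ser": (45, 360),
    "Cys": (3, 150),
    "Trp": (2, 120),
}


def mutability(mutations, occurrences):
    """Ratio of the number of mutations to the number of occurrences."""
    return mutations / occurrences


ranked = sorted(observed, key=lambda aa: mutability(*observed[aa]), reverse=True)
for aa in ranked:
    print(aa, round(mutability(*observed[aa]), 3))
```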
  • 42. Relative mutabilities: ● Relative mutabilities were evaluated by counting, in each group of related sequences, the number of changes of each amino acid and dividing this number by a factor called the exposure to mutation of that amino acid. ● This factor is the product of the amino acid's frequency of occurrence and the total number of amino acid changes per 100 sites that occurred in that group. ● By these scores, Asn, Ser, Asp and Glu were the most mutable amino acids, and Cys and Trp were the least mutable.
  • 43. Determining the time of divergence in phylogenetic trees: ● The molecular clock hypothesis predicts that the rate of amino acid substitution in a particular protein will be approximately constant over time, although this rate may vary between protein families. ● This suggests that the number of mutations per amino acid in a protein increases approximately linearly with time. ● Determining the time at which two proteins diverged is an important task in phylogenetics.
  • 44. Cont. ● Fossil records are often used to place events on the timeline of the Earth's evolutionary history, but this source has limited application. ● However, if the rate at which the molecular clock of a protein family ticks (that is, the rate at which the number of mutations per amino acid increases) is known, ● then knowing the number of mutations allows the date of divergence to be found.
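The reasoning on slides 43 and 44 can be written as a short calculation. This Python sketch assumes a constant clock rate `rate` (substitutions per site per year along each lineage) and, as an added assumption not stated on the slides, applies the Poisson correction d = -ln(1 - p) to the observed fraction p of differing sites to allow for multiple hits at the same site. Because both lineages accumulate changes after the split, the divergence time is d / (2 × rate).

```python
import math


def poisson_distance(p):
    """Substitutions per site estimated from the fraction p of differing sites."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1 - p)


def divergence_time(p, rate):
    """Years since two sequences split, assuming a constant molecular clock.

    `rate` is substitutions per site per year on EACH lineage; both lineages
    accumulate changes after the split, hence the factor of 2.
    """
    return poisson_distance(p) / (2 * rate)


# Fraction of differing sites that corresponds to a corrected distance of 0.1
p = 1 - math.exp(-0.1)
print(divergence_time(p, 1e-9))
```

For example, a corrected distance of 0.1 substitutions per site with an assumed rate of 1e-9 per site per year gives about 5 × 10^7 years.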
  • 45. Log-odds matrices: ● PAM matrices are usually converted into another form, called log-odds matrices. ● The odds score is the ratio of the probability of an amino acid substitution under two different hypotheses: ● one, that the change represents an authentic evolutionary variation at that site, and the other, that the change occurred by chance alone.
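A log-odds score compares the probability of seeing a pair of amino acids aligned under the "related" hypothesis with the probability under the "chance" hypothesis, which is the product of the two background frequencies. The Python sketch below uses base-2 logarithms scaled by 2 (half-bit units); the base, the scale factor and the example frequencies are assumptions for illustration.

```python
import math


def log_odds(q_ab, p_a, p_b, scale=2.0):
    """Scaled log2 of observed pair probability over chance expectation p_a * p_b.

    Positive: pair seen more often than chance; negative: less often.
    """
    return scale * math.log2(q_ab / (p_a * p_b))


print(round(log_odds(0.01, 0.05, 0.05), 6))
```

A positive score means the pair is seen more often than chance predicts (for example, 0.01 against an expected 0.05 × 0.05 = 0.0025 gives 2 × log2(4) = 4), a score near 0 means no preference, and a negative score means the pair is seen less often than chance.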
  • 46. Use in BLAST: ● PAM matrices are also used as scoring matrices when comparing DNA or protein sequences, to judge the quality of an alignment. ● This form of scoring system is used by a wide range of alignment software, including BLAST.
  • 47. What is BLOSUM? ● The BLOSUM (BLOcks SUbstitution Matrix) substitution matrices are widely used for scoring protein sequence alignments. ● The BLOSUM matrices are based on a different type of sequence analysis and a much larger data set than the PAM matrices.
  • 48. What is a block? ● The matrix values are based on the amino acid substitutions observed in a large set of more than 2000 conserved amino acid patterns, called blocks. ● These blocks were found in a database of protein sequences representing more than 500 families of related proteins, and they act as signatures of these protein families.
  • 49. Cont. ● The PROSITE catalog lists proteins that belong to the same family because they have similar biochemical functions. ● For each family, a pattern of amino acids characteristic of that function is provided. ● The Henikoffs examined each PROSITE family for ungapped amino acid patterns (blocks) that could be used to identify members of that family.
  • 50. How to locate patterns? ● To locate these patterns, the sequences of each protein family were searched for similar amino acid patterns with the MOTIF program. ● These initial patterns were organized into larger ungapped patterns (blocks), between 3 and 60 amino acids long, by the Henikoffs' PROTOMAT program. ● These blocks are present in all the sequences in each family. ● They could be used to identify other members of the family.
  • 51. Cont. ● The blocks that characterize each family provide a type of multiple sequence alignment for that family. ● The amino acid changes in each column of the alignment could be counted. ● The types of substitution were used to prepare a scoring matrix, the BLOSUM matrix. ● The values were given as logarithms of odds scores: the observed frequency of each amino acid pair divided by the frequency expected by chance.
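The counting and log-odds steps on slide 51 can be sketched in Python for a single ungapped block. The sketch follows the usual BLOSUM recipe: count every pair of residues within each column, turn the counts into observed pair frequencies q, derive each residue's background frequency p from those pairs, compare q with the chance expectation (p_a × p_a for identical pairs, 2 × p_a × p_b otherwise) and round 2 × log2 of the ratio. It omits the clustering step and uses a tiny made-up block, so its numbers mean nothing biologically; the function name is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_from_block(block):
    """Half-bit log-odds scores from one ungapped block (list of equal-length rows)."""
    # 1. Count every unordered residue pair within each column
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}

    # 2. Background frequency of each residue: p_a = q_aa + sum over b != a of q_ab / 2
    p = Counter()
    for (x, y), freq in q.items():
        p[x] += freq / 2
        p[y] += freq / 2

    # 3. Compare observed with chance expectation and round 2 * log2(ratio)
    scores = {}
    for (x, y), freq in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * math.log2(freq / expected))
    return scores


toy_block = ["ASW", "ASW", "ASW", "ATW"]  # hypothetical block
print(blosum_from_block(toy_block))
```

In this toy block S/T scores higher than the identities only because T is rare; with real BLOCKS data and thousands of columns the numbers behave very differently.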
  • 52. How to count amino acid changes? ● Amino acid changes were counted in the blocks. ● Before the amino acid substitutions in an aligned block were scored, closely related sequences were grouped together and counted as a single sequence. ● Sequences that were at least 60% identical were grouped together to give the matrix called BLOSUM60, ● and those at least 80% alike gave BLOSUM80.
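The grouping step can be sketched as single-linkage clustering: a sequence joins a cluster if it is at least `threshold` percent identical to any member already in it. This Python sketch is a simplified illustration with hypothetical names and a made-up block; in the real procedure each cluster is then counted as one sequence (its pair counts are down-weighted) before the log-odds scores are computed, which is not shown here.

```python
def percent_identity(s, t):
    """Percentage of identical positions between two equal-length rows."""
    if len(s) != len(t):
        raise ValueError("block rows must have equal length")
    return 100.0 * sum(a == b for a, b in zip(s, t)) / len(s)


def cluster_block(rows, threshold):
    """Single-linkage clusters of rows at >= threshold percent identity."""
    parent = list(range(len(rows)))  # union-find forest over row indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if percent_identity(rows[i], rows[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, row in enumerate(rows):
        clusters.setdefault(find(i), []).append(row)
    return list(clusters.values())


block = ["AAAAA", "AAAAT", "AAATT", "TTTTT"]  # hypothetical block
print(cluster_block(block, 80))
print(cluster_block(block, 100))
```

With this toy block, a threshold of 80% merges the first three rows into one cluster (each is linked to a neighbour at 80% identity), while a threshold of 100% leaves all four rows separate.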
  • 53. Similarity between BLOSUM & PAM: ● Like PAM, BLOSUM is based on the principle of target frequencies of mutation. ● BLOSUM makes use of the BLOCKS database. ● Blocks contain local multiple alignments of distantly related sequences. ● Unlike PAM, however, BLOSUM is not derived from an explicit evolutionary model.