2. A scoring system is a set of values that quantifies the likelihood of
one residue being substituted by another in an alignment.
It is also known as a substitution matrix.
The scoring matrix for nucleotides is relatively simple.
A positive value or a high score is given for a match and a
negative value or a low score is given for a mismatch.
Scoring matrices for amino acids are more complicated
because the scoring has to reflect the physicochemical
properties of amino acid residues.
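The match/mismatch scheme for nucleotides can be sketched in a few lines of Python. The function name `score_alignment` and the default values of +1 and -1 are illustrative assumptions, not values taken from the slides, and gaps are not handled.

```python
def score_alignment(seq_a, seq_b, match=1, mismatch=-1):
    """Sum per-column scores for two equal-length, ungapped nucleotide sequences.

    The match and mismatch values are assumed defaults for illustration.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Each column contributes the match score if the residues are identical,
    # otherwise the mismatch score.
    return sum(match if a == b else mismatch for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    print(score_alignment("GATTACA", "GACTATA"))
```

A higher total means more identical columns; changing `match` and `mismatch` changes how strongly mismatches are penalized.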
3. Transitions --- substitutions in which a purine (A/G) is replaced by
another purine (A/G) or a pyrimidine (C/T) is replaced by
another pyrimidine (C/T).
Transversions --- substitutions in which a purine (A/G) is replaced by
a pyrimidine (C/T) or vice versa.
Identity matrix
     G    C    T    A
G    1    0    0    0
C    0    1    0    0
T    0    0    1    0
A    0    0    0    1
Transition-Transversion matrix
     G    C    T    A
G    1   -5   -5   -1
C   -5    1   -1   -5
T   -5   -1    1   -5
A   -1   -5   -5    1
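As a rough illustration, the transition-transversion matrix can be generated from the purine/pyrimidine classification. The sketch below assumes the scores 1 (match), -1 (transition) and -5 (transversion) that appear in the matrix on this slide; the names `substitution_type` and `build_matrix` are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Scores read off the transition-transversion matrix on the slide.
SCORES = {"match": 1, "transition": -1, "transversion": -5}


def substitution_type(x, y):
    """Classify a nucleotide pair as match, transition or transversion."""
    if x == y:
        return "match"
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def build_matrix(bases="GCTA"):
    """Return a nested dict so that matrix[x][y] is the score for the pair."""
    return {x: {y: SCORES[substitution_type(x, y)] for y in bases} for x in bases}


matrix = build_matrix()

if __name__ == "__main__":
    for x in "GCTA":
        print(x, [matrix[x][y] for y in "GCTA"])
```

Setting the transition and transversion scores to the same value (0 in the slide's version) collapses this into the identity matrix.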
5. PAM - Point Accepted Mutation, based on
global alignments [evolutionary model]
BLOSUM - BLOcks SUbstitution Matrix, based on
local alignments [similarity among
conserved sequences]
6. PAM matrices were first introduced by Dayhoff, who compiled alignments of 71
groups of very closely related protein sequences.
PAM stands for Point Accepted Mutation.
PAM matrices were derived from the evolutionary
divergence between sequences of the same group.
Construction of the PAM1 matrix involves alignment of full-length
sequences and subsequent construction of
phylogenetic trees using the parsimony principle.
7. Ancestral sequence information is used to count the number
of substitutions along each branch of the tree.
Positive scores in the matrix denote substitutions that occur
more frequently than expected by chance, representing evolutionarily
conserved replacements.
Negative scores correspond to substitutions that occur less
frequently than expected.
One PAM unit is defined as 1% amino acid change, or one mutation
per 100 residues.
Increasing PAM numbers correspond to increasing PAM
units and thus greater evolutionary distances between protein sequences.
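In Dayhoff's model, matrices for larger PAM distances are extrapolated by multiplying the PAM1 mutation probability matrix by itself. The sketch below shows the idea on a made-up three-letter alphabet: `TOY_PAM1` is a hypothetical matrix with a 1% chance of change per residue, not real amino acid data, and `mat_pow` is an illustrative helper.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical PAM1-like matrix: each residue is kept with probability 0.99
# and replaced by either of the other two with probability 0.005.
TOY_PAM1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

if __name__ == "__main__":
    for n in (1, 50, 250):
        p = mat_pow(TOY_PAM1, n)
        # The diagonal entry is the probability that a residue is unchanged
        # after n PAM units; it shrinks as the distance grows.
        print(n, round(p[0][0], 3))
```

Each row stays a probability distribution after multiplication, while the diagonal falls with increasing PAM number, which is the sense in which higher PAM matrices model more distant sequences.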
8. Limitations of PAM matrices:
Constructed from phylogenetic relationships inferred prior to scoring mutations;
Difficulty of determining ancestral relationships among sequences;
Based on a small set of closely related proteins.
9. BLOSUM is a series of blocks amino acid substitution matrices.
They are derived from direct observation of every
possible amino acid substitution in multiple sequence
alignments.
A conserved sequence pattern is also called a block.
Blocks are ungapped alignments of fewer than 60 amino acids in
length.
The numbers in BLOSUM matrix names are the actual percentage identity
values of the sequences selected for construction of the matrices.
10. BLOSUM62 indicates that the sequences selected for
constructing the matrix share an average identity of 62%.
The BLOSUM score for a particular residue pair is derived
from the log ratio of the observed residue substitution frequency
versus the expected probability of that residue pair.
The lower the BLOSUM number, the more divergent the sequences
it represents.
11.
     C    S    T    P    A    G
C    9
S   -1    4
T   -1    1    5
P   -3   -1   -1    7
A    0    1    0   -1    4
G   -3    0   -2   -2    0    6
BLOSUM62 was measured on pairs of sequences with an average of 62%
identical amino acids.
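To show how such a matrix is used, the BLOSUM62 excerpt for C, S, T, P, A and G can be stored as a lower triangle and looked up symmetrically. This is a minimal sketch covering only those six residues; the helper names `pair_score` and `score_ungapped` are hypothetical and gaps are not handled.

```python
ORDER = "CSTPAG"

# Lower triangle of the BLOSUM62 excerpt, one row per residue in ORDER.
ROWS = {
    "C": [9],
    "S": [-1, 4],
    "T": [-1, 1, 5],
    "P": [-3, -1, -1, 7],
    "A": [0, 1, 0, -1, 4],
    "G": [-3, 0, -2, -2, 0, 6],
}


def pair_score(x, y):
    """Look up the score for a residue pair; the matrix is symmetric."""
    i, j = ORDER.index(x), ORDER.index(y)
    if i < j:
        i, j = j, i  # always read from the lower triangle
    return ROWS[ORDER[i]][j]


def score_ungapped(seq_a, seq_b):
    """Sum pair scores over the columns of an ungapped alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    print(score_ungapped("CAST", "CGSP"))
```

Identical residues score highest (for example C with C), while chemically dissimilar pairs such as C with P score negative.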
$$\text{Log-odds} = \log\left(\frac{\text{chance to see the pair in homologous proteins}}{\text{chance to see the pair in unrelated proteins by chance}}\right)$$
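The log-odds ratio can be turned into an integer score as sketched below. The frequencies in the example are made-up numbers, and the base-2 logarithm with a scale factor of 2 (half-bit units) is an assumption that follows the convention commonly cited for BLOSUM62; `log_odds_score` is a hypothetical helper name.

```python
import math


def log_odds_score(observed, expected, scale=2.0):
    """Return the rounded, scaled log2 ratio of observed to expected frequency.

    observed: chance of seeing the pair in homologous proteins
    expected: chance of seeing the pair in unrelated proteins by chance
    scale:    assumed scaling factor (2.0 gives half-bit units)
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("frequencies must be positive")
    return round(scale * math.log2(observed / expected))


if __name__ == "__main__":
    # Hypothetical pair seen four times more often than chance predicts.
    print(log_odds_score(0.02, 0.005))
    # Hypothetical pair seen half as often as chance predicts.
    print(log_odds_score(0.0025, 0.005))
```

A pair observed more often than expected gets a positive score, a pair observed less often gets a negative score, and a pair observed exactly as often as expected scores zero.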
12. PAM
› Based on a mutational model of evolution (Markov process)
› PAM1 is based on sequences of 85% similarity
› Designed to track evolutionary origins
BLOSUM
› Based on the multiple alignment of blocks
› Well suited to comparing distant sequences
› Designed to find proteins' conserved domains