2. Finding Mutations
>SA1 G A T C C T G T A T G C C A C T C G A C A C G A T G T C T G
>SA2 A A T C C T G T A T G C C A C T C G A A A C G A T G T C T G
>SA3 G A T C C C G T A T G C T A C T C G A T A C G A T G T C T A
>SA4 A A T T C T G T A T G C C A C T C G A T A C G A A G T C T G
>SA5 G A T C C T A T A T G C C A C T C A A T A C G A T G T C T G
>SA6 C A T C C T G T A T G C C A T T C G A T A C G A T G T C T C
>SA7 G A T C C T G T A T A C C A C T C G A A A C G A T G T C T G
>SA8 G A T C C T G T A T G C C A C T C G A T A T G A T G T C T G
>SA9 G A T C C T G T A T G C C G C T C G A T A C G A T G T C T G
>SA10 G A T C C C G T A T G C C A C T C G A T A C G A T G T C T G
Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
This example uses 10 hypothetical sequences of 30 nucleotides each. If all the sequences represent the same region and have the same length (e.g. one gene within a species), they are already aligned and need no further alignment. Otherwise, they must first be aligned with multiple sequence alignment software such as MUSCLE (Edgar 2004).
Before finding mutations, we need to generate a reference sequence. We assume that a single ancestral sequence existed long ago and that, over the course of evolution, mutations were introduced at different positions, giving rise to the 10 different sequences seen today. Because such mutations are rare, each position in the alignment shows one most frequent nucleotide along with a few rarer ones. For example, at position 1 the most frequent nucleotide is G (7 times), along with two A and one C. We therefore assume that the original sequence had G at this position, and that G mutated to C in one sequence and to A in two others. Taking the most frequent nucleotide at each position gives the reference sequence (shown below), against which the mutations can be identified.
For example, at position 1 the mutations are G->C, G->A and G->A; at position 4 there is one mutation, C->T; at position 6 there are two mutations, both T->C; and so on for the other positions. We considered only single mutations (shown in BLUE), assuming them to be the more recent ones, and ignored multiple mutations (shown in GREEN).
We wrote a program in C to compute these mutations.
>Ref G A T C C T G T A T G C C A C T C G A T A C G A T G T C T G
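The reference-building and mutation-finding steps described above can be sketched in C as follows. This is an illustrative sketch, not the program mentioned in the text: the names EXAMPLE, build_reference, mutations_at and print_single_mutations are hypothetical, ties between equally frequent nucleotides are broken arbitrarily, and a "single mutation" is assumed to mean a column in which exactly one sequence differs from the reference.

```c
#include <stdio.h>
#include <string.h>

#define NSEQ 10   /* number of aligned sequences in the example */
#define LEN  30   /* alignment length */

const char BASES[] = "ACGT";

/* The ten example sequences SA1..SA10, written without spaces. */
const char EXAMPLE[NSEQ][LEN + 1] = {
    "GATCCTGTATGCCACTCGACACGATGTCTG",
    "AATCCTGTATGCCACTCGAAACGATGTCTG",
    "GATCCCGTATGCTACTCGATACGATGTCTA",
    "AATTCTGTATGCCACTCGATACGAAGTCTG",
    "GATCCTATATGCCACTCAATACGATGTCTG",
    "CATCCTGTATGCCATTCGATACGATGTCTC",
    "GATCCTGTATACCACTCGAAACGATGTCTG",
    "GATCCTGTATGCCACTCGATATGATGTCTG",
    "GATCCTGTATGCCGCTCGATACGATGTCTG",
    "GATCCCGTATGCCACTCGATACGATGTCTG"
};

/* Index of a base in BASES, or -1 for anything else (gaps, N, ...). */
int base_index(char b)
{
    const char *p = (b != '\0') ? strchr(BASES, b) : NULL;
    return p ? (int)(p - BASES) : -1;
}

/* Write the most frequent base of each column into ref (LEN + 1 bytes).
   Assumption: ties go to the base that comes first in "ACGT". */
void build_reference(const char seqs[][LEN + 1], int nseq, char *ref)
{
    for (int pos = 0; pos < LEN; pos++) {
        int count[4] = {0, 0, 0, 0};
        for (int s = 0; s < nseq; s++) {
            int b = base_index(seqs[s][pos]);
            if (b >= 0)
                count[b]++;
        }
        int best = 0;
        for (int b = 1; b < 4; b++)
            if (count[b] > count[best])
                best = b;
        ref[pos] = BASES[best];
    }
    ref[LEN] = '\0';
}

/* Count the sequences that differ from ref at column pos (0-based). */
int mutations_at(const char seqs[][LEN + 1], int nseq, const char *ref, int pos)
{
    int n = 0;
    for (int s = 0; s < nseq; s++)
        if (seqs[s][pos] != ref[pos])
            n++;
    return n;
}

/* Print mutations only for columns where exactly one sequence differs
   (assumed meaning of "single mutation"); positions are 1-based. */
void print_single_mutations(const char seqs[][LEN + 1], int nseq, const char *ref)
{
    for (int pos = 0; pos < LEN; pos++) {
        if (mutations_at(seqs, nseq, ref, pos) != 1)
            continue;
        for (int s = 0; s < nseq; s++)
            if (seqs[s][pos] != ref[pos])
                printf("position %d: %c->%c (sequence %d)\n",
                       pos + 1, ref[pos], seqs[s][pos], s + 1);
    }
}
```

Called from a main function with a 31-byte buffer, build_reference(EXAMPLE, NSEQ, ref) reproduces the reference sequence shown above, and mutations_at reports 3, 1 and 2 differing sequences at positions 1, 4 and 6, in agreement with the worked example.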
4. Mutations in Staphylococcus aureus
[Figure: bar chart, series label "InterGenicRegion". The height of each vertical bar represents the number of mutations of a nucleotide divided by the total number of that nucleotide. Y-axis: 0.000 to 0.300 in steps of 0.050. X-axis (mutation type): A->C, A->G, A->T, C->A, C->G, C->T, G->A, G->C, G->T, T->A, T->C, T->G.]
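The bar heights described above (mutations of a nucleotide divided by the total number of that nucleotide) could be computed along the following lines. This is a minimal sketch under stated assumptions: the function names nt_index and mutation_rates are hypothetical, the denominator is taken to be the count of the nucleotide in the reference sequence (the text does not say where the total is counted), and every difference from the reference is counted, without the single-mutation filter.

```c
#include <string.h>

/* Index of a nucleotide in "ACGT", or -1 for gaps and other symbols. */
int nt_index(char b)
{
    static const char nts[] = "ACGT";
    const char *p = (b != '\0') ? strchr(nts, b) : NULL;
    return p ? (int)(p - nts) : -1;
}

/* Fill rate[x][y] with (number of x->y mutations) / (number of x in ref),
   where x and y are indices into "ACGT".
   seqs holds nseq aligned strings, each at least as long as ref.
   Assumption: the denominator is counted in the reference sequence. */
void mutation_rates(const char *ref, const char *const seqs[], int nseq,
                    double rate[4][4])
{
    int total[4] = {0};
    int count[4][4] = {{0}};
    size_t len = strlen(ref);

    for (size_t pos = 0; pos < len; pos++) {
        int x = nt_index(ref[pos]);
        if (x < 0)
            continue;               /* skip non-ACGT reference symbols */
        total[x]++;
        for (int s = 0; s < nseq; s++) {
            int y = nt_index(seqs[s][pos]);
            if (y >= 0 && y != x)
                count[x][y]++;      /* one x->y mutation observed */
        }
    }

    for (int x = 0; x < 4; x++)
        for (int y = 0; y < 4; y++)
            rate[x][y] = (total[x] > 0) ? (double)count[x][y] / total[x] : 0.0;
}
```

As a small check, with the reference "AACG" and the two sequences "AGCG" and "AACT", there is one A->G mutation among two reference A's, so the A->G rate is 0.5, and one G->T mutation among one reference G, so the G->T rate is 1.0.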
6. Mutations in S. pneumoniae
[Figure: bar chart, series label "InterGenicRegion". The height of each vertical bar represents the number of mutations of a nucleotide divided by the total number of that nucleotide. Y-axis: 0.000 to 0.300 in steps of 0.050. X-axis (mutation type): A->C, A->G, A->T, C->A, C->G, C->T, G->A, G->C, G->T, T->A, T->C, T->G.]
8. Mutations in M. tuberculosis
[Figure: bar chart, series label "InterGenicRegion". The height of each vertical bar represents the number of mutations of a nucleotide divided by the total number of that nucleotide. Y-axis: 0.000 to 0.200 in steps of 0.020. X-axis (mutation type): A->C, A->G, A->T, C->A, C->G, C->T, G->A, G->C, G->T, T->A, T->C, T->G.]