2. Triplet repeat disease
General term for hereditary diseases caused by abnormal
expansion of triplet repeats
CAG poly-glutamine diseases:
  Huntington’s disease
  spinocerebellar ataxia 1, 2, etc.
GCN poly-alanine diseases:
  oculopharyngeal muscular dystrophy
  hand-foot-genital syndrome, etc.
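Location of the repeat in the gene (5’ end to 3’ end):
  5’UTR: spinocerebellar ataxia 12 (CAG)
  intron: Friedreich’s ataxia (GAA), myotonic dystrophy type 2 (CCTG), spinocerebellar ataxia 10 (ATTCT)
  coding region: poly-glutamine (CAG) and poly-alanine (GCN) diseases
  3’UTR: myotonic dystrophy type 1 (CTG)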
e.g. triplet repeat length
                             Normal    Patient
Huntington’s disease         11-34     36-121
Myotonic dystrophy type 1    5-37      50-5000
3. Problem of conventional research
Conventional in silico research has focused on tandem
repeats.
A repeat may have one or more interruptions.
Interruptions are NOT considered by conventional in silico research.
e.g. SCA2:   ... (CAG)13 CAA (CAG)9 ...
     HoxA13: ... (GCG)3 GCC GCG GCT (GCG)3 GCC GCG ...
Hypotheses
Evolution of interruptions:
  Interruptions may have appeared by point mutations during repeat evolution.
Functions of interruptions:
  Stabilization of repeats by breaking up the perfect repeat stretches.
  Prevention of repeat expansions by decreasing the thermodynamic stability of DNA hairpins.
4. Purpose
Reinvestigate triplet repeats, taking interruptions into
account
Advantages:
1. Extraction of repeat regions that are more similar to
those of the genes causing triplet repeat diseases
2. In-depth study of triplet repeat evolution
Comparative analysis of triplet repeats in vertebrates
5. Definition of interruptions and repeats
Interruptions and repeats are defined using triplet repeat
disease genes and statistical significance
Repeats: 5 or more consecutive identical triplets or amino acids
Interruptions: one triplet or amino acid different from the repeating unit
Repeat length: number of triplets or amino acids in the repeat region
e.g. SCA2: ... CCC (CAG)13 CAA (CAG)9 CCG ...
  Interruption: CAA
  Repeat: (CAG)13 CAA (CAG)9 (underlined in the original)
  Repeat length: 23
Extraction of repeats with interruptions from the Ensembl database
for human, chimpanzee, and mouse coding sequences (CDS)
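The definitions on this slide can be sketched in Python as a single left-to-right scan. This is a minimal sketch, not the extraction pipeline used for this work: the names find_repeats, Repeat and codons are hypothetical, and the merging rule (an interruption is one symbol flanked on both sides by the repeating unit, and the minimum length of 5 counts interruptions) is an assumption, since the slide does not spell out how runs are joined.

```python
from typing import List, NamedTuple, Sequence


class Repeat(NamedTuple):
    start: int                # index of the first symbol of the repeat region
    length: int               # number of symbols, interruptions included
    unit: str                 # repeating triplet or amino acid
    interruptions: List[str]  # interrupting symbols, in order


def codons(cds: str) -> List[str]:
    """Split a coding sequence into triplets, dropping any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[k:k + 3] for k in range(0, usable, 3)]


def find_repeats(symbols: Sequence[str], min_len: int = 5) -> List[Repeat]:
    """Find repeat regions, allowing single-symbol interruptions (assumed rule)."""
    repeats = []
    i, n = 0, len(symbols)
    while i < n:
        unit = symbols[i]
        j = i + 1
        interruptions = []
        while j < n:
            if symbols[j] == unit:
                j += 1
            elif j + 1 < n and symbols[j + 1] == unit:
                # One differing symbol followed by the unit again: an interruption.
                interruptions.append(symbols[j])
                j += 2
            else:
                break
        length = j - i
        if length >= min_len:
            repeats.append(Repeat(i, length, unit, interruptions))
            i = j
        else:
            i += 1
    return repeats


# SCA2 example from the slide: CCC (CAG)13 CAA (CAG)9 CCG
sca2 = "CCC" + "CAG" * 13 + "CAA" + "CAG" * 9 + "CCG"
sca2_hits = find_repeats(codons(sca2))
```

The same function accepts a protein string, because it only compares symbols for equality. Under the assumed rule a sequence such as Q A Q A Q would also count as a repeat of length 5, so a stricter rule may be preferable in practice.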
6. Amino acid repeats with interruptions
Comparison of the average amino acid repeat length between
repeats with and without interruptions in human CDS
[Figure: Average length of amino acid repeats in human, for repeats with and without interruptions. x-axis: repeat (amino acids A to Y); y-axis: length (0 to 16); series: with interruptions (>=5 including interruptions), without interruptions (>=5 without interruptions)]
With the exception of glutamine repeats (Q), all repeats with interruptions
have a larger standard deviation than those without interruptions.
There is no significant difference among repeats with interruptions,
nor among repeats without interruptions.
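The comparison on this slide amounts to grouping repeats by amino acid and by presence of interruptions, then taking the mean and standard deviation of the lengths. A minimal sketch, assuming each repeat is given as a (residue, length, number of interruptions) tuple; length_stats is a hypothetical name and the toy values are made up for illustration, not taken from the data.

```python
from collections import defaultdict
from statistics import mean, stdev


def length_stats(repeats):
    """Return {(residue, 'with' | 'without'): (mean length, sample SD)}."""
    groups = defaultdict(list)
    for residue, length, n_interruptions in repeats:
        kind = "with" if n_interruptions > 0 else "without"
        groups[(residue, kind)].append(length)
    stats = {}
    for key, lengths in groups.items():
        # stdev needs at least two values; report 0.0 for a single repeat.
        sd = stdev(lengths) if len(lengths) > 1 else 0.0
        stats[key] = (mean(lengths), sd)
    return stats


# Toy input (made-up values): (residue, length, number of interruptions)
toy = [("Q", 23, 1), ("Q", 9, 1), ("Q", 6, 0), ("Q", 8, 0), ("A", 7, 0)]
stats = length_stats(toy)
```

The sample standard deviation is used here; whether the slide reports sample or population SD is not stated.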
7. Human, chimpanzee and mouse repeats
Investigation of the average repeat length and of the percentage of
each repeat in human, chimpanzee and mouse
[Figure: Average length of amino acid repeats including interruptions (>=5) in human, chimpanzee and mouse. x-axis: repeat (amino acids A to Y); y-axis: length; series: human, chimpanzee, mouse]
[Figure: Percentage of each repeat in human, chimpanzee and mouse. x-axis: repeat (S P L E A G K R Q D T V H F I C N Y M W); y-axis: %; series: human, chimpanzee, mouse]
Human repeats have the largest standard deviation for every repeat, but there are no
other significant differences among the three species.
8. Number of point mutations resulting in an interruption
Investigation of the number of point mutations needed to produce each interruption
found in the top 10 human, chimpanzee and mouse triplet repeats
[Figure: Distributions of point mutations in interruptions in CDS in the top 10 triplet repeats (%). x-axis: repeat (GAG, CTG, CAG, GGC, GAA, AAG, AGC, GCC, GAT, GCG); y-axis: % (0 to 90); series: human, chimpanzee and mouse, each for 1, 2 and 3 point mutations]
Interruptions caused by one point mutation are the most abundant.
The distribution in each triplet repeat does not vary much among species.
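The number of point mutations that separates an interruption from its repeating triplet can be read as the Hamming distance between the two triplets. A minimal sketch under that assumption; point_mutations and mutation_distribution are hypothetical names, and the triplet CCT in the example is made up to show a two-mutation case.

```python
from collections import Counter


def point_mutations(unit: str, interruption: str) -> int:
    """Count positions at which the interruption differs from the repeating triplet."""
    if len(unit) != len(interruption):
        raise ValueError("unit and interruption must have the same length")
    return sum(a != b for a, b in zip(unit, interruption))


def mutation_distribution(pairs):
    """Return {number of point mutations: percentage} for (unit, interruption) pairs."""
    counts = Counter(point_mutations(unit, inter) for unit, inter in pairs)
    total = sum(counts.values())
    return {k: 100.0 * counts[k] / total for k in sorted(counts)}


# CAA, GCC and GCT are the interruptions shown for SCA2 and HoxA13; CCT is made up.
examples = [("CAG", "CAA"), ("GCG", "GCC"), ("GCG", "GCT"), ("CAG", "CCT")]
distribution = mutation_distribution(examples)
```

Applying mutation_distribution separately to the interruptions of each repeating triplet and each species would give percentages of the kind plotted in the figure.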
9. Human and mouse comparison
The preceding results show that repeat and interruption percentages
do not vary much between human and mouse.
The reasons are not well understood because, up to now, only a
few comprehensive analyses of repeats with interruptions in human
and mouse orthologous genes have been carried out.
Investigation of interruption-containing repeats
in human and mouse orthologous genes
10. Repeats in orthologous genes
Extraction of triplet repeats from human and mouse orthologous genes
Investigation of the number of genes with repeats existing only in
human or only in mouse
[Table: orthologous genes classified by presence of repeats in human (with repeats / without repeats) and in mouse (with repeats / without repeats); cell labels: only in human, only in mouse; values as extracted: 8,958  4,996  3,522  358532]
[Figure: Specific repeat percentage. x-axis: repeat (S A L P E G K R Q D T V I N F H C M Y W); y-axis: % (0 to 20); series: only in human, only in mouse, both in human and mouse]
The percentage of a specific repeat does not vary much between the two species.
This result suggests that the amino acid
composition of repeats does not differ between the two
species.
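The classification of orthologous gene pairs by repeat presence can be sketched as a simple tally. This is an illustrative sketch only: classify_orthologs is a hypothetical name, and the input is assumed to be one (human has repeat, mouse has repeat) pair of booleans per orthologous gene pair.

```python
from collections import Counter

LABELS = {
    (True, True): "both in human and mouse",
    (True, False): "only in human",
    (False, True): "only in mouse",
    (False, False): "neither",
}


def classify_orthologs(pairs):
    """Tally orthologous gene pairs by which species carries a repeat."""
    return Counter(LABELS[(bool(h), bool(m))] for h, m in pairs)


# Toy input (made-up): (human gene has a repeat, mouse ortholog has a repeat)
toy_pairs = [(True, True), (True, False), (False, True), (False, False), (True, False)]
tally = classify_orthologs(toy_pairs)
```

Running the same tally per amino acid would give the three categories plotted in the figure.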
11. Repeats in orthologous genes
Investigation of the average repeat length of repeats existing only in
human or only in mouse
[Figure: Distributions of differences in average repeat length. x-axis: repeat (amino acids A to Y); y-axis: length (0 to 16); series: only in human, only in mouse]
In human, glutamine repeats (Q) have the largest standard deviation,
while in mouse, glycine repeats (G) do.
The average length of repeats does not vary much between the two species.
This result suggests that the rate of repeat expansion is similar in the two species.
12. Repeats in orthologous genes
Investigation of differences in glutamine-repeat length between
orthologous genes
[Figure: Distribution of differences in glutamine-repeat length between human and mouse. x-axis: difference in length (-30 to 30; human longer to the left, mouse longer to the right); y-axis: 0 to 45]
Difference < 0: the human repeat is longer.
Difference > 0: the mouse repeat is longer.
The distribution is almost symmetrical.
Glutamine repeats have similar lengths and expand at the same rate in both species.
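The distribution on this slide can be sketched as a histogram of per-ortholog length differences, with the sign convention given above (negative when the human repeat is longer). A minimal sketch: length_differences and asymmetry are hypothetical names, the asymmetry score is an illustrative summary rather than a statistic used in this work, and the toy lengths are made up.

```python
from collections import Counter


def length_differences(pairs):
    """Histogram of (mouse length - human length) for orthologous glutamine repeats."""
    return Counter(mouse - human for human, mouse in pairs)


def asymmetry(hist):
    """Score in [-1, 1]: below 0 if human repeats are more often longer, above 0 if mouse."""
    human_longer = sum(count for diff, count in hist.items() if diff < 0)
    mouse_longer = sum(count for diff, count in hist.items() if diff > 0)
    total = human_longer + mouse_longer
    return 0.0 if total == 0 else (mouse_longer - human_longer) / total


# Toy input (made-up): (human repeat length, mouse repeat length)
toy_lengths = [(10, 8), (6, 8), (7, 7), (12, 9), (5, 8)]
hist = length_differences(toy_lengths)
score = asymmetry(hist)
```

A score near 0 corresponds to the almost symmetrical distribution described on the slide.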
13. The Repeat Evolution Model
Our results show that:
There are 22,581 amino acid repeats with interruptions and 6,015
repeats without interruptions in human CDSs.
Interruptions caused by one point mutation are the most abundant.
This is in agreement with the current Repeat Evolution Model.
The formation of repeats is caused not only by expansion of triplet
repeats but also by point mutations.
Pure repeats are formed by point mutation.
CAG CAC CAG CAG   Nucleotide sequence
      ↓ mutation
CAG CAG CAG CAG   Pure repeat
      ↓ mutation
CAG CAG CAA CAG   Impure repeat
Impure repeats evolve from pure repeats because they are more stable.
14. Conclusions
The formation of interruptions is independent of the species
and triplet repeat length, but varies depending on the repeated
triplet.
Taking interruptions into account, the average length of repeats
does not vary much between the investigated species.
Our results are consistent with the proposed Repeat Evolution
Model.
In summary, it appears that repeats have been expanding
at the same rate, independently of the species and despite
the diversification of the genomes.
15. Outlook
To compare the function of proteins encoded by repeat-
containing genes in order to investigate the role of the repeats
To further investigate the relationship between repeats and
interruptions in order to verify the Repeat Evolution Model
To combine disease and repeat information to investigate the
role of repeats in diseases