Pat Heslop-Harrison: Lecture to University of Malaya, Kuala Lumpur, Malaysia December 2013
Some DNA sequences are recognizable in all organisms and originated with the start of life. Others are unique to a single species. Some sequences are present in single copies in genomes, while others are present as millions of copies. The total amount of DNA in the cells of advanced eukaryotes varies over three orders of magnitude between species, and chromosome number varies similarly. How can such huge variations be accommodated within the constraints of organism growth, development and reproduction? What are the evolutionary implications of these huge variations? How can we use this information to understand plant evolution, cytogenetics, genetics and epigenetics? What are the implications for future evolution, biodiversity and the responses of plants to plant breeding or climate change?
Genome evolution: tales of scales. DNA to crops, months to billions of years, chromosomes to ecosystems
1. Genome evolution:
tales of scales
Pat Heslop-Harrison
phh4@le.ac.uk
www.molcyt.com and www.molcyt.org
User & pw ‘visitor’
Twitter, YouTube and Slideshare: pathh1
20 December 2013
2. Proso millet (Panicum miliaceum):
origins, genomic studies and
prospects
Pat Heslop-Harrison, Farah Badakshi
and Harriet Hunt
14th-century millet: Tacuinum Sanitatis, via Wikimedia
See Paris & Janick Ann Bot 2009-2013
3. Scales – metres, kilograms,
seconds, numbers
• Time: 3.5 billion years from the first living cells
• Time: a generation in hybrids or stress responses, or
a few years for plant breeding
• Size: the amount of DNA from a few kb in viruses to
variation in genome size between species
• Size: from single base modifications to whole
genome changes
• Numbers: from 2 to 1000s of chromosomes
• Area: from endemics to worldwide
• Numbers: from a few plants to millions of ha
• Scale (synonym): balancing or comparing
4. Plant genome size range > 2,300x
• Genlisea aurea: 1C = 63.6 Mb
• Paris japonica: 1C = 149,000 Mb
Flower images: Wikimedia Commons. Chromosomes (scale bar 20 µm) and data: see Bennett et al. 2011 Ann Bot.
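The "> 2,300x" figure is the ratio of the two 1C values above. A minimal Python check of that arithmetic, using only the numbers on the slide (the function name is illustrative):

```python
import math

# 1C genome sizes in Mb, as quoted on the slide
GENLISEA_AUREA_1C_MB = 63.6
PARIS_JAPONICA_1C_MB = 149_000


def fold_range(smallest_mb: float, largest_mb: float) -> float:
    """How many times larger the largest genome is than the smallest."""
    return largest_mb / smallest_mb


ratio = fold_range(GENLISEA_AUREA_1C_MB, PARIS_JAPONICA_1C_MB)
orders = math.log10(ratio)  # orders of magnitude spanned by the range
print(f"Paris japonica / Genlisea aurea: {ratio:,.0f}-fold ({orders:.2f} orders of magnitude)")
```

The ratio comes out at about 2,343, i.e. a little over three orders of magnitude, matching the range stated in the abstract.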
5. Genome sizes: reading them out base-by-base
• HIV type 1 virus: 2 hr 40 min
• Bacteria (E. coli): 53 days
• Yeast: 138 days
• Genlisea: 2 years (20 mm)
• Arabidopsis: 5 years
• Man: 100 years
• Wheat: 5 centuries
• Paris: 4 millennia (50 m)
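The reading times above look consistent with reading one base per second without a break, and the two lengths with the stretched 1C DNA at roughly 0.34 nm per base pair. The sketch below reproduces that arithmetic under those two assumptions. The genome sizes for HIV-1, E. coli, yeast and wheat are approximate values assumed for illustration, not figures from the lecture.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
NM_PER_BP = 0.34  # assumed rise per base pair of B-form DNA

# Approximate 1C genome sizes in bp; HIV-1, E. coli, yeast and wheat values are assumptions
GENOMES_BP = {
    "HIV type 1 virus": 9.7e3,
    "Bacteria (E. coli)": 4.6e6,
    "Yeast": 12e6,
    "Genlisea": 63.6e6,
    "Arabidopsis": 145e6,
    "Man": 3.0e9,
    "Wheat": 17e9,
    "Paris": 149e9,
}


def reading_time_s(bp: float, bases_per_second: float = 1.0) -> float:
    """Seconds needed to read a genome aloud at a constant rate (default: one base per second)."""
    return bp / bases_per_second


def dna_length_m(bp: float) -> float:
    """Length in metres of the DNA if stretched out as a single double helix."""
    return bp * NM_PER_BP * 1e-9


def describe(seconds: float) -> str:
    """Express a duration in hours, days or years, whichever reads most naturally."""
    if seconds < 86_400:
        return f"{seconds / 3600:.1f} hours"
    if seconds < SECONDS_PER_YEAR:
        return f"{seconds / 86_400:.0f} days"
    return f"{seconds / SECONDS_PER_YEAR:,.0f} years"


for name, bp in GENOMES_BP.items():
    print(f"{name:20s} {describe(reading_time_s(bp)):>14s} {dna_length_m(bp):10.3g} m")
```

Changing `bases_per_second` rescales every time in the same proportion, so the relative spread between genomes does not depend on the reading rate.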
8. Repetitive DNA sequences form the largest part of the genome
Species: genome size; repetitive DNA
• Arabidopsis thaliana: 145 Mbp; >25%
• Sugar beet, Beta vulgaris: 758 Mbp; 63%
• Broad bean, Vicia faba: 12000 Mbp; 85%
• Rye, Secale cereale: 8800 Mbp; 92%
• Onion, Allium cepa: 15100 Mbp; 95%
These species are all diploid (2x).
• Human, Homo sapiens: 3000 Mbp; 45%
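Because the table gives both genome size and repetitive fraction, the amount of non-repetitive DNA can be estimated by simple subtraction. A minimal sketch using the table's values; Arabidopsis is given only as ">25%", so 0.25 is treated as a lower bound for its repeats.

```python
# (genome size in Mbp, repetitive fraction) from the table; Arabidopsis 0.25 is a lower bound
SPECIES = {
    "Arabidopsis thaliana": (145, 0.25),
    "Sugar beet (Beta vulgaris)": (758, 0.63),
    "Broad bean (Vicia faba)": (12_000, 0.85),
    "Rye (Secale cereale)": (8_800, 0.92),
    "Onion (Allium cepa)": (15_100, 0.95),
    "Human (Homo sapiens)": (3_000, 0.45),
}


def partition(genome_mbp: float, repeat_fraction: float) -> tuple[float, float]:
    """Split a genome into (repetitive, non-repetitive) amounts in Mbp."""
    repeats = genome_mbp * repeat_fraction
    return repeats, genome_mbp - repeats


for name, (size, fraction) in SPECIES.items():
    repeats, other = partition(size, fraction)
    print(f"{name:28s} repetitive {repeats:8.0f} Mbp, non-repetitive {other:6.0f} Mbp")
```

On these numbers, onion's non-repetitive portion (about 755 Mbp) is itself larger than the whole Arabidopsis genome, while the repetitive portion accounts for most of the difference in total size between the two species.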
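14. DNA sequence organization at a centromere (figure): tandem repeat monomers, transposable elements (TE) and single-copy DNA are packaged into nucleosomes (147 bp plus 5-70 bp linker = 150-220 bp); the kinetochore forms at the centromere of the metaphase chromosome, and spindle microtubules pull the chromatids apart.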
Heslop-Harrison JS, Schwarzacher T. 2013. Nucleosomes and centromeric DNA packaging. Proc Natl Acad Sci USA. http://dx.doi.org/10.1073/pnas.1319945110. See also http://molcyt.org (Dec 2013)
15. Nucleosomes in rye
Digest intact chromatin (DNA + histone) with micrococcal nuclease for a few seconds, cutting between the nucleosomes. Then treat with protease and run on an agarose gel.
• Vershinin & Heslop-Harrison
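Since each nucleosome repeat is the 147 bp core plus a linker, fragments from partial digestion are expected near whole multiples of the repeat length, which is what produces the ladder on the gel. A rough sketch of the expected band positions, assuming cuts fall only in linkers and ignoring trimming of the fragment ends (so real bands are approximate):

```python
CORE_BP = 147               # DNA wrapped around one histone octamer
LINKER_RANGE_BP = (5, 70)   # linker lengths quoted on the centromere slide


def ladder(linker_bp: int, max_oligomer: int = 5) -> list[int]:
    """Approximate sizes (bp) of mono-, di-, tri-... nucleosome fragments for one linker length."""
    repeat = CORE_BP + linker_bp
    return [n * repeat for n in range(1, max_oligomer + 1)]


short, long_ = (ladder(linker) for linker in LINKER_RANGE_BP)
for n, (lo, hi) in enumerate(zip(short, long_), start=1):
    print(f"{n}-nucleosome fragment: about {lo}-{hi} bp")
```

The spacing between successive bands gives the repeat length directly, which is why a short digestion is informative about linker length.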
16. Three copies of the Arabidopsis 180 bp repeat, showing the GC content of the sequence (dark purple, stepped line) and sequence curvature (red, smooth line). While GC- and AT-rich regions of a sequence generally correlate with curvature, the kinked region shows curvature with low GC content.
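GC-content profiles like the stepped line in this figure are typically computed over a sliding window. A minimal sketch; the window and step sizes are arbitrary choices, the input is a made-up placeholder rather than the Arabidopsis 180 bp repeat, and curvature prediction (which needs a dinucleotide-based model) is not attempted.

```python
def gc_windows(seq: str, window: int = 20, step: int = 1) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for each window along the sequence."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        gc = sum(base in "GC" for base in win) / window  # count G and C in this window
        profile.append((start, gc))
    return profile


placeholder = "GGCCGCGGCCAATATTTAAATTGCGCGGCC"  # hypothetical sequence, not real data
for start, gc in gc_windows(placeholder, window=10, step=5):
    print(f"{start:3d} {'#' * round(gc * 20):20s} {gc:.2f}")
```

A larger window smooths the profile; a step equal to the window gives the stepped appearance of the plotted line.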
17. Arabidopsis cell line with a macrochromosome
Anti-phosphohistone H3 localizes exclusively to the centromeres of the small chromosomes. In contrast, the antibody shows a weak but more uniform distribution along the full length of the macrochromosome.
20. Simple sequence repeats
• GGCTACGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA
GAGAGATGGTCGTAATG
• Flanked by unique sequences (SSR/microsatellite
markers) or
• Part of other repetitive elements
• Dispersed OR clustered in genome
• SSR markers are dispersed!
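Simple sequence repeats such as the (GA)n run in the example can be located with a regular expression that looks for a short motif followed by copies of itself. The sketch below is one simple way to do this, not the method used for the markers discussed here. The thresholds (motifs of 1-6 bp, at least 5 copies) are assumptions, and motifs that are themselves repeats (e.g. GAGA) are skipped so that each run is reported once under its shortest unit.

```python
import re


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (GA is, GAGA is not)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str, max_motif: int = 6, min_repeats: int = 5):
    """Return (start, motif, copies) for runs of >= min_repeats tandem copies of a 1..max_motif bp motif."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # e.g. k=2, min_repeats=5 gives ([ACGT]{2})\1{4,}: a 2-bp motif followed by 4+ more copies
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_repeats - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# The example sequence from the slide, with its two printed lines joined
example = ("GGCTACGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA"
           "GAGAGATGGTCGTAATG")
print(find_ssrs(example))
```

The flanking unique sequence on either side of a reported run is what would be used to design primers for an SSR marker.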
24. Retroelement abundance and
diversity in barley
Gypsy elements are present in 25% of all BAC clones
Barley gypsy: Vershinin, Druka, Kleinhofs, HH: PMB 2002;
Brassica Alix & HH PMB 2005
28. Common structure of retroelements (figure)
• LINE retrotransposon (non-LTR retrotransposon): gag, en, rt
• Gypsy (LTR retrotransposon): LTR, gag, rt, int, LTR
• Copia (LTR retrotransposon): LTR, gag, int, rt, LTR
• Retrovirus: LTR, gag, rt, int, env, LTR
gag – core particle component; en – endonuclease; rt – reverse transcriptase; int – integrase; LTR – long terminal repeat; env – envelope glycoprotein
29. Retroelement components (gene or feature; full name; position; function)
• ORF (open reading frame): sequence capable of translation into a protein.
• LTR (long terminal repeat); flanking the retrotransposon: regions of several hundred base pairs (250-4000) containing regulatory sequences for gene expression: enhancer, promoter, transcription initiation (capping), transcription terminator and polyadenylation signal. The 3' LTR is not normally functional as a promoter, although it has exactly the same sequence arrangement as the 5' LTR; instead, the 3' LTR acts in transcription termination and polyadenylation. As a consequence of the replication mechanism of the elements, the two LTRs are identical at the time of integration.
• PBS (primer binding site); about 18 nt at the end of the 5' LTR: binding site for a specific tRNA that functions as the primer for reverse transcriptase to initiate synthesis of the minus (-) strand of viral DNA.
• Gag (group-specific antigen); usually one of the first ORFs: the gag precursor is cleaved by the viral protease (encoded by pol) into three mature products, the matrix (MA), the capsid (CA) and the nucleocapsid (NC), together forming the "capsid" which surrounds the genome; this complex is the virus core. Equivalent to the coat or transit protein.
• CP (coat protein): equivalent to gag.
30.
• Cys-His or C-H (cysteine-histidine repeat motif); C-terminal of gag: RNA or DNA binding site of the coat protein or gag.
• Pol (polyprotein): contains aspartic protease, reverse transcriptase and RNase H, and in some cases integrase.
• PR (aspartic protease); in pol: cleaves the polyprotein translated from the full-length mRNA. PR has a significant role in processing the polyprotein precursor into the mature form.
• RT (reverse transcriptase); in pol: RNA-dependent DNA polymerase that copies RNA into DNA.
• RH (ribonuclease H, RNase H); in pol: an enzyme that specifically degrades RNA hybridized to DNA.
• INT (integrase); in pol: enzyme responsible for removing two bases from the end of the LTR and inserting the linear double-stranded DNA copy of the retroelement genome into the host cell DNA.
• Env (envelope gene); after pol, but not in pararetroviruses if MP = env: envelope genes mediate the binding of virus particles to their cellular receptors, enabling virus entry, the first step in a new replication cycle. Thus the envelope genes give retroelements the ability to spread between cells and individuals (infectivity). Env contains the proteins SU (surface) and TM (transmembrane).
• MP (movement protein): cell-to-cell movement, maybe equivalent to env.
• TAV (transactivator): regulates translation of the polycistronic mRNA.
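Because the two LTRs are identical when an element integrates, differences between them accumulate only afterwards, so their divergence is a relative indicator of how long ago an insertion happened. A minimal sketch of the p-distance between a pair of LTRs, assuming they have already been aligned (gap columns are skipped); converting this to years would need a substitution rate, which the lecture does not give.

```python
def ltr_divergence(ltr5: str, ltr3: str) -> float:
    """Proportion of differing sites (p-distance) between two aligned LTR sequences."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned LTR pair, for illustration only
print(ltr_divergence("TGTTGGAACCTA-CGGA", "TGTTGCAACCTAGCGGA"))
```

An element whose LTRs are still identical is consistent with a recent insertion; more divergent pairs point to older insertions.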
39. DNA sequence components of the plant nuclear genome (figure)
• Genes, regulatory and noncoding single copy sequences
• Repetitive DNA sequences
– Repeated genes: 45S and 5S rRNA genes; other genes
– Structural components of chromosomes: centromeric repeats, telomeric repeats
– Dispersed repeats (transposable elements): retrotransposons, amplifying via an RNA intermediate; DNA transposons, copied and moved via DNA
– Tandem repeats: subtelomeric repeats; blocks of tandem repeats at discrete chromosomal loci; simple sequence repeats or microsatellites
• Sequences of other origin: organelle sequences from chloroplasts or mitochondria; sequences from viruses, Agrobacterium or other vectors; transgenes introduced with molecular biology methods
Heslop-Harrison & Schmidt 2012. Encyclopedia of Life Sciences
40. Genome
• Genes and regulatory sequences make up a
small proportion of the genome
• In most higher eukaryotic genomes, the majority of DNA sequences are repetitive sequences (50-90%)
• FUNCTION?
• Different sequence classes evolve at different
rates
42. Aegilops tauschii (D genome donor) in Iran
• 57 accessions collected
– ssp. tauschii
• var. meyeri (18)
• var. tauschii (22)
• var. anathera (4)
• var. meyeri (12)
Hojjatollah Saeidi, Mohammad Reza Rahiminejad, Sadeq
Vallian, HH
43. Diversity in D
genome
• Microsatellite markers
• 57 accessions of wild
Aegilops tauschii (2n = 2x =
14; D genome)
• No SSR markers were
characteristic for taxa or
geographical origin
• High diversity present
Saeidi, HH et al. Genet Resources & Crop
Evolution 2005
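One common way to put a number on "high diversity" at a microsatellite locus is Nei's gene diversity (expected heterozygosity), He = 1 - Σ p_i², where p_i are the allele frequencies. The lecture does not say which statistic was used, and the allele sizes below are made up; the sketch only illustrates the calculation.

```python
from collections import Counter


def gene_diversity(alleles) -> float:
    """Nei's gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical SSR allele sizes (bp) scored across accessions at one locus
locus = [182, 182, 186, 188, 188, 188, 190, 194, 194, 196]
print(f"He = {gene_diversity(locus):.3f}")
```

Values near 0 mean one allele dominates; values approaching 1 mean many alleles at similar frequencies. A small-sample correction (multiplying by n/(n-1)) is often applied but is omitted here.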
44. Aegilops tauschii in Iran
The dpTa1 repetitive DNA banding pattern does correlate with taxonomic grouping
Hojjatollah Saeidi and Pat Heslop-Harrison
45. In situ repetitive DNA markers
• Markers characteristic for taxa
• Genes/DNA markers and repetitive DNA evolve differently (SSRs are different)
• High diversity present
• Useful genes for wheat breeding
46. UPGMA dendrograms of the relationships based on IRAP analysis of (A) accessions of Ae.
tauschii subsp
Saeidi, H. et al. Ann Bot 2008 101:855-861; doi:10.1093/aob/mcn042
Copyright restrictions may apply.
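UPGMA builds a tree by repeatedly joining the two closest clusters and averaging distances, weighted by cluster size. A minimal sketch assuming IRAP gels are scored as band presence/absence and converted to Dice distances; the distance measure, the accession names and the band scores are assumptions for illustration, not data or settings from Saeidi et al. 2008.

```python
def dice_distance(profile_a, profile_b) -> float:
    """Dice distance between two binary band profiles (1 = band present, 0 = absent)."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    total = sum(profile_a) + sum(profile_b)
    return 1.0 - 2.0 * shared / total if total else 0.0


def upgma(labels, dist):
    """UPGMA clustering of a symmetric distance matrix (list of lists).

    Returns nested tuples (left, right, height), where height is half the merge distance.
    """
    clusters = {i: (label, 1) for i, label in enumerate(labels)}  # id -> (subtree, leaf count)
    d = {(i, j): dist[i][j] for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        height = d.pop((i, j)) / 2
        for k in clusters:  # size-weighted average distance from every other cluster to the merged one
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (d_ik * n_i + d_jk * n_j) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1
    [(tree, _size)] = list(clusters.values())
    return tree


# Hypothetical IRAP band scores for four accessions
profiles = {
    "acc1": [1, 1, 0, 1, 0, 1],
    "acc2": [1, 1, 0, 1, 1, 1],
    "acc3": [0, 0, 1, 1, 1, 0],
    "acc4": [0, 1, 1, 1, 1, 0],
}
names = list(profiles)
matrix = [[dice_distance(profiles[a], profiles[b]) for b in names] for a in names]
print(upgma(names, matrix))
```

UPGMA assumes a roughly constant rate of change across lineages; the nested tuples can be drawn as a dendrogram with any plotting tool.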
47. Demonstration of the direction of distribution (phylogeography) even over short geographic distances
Phylogeography of Ae. tauschii
The species originated in northern Iran and spread in two directions.
The tauschii genotype spread from the central Alborz Mountains and was then distributed eastward and westward (direction 1).
The strangulata genotype is distributed along the Caspian Sea shore (direction 2).
50. Mammalian Chromosome Evolution
• Mammals: genome size (about 3,500 Mbp) is remarkably conserved
• Diploid chromosome numbers vary from 2n=6 (Indian muntjac) to 2n=134 (black rhinoceros).
• More widely: from 2n=2 (an ant species) and several species with 2n=4, to 2n>1000 in some ferns
• No correlation of chromosome number with evolutionary position
• Both loss and gain occur
51. Bos taurus taurus vs Bos taurus indicus:
2n=60, XY
But: B. taurus submetacentric Y
B. indicus acrocentric Y
53. How many chromosomes?
• Is the number constant in a species?
• Cattle 2n=60
– but some individuals have
2n=58 or 2n=59 because two
chromosomes fuse
• Chromosomal evolution is happening now
54. The 1;29 fusion in cattle
• Found in multiple breeds
• Sometimes a founder effect (imported in one
bull – e.g. Brahman to Africa)
• But present even in major breeds
• Limited effect on fertility
• Probably positively selected for a difficult-to-score trait
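A centric (Robertsonian) fusion joins two acrocentric chromosomes into one biarmed chromosome, so 2n falls by one for each fused pair while the number of chromosome arms is unchanged. A minimal sketch of that bookkeeping, assuming all 58 cattle autosomes are acrocentric and counting only autosomal arms (the function names are illustrative):

```python
SEX_CHROMOSOMES = 2  # X and Y (or XX); unchanged by autosomal fusions


def robertsonian_fusion(autosomes: list[int]) -> list[int]:
    """Return a karyotype in which two acrocentric autosomes (1 arm each) become one biarmed (2 arms)."""
    karyotype = list(autosomes)
    karyotype.remove(1)  # raises ValueError if fewer than two acrocentrics remain
    karyotype.remove(1)
    karyotype.append(2)
    return karyotype


def summary(autosomes: list[int]) -> dict:
    """Diploid number and autosomal arm count for a list of per-chromosome arm counts."""
    return {"2n": len(autosomes) + SEX_CHROMOSOMES, "autosomal_arms": sum(autosomes)}


cattle = [1] * 58                               # assumed: all cattle autosomes acrocentric
heterozygote = robertsonian_fusion(cattle)      # one copy of the 1;29 fusion
homozygote = robertsonian_fusion(heterozygote)  # both homologues fused
for label, karyotype in [("standard", cattle), ("heterozygous", heterozygote), ("homozygous", homozygote)]:
    print(label, summary(karyotype))
```

Applying eight fusions to the same 58 acrocentric autosomes gives 50 autosomes with 58 arms, i.e. 2n=52, the same arithmetic as the African buffalo karyotype on slide 69.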
62. Complex satellite DNA reshuffling in the polymorphic t(1;29) Robertsonian translocation and evolutionarily derived chromosomes in cattle. R. Chaves, F. Adega, J. S. Heslop-Harrison, et al. 2003
64. • Goat
• Sheep
• Cattle
• Chromosome
homologies and
centromeric
fusions
• Paul Popescu
65. Do we see chromosome fusion now?
Molecular cytogenetic analysis and centromeric satellite organization of a novel 8;11 translocation in sheep: a possible intermediate in biarmed chromosome evolution. 2003. Chaves, Adega, Wienberg, Guedes-Pinto, Heslop-Harrison
66. Sheep
2n = 53, XY
Chromosome paints for chromosomes 8 (yellow) and 11 (magenta) (e); satellite I (yellow, f); satellite II (cyan, g). Chaves, HH et al. 2003
67. • Satellite I and II probes in the biarmed chromosomes of the sheep with 2n = 53, XY.
• Chr (8;11), 2, 3 and 1 are ordered from the most recent to the postulated evolutionarily oldest chromosome.
68. • t(8;11) showed satellite I proximal on both arms with
satellite II covering the centromere, while the
evolutionarily derived fusion leading to Chrs 2 and 3
showed the opposite configuration, not obviously
derived by a simple fusion. Chr 1 has lost the satellite
I hybridization patterns. The novel t(8;11) provides
strong evidence for an intermediate step in evolution
of the biarmed chromosomes in sheep.
69. • Syncerus caffer (African buffalo or Cape buffalo), a bovid of the subfamily Bovinae
2n=52, XY, including 4 pairs of bi-armed chromosomes = 58 autosomal chromosome arms + X, Y
70. Tragelaphus strepsiceros, or greater kudu
2n=31, X1X2Y
26 biarmed chromosomes, three acrocentric chromosomes (including X1), an acrocentric X and a biarmed Y
71. In situ hybridization of the sheep (Ovis aries) centromeric satellite I DNA clone pOaKB9 (green, FITC) to metaphase chromosomes (chromosomal DNA stained with DAPI, shown in red pseudocolour) of: (a) tribe Caprini, Ovis ammon (female, 2n=54,XX); (b) tribe Reduncini, Kobus leche (male, 2n=48,XY); (c) tribe Hippotragini, Addax nasomaculatus (female, 2n=58,XX); (d) tribe Alcelaphini, Connochaetes taurinus (male, 2n=58,XY); (e) tribe Alcelaphini, Damaliscus hunteri (male, 2n=44,XY); (f) tribe Aepycerotini, Aepyceros melampus (female, 2n=60,XX).
72. Phylogenetic relationships and the primitive X chromosome inferred from chromosomal and satellite DNA analysis in Bovidae. Raquel Chaves, Henrique Guedes-Pinto and John S. Heslop-Harrison. Proc Roy Soc B 2005
75. Genome Specificity of a CACTA
(En/Spm) Transposon
B. napus (AACC, 2n=4x=38) B. oleracea (CC, 2n=2x=18) B. rapa (AA, 2n=2x=20)
76. Genome Specificity of a CACTA (En/Spm) Transposon
B. napus (AACC, 2n=4x=38), hybridized with the C-genome CACTA element (red)
B. oleracea (CC, 2n=2x=18) B. rapa (AA, 2n=2x=20)
Alix & HH 2008
77. Genome Specificity of a CACTA (En/Spm) Transposon
Figure: Bot1 copies from B. napus, B. rapa and B. oleracea (sequences AJ 245479, AC 189496, AC 189446, AC 189655, AC 189480; copies Bot1-1, Bot1-2, Bot1-3 and Bo6L1-15), showing a large insertion specific to Bot1-1, a large insertion in common between Bot1-2 and Bot1-3, and a rearrangement specific to Bot1-3 (label: 1010 bp).
78. Genome Specificity of a CACTA
(En/Spm) Transposon
• Bot1 has undergone several rounds of amplification in the C (B. oleracea) genome only, playing a major role in the recent divergence of the B. rapa and B. oleracea genomes
• Bot1 carries a copy of the host S-locus-associated SLL3 gene; is the transposon associated with SLL3 proliferation?
Transposons are a driver of gene and genome evolution
Alix et al. The CACTA transposon Bot1 played a major
role in Brassica genome divergence and gene
proliferation. Plant Journal December 2008
79. Dot-plots of genomic sequence from homologous pairs of BACs
Dotter plot of Brassica oleracea var. alboglabra clone BoB028L01 (C genome sequence) x Brassica rapa subsp. pekinensis clone KBrB073F16 (A genome sequence), with transposable elements (gi 195970379 vs. gi 199580153; axes in kb). Annotated features: regions of high homology and of low homology between A and C sequence; a 4 kb insertion-gap pair (present in the C genome); a 500 bp insertion-gap pair (present in A); a microsatellite; a transposed (moved) sequence; an inversion.
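Dot-plots such as this one mark every position where a short word (k-mer) in one sequence also occurs in the other: shared regions appear as diagonals, insertions as offsets in the diagonal, and inversions as anti-diagonals when the reverse complement is compared. A simplified exact-match sketch of that idea (not the Dotter program used for the figure); k and the toy sequences are arbitrary.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def dotplot_points(seq_a: str, seq_b: str, k: int = 8):
    """(i, j) pairs where the k-mer starting at i in seq_a equals the k-mer starting at j in seq_b."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    index = defaultdict(list)  # k-mer -> start positions in seq_b
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    return [(i, j) for i in range(len(seq_a) - k + 1) for j in index.get(seq_a[i:i + k], [])]


def inverted_points(seq_a: str, seq_b: str, k: int = 8):
    """Matches to the reverse complement of seq_b, mapped back to seq_b coordinates (anti-diagonals)."""
    hits = dotplot_points(seq_a, reverse_complement(seq_b), k)
    return [(i, len(seq_b) - k - j) for i, j in hits]


def render(points, len_a: int, len_b: int, k: int) -> str:
    """Text dot-plot: one row per k-mer position in seq_a, one column per k-mer position in seq_b."""
    hits = set(points)
    return "\n".join(
        "".join("*" if (i, j) in hits else "." for j in range(len_b - k + 1))
        for i in range(len_a - k + 1)
    )


a = "ACGTTGCAAGGCTTACCGATTG"   # toy sequences, not the BAC data
b = "ACGTTGCAAGTTTTGGCTTACCGATTG"  # same as a, with a 5 bp insertion
print(render(dotplot_points(a, b, k=4), len(a), len(b), k=4))
```

In the printed grid the diagonal of stars breaks and shifts sideways at the inserted segment, the text equivalent of an insertion-gap pair.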
81. Insertion polymorphism in Brassica genomes shown by PCR with flanking primers
Gels (A) and (B): amplification with two primer sets (top and bottom) across lanes 1-38, grouped as Brassica rapa, Brassica nigra, uncertain Brassica, Brassica oleracea, Brassica juncea, hexaploid (6X) Brassicas, Brassica napus and Brassica carinata; HP1 lanes carry the size marker (200-1500 bp).
B. rapa (AA), B. juncea (AABB) and B. napus (AACC) include the longer fragment with the insertion.
The B and C genomes have only the shorter (lower) fragment without the insertion.
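With flanking primers, an allele carrying the insertion gives a product longer than the empty site by the length of the inserted element (plus any target-site duplication). A minimal sketch of scoring lanes that way; the 400 bp empty-site product, the 30 bp tolerance and the use of the 542-bp element length from the following slide are assumptions for illustration.

```python
def call_insertion(bands_bp, empty_bp: int, insertion_bp: int, tolerance_bp: int = 30) -> str:
    """Classify one gel lane from its band sizes.

    empty_bp: expected product size without the insertion (hypothetical here)
    insertion_bp: length added by the insertion (element plus any target-site duplication)
    """
    filled_bp = empty_bp + insertion_bp
    has_empty = any(abs(b - empty_bp) <= tolerance_bp for b in bands_bp)
    has_filled = any(abs(b - filled_bp) <= tolerance_bp for b in bands_bp)
    if has_empty and has_filled:
        return "both"
    if has_filled:
        return "insertion"
    if has_empty:
        return "no insertion"
    return "no expected product"


# Hypothetical band sizes read from three lanes
for lane in ([942], [398], [401, 940]):
    print(lane, call_insertion(lane, empty_bp=400, insertion_bp=542))
```

A separate "both" outcome is kept because a polyploid lane can in principle carry one genome with the insertion and another without it.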
82. Schematic representation of the insertion in Brassica rapa and other Brassica genomes: a 542-bp hAT transposable element (TE) with terminal inverted repeats (TIR), flanked by target-site duplications (TSD), shown with primers hAT 141F, hAT 185F and hAT 177R (hAT 8002) and position labels 1, 246, 790 and 1000. Sequences aligned: B. rapa (4718648200); B. oleracea (66,350-66750); B. rapa (AA); B. juncea (AABB); B. napus (AACC); hexaploid Brassica (carinata x rapa); B. nigra (BB); B. oleracea (CC); B. carinata (BBCC); B. oleracea (GK97361). Green, red, blue and black boxes show DNA motifs (A, T, C and G respectively).
86. Evolution, epigenetics, development
Phenotypes range from multiple abnormalities to normal differentiation (male/female). Causes range from genetic changes (non-reverting) to changes that are seen but in some cases revert:
• Chromosomal loss, deletion or translocation
• Gene mutation / base pair changes
• Telomere shortening
• (Retro)transposon insertion
• Retrotransposon activation
• SSR expansion
• Methylation
• Heterochromatinization
• Chromatin remodelling
• Histone modification
87. From Chromosome to Nucleus
Pat Heslop-Harrison phh4@le.ac.uk www.molcyt.com
Editor's Notes
Image Credits: The two photographs showing the flowers come from the following sources: Genlisea aurea: http://upload.wikimedia.org/wikipedia/commons/d/df/Genlisea_aurea_flower_4_Darwiniana.jpg ; Paris japonica: http://www.kalle-k.dk/plants5.htm . The two images of the chromosomes come from the papers cited in the legend.