2. Genome sequence of C. elegans
Sequence of entire genome.
Sequence of cDNA clones.
Approximately 19,500 PREDICTED protein-coding gene sequences.
A large number of functional RNAs of various kinds (not discussed further).
This lecture focuses on the predicted proteins.
How are genes predicted?
Science, December 1998.
3. Computer-based predictions
GENEFINDER (C. elegans), BLAST (all genomes) and other computer programs.
Biases in coding sequence: in C. elegans, non-coding DNA is AT-rich.
Splice site signals, initiator methionines, termination codons.
Likely exons and probable/possible splice patterns.
BLAST: compare the translations of all six reading frames with protein databases.
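The six-frame idea can be made concrete with a minimal Python sketch. This is not GENEFINDER's or BLAST's actual code. It translates a DNA sequence in all six reading frames using the standard genetic code: three frames on the given strand and three on its reverse complement. All function names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1); codons are enumerated
# with the bases in the order T, C, A, G at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; '*' is a stop codon, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Frames +1..+3 start at offsets 0..2 of the given strand, -1..-3 of its reverse complement."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

A gene predictor would then look for long stretches without stop codons (`*`) in one frame, supported by splice signals and homology.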
Evidence that a prediction is correct?
• Homology with genes in other organisms (homologues).
• Known protein families.
• Experimental evidence.
4. The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences.
The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.
http://www.ncbi.nlm.nih.gov/
The National Center for Biotechnology Information (NCBI), part of the U.S. National Library of Medicine.
How does BLAST work?
Figure: a protein sequence in single-letter code (mqnpmillifclfcavicsrgtdsdiphef) divided into search windows.
BLAST compares small sequential blocks, or WINDOWS, of sequence against massive databases.
It looks for regions of similarity and scores them.
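The window idea can be sketched as a toy Python index. It records every overlapping three-letter word and then looks up exact shared words in a small hypothetical database. A word size of 3 is typical for protein BLAST, but treat the value as an assumption here. Real BLAST also accepts similar "neighbourhood" words scored with a substitution matrix, and it extends each hit into a local alignment. Neither step is modelled below.

```python
from collections import defaultdict


def word_index(seq: str, w: int) -> dict:
    """Map every overlapping word (window) of length w to its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def seed_hits(query: str, database: dict, w: int = 3) -> list:
    """List (query_pos, subject_name, subject_pos) for every exact shared word."""
    query_words = word_index(query, w)
    hits = []
    for name, subject in database.items():
        for j in range(len(subject) - w + 1):
            for i in query_words.get(subject[j:j + w], []):
                hits.append((i, name, j))
    return hits


# Hypothetical two-sequence "database"; only seqA shares words with the query.
database = {"seqA": "xxmillyy", "seqB": "aaaaaa"}
print(seed_hits("mqnpmillif", database))  # [(4, 'seqA', 2), (5, 'seqA', 3)]
```

Indexing the query once makes each database scan a dictionary lookup per window. That is the reason word-based methods scale to very large databases.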
5. More BLAST
Figure: a large protein in which conserved regions give a high-similarity BLAST score and non-conserved regions a low-similarity BLAST score.
Small windows of comparison detect LOCAL regions of similarity.
Output: % identity and % similarity (similarity also counts conservative amino acid substitutions).
BLAST also gives an overall score and its statistical significance (the E-value: the number of matches this good expected by chance).
If the entire protein sequence were compared in one go, the overall similarity might be relatively low.
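The difference between % identity and % similarity, and between local and overall similarity, can be illustrated with a short Python sketch on a hypothetical gap-free alignment. The amino acid groups used for "similar" are an illustrative assumption. BLAST itself counts "positives" from a substitution matrix (BLOSUM62 by default for protein searches).

```python
# Illustrative groups of chemically similar amino acids (an assumption for
# this sketch, not the scoring scheme BLAST uses).
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG"]


def is_similar(a: str, b: str) -> bool:
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(s1: str, s2: str) -> tuple:
    """Percent identity and percent similarity of two gap-free aligned sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    n = len(s1)
    identical = sum(a == b for a, b in zip(s1, s2))
    similar = sum(is_similar(a, b) for a, b in zip(s1, s2))
    return 100 * identical / n, 100 * similar / n


def window_identity(s1: str, s2: str, window: int = 10) -> list:
    """Percent identity in consecutive, non-overlapping windows of the alignment."""
    return [identity_and_similarity(s1[i:i + window], s2[i:i + window])[0]
            for i in range(0, len(s1) - window + 1, window)]


# Hypothetical alignment: a conserved first half and a divergent second half.
query = "MQNPMILLIF" + "CLFCAVICSR"
subject = "MQNPMILLIF" + "WIKPEQGNGD"
print(identity_and_similarity(query, subject))  # (50.0, 55.0)
print(window_identity(query, subject))          # [100.0, 0.0]
```

The whole alignment scores only 50% identity, yet one window is 100% identical. This is the local signal a windowed search picks up.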
How did genes and gene families evolve and what is meant by protein domains?
We need to come back to this – remember the question!
6. Below is the sequence of a protein:
HOMEWORK
mqnpmillif clfcavicsr gtdsdiphef hkmlkhaksl nsllrdlhvi yspemtnrhv ektdkhgaal slksgsmsaq
rivsiqnisd demdgytlfh lqsmkdikqg ndtcnlqsvc vpipqlsddp qvlmypkcye vkqcvgsccn svetchpgti
nlvkkhvael lyigngrfmf nmtkeitmee htscscfdcg sntpqcapgf vvgrsctcec ankeernncv
gnatwnaetc kcecdlkcee gkilhkdrcd cvrrrqhhgg prghhghrhh hrsrpidtee vqkigqlkvg rigg
Go to NCBI http://www.ncbi.nlm.nih.gov/
Go to BLAST, then look down the left-hand side for “Choose a BLAST program to run”.
From within that section, select “protein blast”.
Copy the above protein sequence and paste it into the box at the top left of the web page.
Scroll down the page and click the big blue BLAST button.
Look at the results; post any questions to the Forum on Moodle.
BLAST is one of the most powerful computational tools for comparative genomics.
7. Computational biology is mostly predictive, not EXPERIMENTAL.
Let's look at simple experimental evidence for the existence of genes.
“The Central Dogma” of Molecular Biology
DNA → mRNA → Protein
Expressed sequence tags (ESTs) – cDNA clones.
To make cDNA, mRNA is copied into DNA by reverse transcriptase.
RNA → DNA
Retroviruses (e.g. HIV).
RNA genome → DNA → integration → mRNA → protein
8. Making cDNA
Typical eukaryotic gene: double-stranded DNA with exons and introns.
1. RNA polymerase makes the primary transcript: single-stranded sense RNA (5’ to 3’OH), introns present.
2. Capping, splicing and polyadenylation produce messenger RNA (mRNA), to whose poly-A tail an oligo-dT DNA primer anneals:
5’ CAP AAAAAAAAAAA 3’OH
OH-TTTTTTTT-5’ DNA primer
3. First-strand cDNA synthesis by reverse transcriptase gives an RNA/cDNA duplex.
4. Second-strand cDNA synthesis by DNA polymerase gives double-stranded cDNA.
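Reverse transcriptase pairs each RNA base with its DNA partner and copies antiparallel. The first cDNA strand is therefore the reverse complement of the mRNA. The second strand has the same sequence as the mRNA, with T in place of U. The minimal Python sketch below models only this base pairing, not priming or enzyme behaviour. The mRNA is made up.

```python
DNA_PARTNER_OF_RNA = str.maketrans("AUGC", "TACG")  # RNA base -> paired DNA base
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def first_strand_cdna(mrna: str) -> str:
    """Reverse transcriptase product, written 5'->3' (antiparallel to the mRNA)."""
    return mrna.upper().translate(DNA_PARTNER_OF_RNA)[::-1]


def second_strand_cdna(first_strand: str) -> str:
    """DNA polymerase copy of the first strand: the mRNA sequence with T for U."""
    return first_strand.translate(DNA_COMPLEMENT)[::-1]


mrna = "AUGGCCUAA" + "A" * 8   # hypothetical short mRNA with a poly-A tail
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
print(first)    # begins with the T run copied from the poly-A tail
print(second)   # same as the mRNA, written in DNA letters
```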
9. EST sequencing was carried out in parallel with genome sequencing.
ESTs are the simplest experimental evidence that a stretch of genomic DNA contains a gene.
Making cDNA
• Oligo-dT priming: an OH-TTTTTTTT-5’ DNA primer anneals to the poly-A tail (AAAAAAAAAAA 3’OH) of the mRNA.
• Random priming: random 6-mer or 9-mer DNA primers (OH-NNNNNNNNN-5’) anneal at many positions along the mRNA.
The advantage of random priming is that the cDNA clones are not biased towards the 3’ end of the gene.
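A toy simulation shows why random priming reduces 3’ bias. It assumes reverse transcriptase copies only a limited stretch from each primer. Under that assumption, oligo-dT priming at the poly-A tail only ever reaches the 3’ part of a long mRNA, whereas random primers start all along it. The 3,000-nt transcript, the 800-nt extension limit and the 20 random primers are hypothetical numbers chosen for illustration.

```python
import random


def coverage(mrna_len: int, primer_sites: list, extension: int) -> list:
    """Positions (0 = 5' end) copied when reverse transcriptase extends from each
    primer site towards the 5' end for at most `extension` nucleotides."""
    covered = [False] * mrna_len
    for site in primer_sites:
        for pos in range(max(0, site - extension + 1), site + 1):
            covered[pos] = True
    return covered


MRNA_LEN = 3000    # hypothetical transcript length (nt)
EXTENSION = 800    # hypothetical limit on how far each cDNA is extended (nt)

oligo_dt = coverage(MRNA_LEN, [MRNA_LEN - 1], EXTENSION)   # primes at the poly-A tail
rng = random.Random(1)                                      # fixed seed: reproducible
random_sites = [rng.randrange(MRNA_LEN) for _ in range(20)]
random_primed = coverage(MRNA_LEN, random_sites, EXTENSION)

five_prime_half = slice(0, MRNA_LEN // 2)
print("oligo-dT reaches 5' half:", any(oligo_dt[five_prime_half]))
print("random primers reach 5' half:", any(random_primed[five_prime_half]))
```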
10. Sequence data from random-primed cDNA: ESTs (expressed sequence tags)
Figure: ESTs 1 to 4 aligned to a typical eukaryotic gene (double-stranded DNA).
The sequencing of ESTs uncovered frequent examples of differential splicing.
Common examples include exon skipping (above), alternative 5’ exons, alternative splicing that changes the stop codon, and genes within genes.
This is true for C. elegans, humans, flies and many other species.
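A Python sketch illustrates how ESTs reveal exon skipping. It lists the full-length transcript and each variant that skips one internal exon, then asks which variants contain an EST exactly. An EST that spans an exon 1 to exon 3 junction can only come from the exon 2 skipping variant. The four-exon gene and the function names are hypothetical.

```python
def isoforms(exons: list) -> dict:
    """Full-length transcript plus each variant that skips one internal exon."""
    variants = {"full": "".join(exons)}
    for i in range(1, len(exons) - 1):          # first and last exons are kept
        variants[f"skip_exon{i + 1}"] = "".join(exons[:i] + exons[i + 1:])
    return variants


def supported_isoforms(est: str, variants: dict) -> list:
    """Names of the transcripts that contain the EST sequence exactly."""
    return sorted(name for name, seq in variants.items() if est in seq)


# Hypothetical four-exon gene.
variants = isoforms(["ATGAAA", "GGGCCC", "TTTAAA", "CCCTGA"])
print(supported_isoforms("AAAGGG", variants))  # exon 1-2 junction: ['full', 'skip_exon3']
print(supported_isoforms("AAATTT", variants))  # exon 1-3 junction: ['skip_exon2']
```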
11. • C. elegans EST data from approximately 50,000 cDNA clones.
• Identified 9,356 different genes.
1. Grind up thousands of worms.
2. Prepare mRNA – convert to cDNA with reverse transcriptase – clone in plasmid.
3. Some mRNAs are present at extremely low abundance.
4. Low abundance cDNAs may be impossible to clone randomly.
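Points 3 and 4 can be made quantitative with the usual library-sampling formula. Suppose a transcript makes up a fraction f of the mRNA and N clones are picked at random. The chance of getting at least one clone of it is then 1 - (1 - f)^N. The Python sketch below uses the ~50,000 clones above and a hypothetical abundance of 1 transcript in 100,000. It also assumes clones are sampled independently.

```python
import math


def p_at_least_one_clone(fraction: float, n_clones: int) -> float:
    """Chance that >= 1 of n randomly picked clones derives from a transcript
    making up `fraction` of all mRNA (clones assumed independent)."""
    return 1 - (1 - fraction) ** n_clones


def clones_needed(fraction: float, probability: float = 0.99) -> int:
    """Clones required to reach `probability`: N = ln(1 - P) / ln(1 - f)."""
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


rare = 1e-5   # hypothetical: 1 transcript in 100,000
print(f"{p_at_least_one_clone(rare, 50_000):.2f}")   # 0.39
print(clones_needed(rare))                           # roughly 460,000 clones
```

Under these assumptions, a library of 50,000 clones would miss such a transcript more often than not.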
12. Reverse transcriptase PCR (RT-PCR): very sensitive.
Figure: a gene and its mRNA (AAAAAAAA tail), with PCR primers A and B.
Make cDNA from mRNA using reverse transcriptase.
Amplify cDNA by PCR – primers designed from predicted genes.
Clone and analyse products.
This raised the number of experimentally confirmed genes to > 18,000.
Full-length cDNAs are valuable for confirming intron/exon structure.
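RT-PCR is very sensitive because amplification is exponential. With ideal efficiency every cycle doubles the template, so copies = N0 × (1 + E)^n. A minimal Python sketch follows. The cycle number and the reduced efficiency of 0.9 are illustrative assumptions, not values from these slides.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Exponential amplification: each cycle multiplies the template by (1 + efficiency).
    efficiency = 1.0 is the ideal case of exact doubling every cycle."""
    return initial_copies * (1 + efficiency) ** cycles


print(f"{pcr_copies(1, 30):.3g}")                  # one cDNA molecule, 30 ideal cycles: 1.07e+09
print(f"{pcr_copies(1, 30, efficiency=0.9):.3g}")  # a less efficient reaction still gives > 1e8
```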
13. Summary of predicted and known gene sequences in C. elegans
1. Predicted 19,500 genes.
2. At least 18,000 expressed as RNA.
3. Average of 1 gene per 5 kb.
4. ~42% have detectable homologies to genes/proteins outside Nematoda.
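The "1 gene per 5 kb" figure follows from dividing genome size by gene number. Genome size is not stated on these slides, so the ~100 Mb used here is an approximation supplied for the arithmetic.

```python
GENOME_SIZE_BP = 100_000_000   # assumed: C. elegans genome is roughly 100 Mb
PREDICTED_GENES = 19_500
EXPRESSED_GENES = 18_000       # "at least" this many expressed as RNA

bp_per_gene = GENOME_SIZE_BP / PREDICTED_GENES
print(f"{bp_per_gene / 1000:.1f} kb per gene")                                # 5.1 kb per gene
print(f"{100 * EXPRESSED_GENES / PREDICTED_GENES:.0f}% shown to be expressed")  # 92% shown to be expressed
```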
15. The C. elegans top 20 protein homologies
Number   Protein family or domain
650      7 TM chemoreceptor
410      Eukaryotic protein kinase domain
240      Zinc finger, C4 (transcription factor)
170      Collagen
140      7 TM receptor
130      Zinc finger, C2H2 (transcription factor)
120      Lectin C-type domain (short and long forms)
100      RNA recognition motif (RRM, RBD, or RNP domain)
90       Zinc finger, C3HC4 type (transcription factor)
90       Protein-tyrosine phosphatase
90       Ankyrin repeat
90       WD domain, G-beta repeats
80       Homeobox domain (transcription factor)
80       Neurotransmitter-gated ion channel
80       Cytochrome P450
80       Helicase conserved C-terminal domain
80       Alcohol/other dehydrogenases, short-chain type
70       UDP-glucuronosyl and UDP-glucosyl transferases
70       EGF-like domain
70       Immunoglobulin superfamily
16. Does the “Top 20” list tell us anything?
Did the previous slide look rather boring?
Test your memory – what was on the list?
Many of the large gene families are implicated in developmental control.
A core set of proteins is needed for general cell biology and metabolism, i.e. to make a cell: e.g. S. cerevisiae has ~6,163 genes.
Evolution of developmental complexity: amplification of families of regulatory molecules.
This partly explains the increased number of genes in multicellular organisms, but it does not fully explain the increase in DNA content.
17. How much does DNA sequence teach us?
Remember that what we can learn from protein similarities is limited by what we know about the similar proteins.
We still need to connect genes/proteins with functions.
18. How has genomics influenced genetics?
C. elegans mutants
Wild Type
dpy-7: Short fat worm – exoskeletal defect.
ced-4: Programmed cell death defective.
unc-51: Paralysed - abnormal axons.
dec-2: long defecation cycle – genetically constipated.
19. We wanted to investigate the molecular detail of genes defined by mutation.
We knew where mutant genes mapped and we knew their phenotype.
Genetic mapping by recombination: 1 m.u. (map unit) is 1% recombination per meiosis.
Figure: genetic map of chromosome I (left arm, central cluster, right arm; scale -15 to +25 m.u.) showing bli-3, egl-30, mab-20, fog-1, unc-73, unc-57, dpy-5, dpy-14, fer-1, lin-11, unc-29, unc-75, unc-101, glp-4 and unc-54, with an inset comparing parental and recombinant chromosomes for fog-1 and glp-4.
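The map-unit definition translates directly into arithmetic: distance in m.u. equals the percentage of recombinant progeny. A minimal Python sketch follows, using hypothetical counts rather than real data for any gene pair on chromosome I.

```python
def map_distance(recombinant: int, total: int) -> float:
    """Map distance in map units (m.u.): 1 m.u. = 1% recombinant progeny.
    Reliable only for closely linked genes; double crossovers make large
    distances look smaller than they are."""
    if total <= 0:
        raise ValueError("at least one progeny must be scored")
    return 100 * recombinant / total


# Hypothetical cross: 12 recombinant progeny among 400 scored.
print(map_distance(12, 400))   # 3.0
```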
20. Sequence of genomes – individual chromosomes
Figure: the chromosome sequence (AGCCTTTATGGCGAGATGGATAGCT … TATAA), a physical map of clones, and the genetic map of chromosome I (-15 to +25 m.u.) showing unc-101, unc-54, unc-75, unc-73, mab-20, lin-11, dpy-5, glp-4, fog-1, egl-30, fer-1 and bli-3.
How can the physical and genetic maps be aligned?
Identify the sequence of genes defined by mutation.
21. Figure: the genetic map of chromosome I (unc-101, unc-75, unc-54, unc-73, mab-20, lin-11, dpy-5, glp-4, fog-1, egl-30, fer-1, bli-3; -15 to +25 m.u.) aligned with the physical map.
• An association or alignment between the physical and genetic maps.
22. Positional cloning of genes defined by mutation.
Figure: the genetic and physical maps of chromosome I, as on slide 20.
Imagine lin-11 and unc-101 had both been cloned.
Where on the physical map might unc-75 be?
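One way to approach this question is to interpolate linearly between two cloned "anchor" genes whose genetic and physical positions are both known. The Python sketch below does this with made-up positions that are NOT the real map coordinates of lin-11, unc-101 or unc-75. Recombination per kb is not uniform along a chromosome, so such an estimate only narrows the search to a candidate region. Cosmid rescue then tests that region.

```python
def estimate_physical_position(genetic_pos: float, anchor_a: tuple, anchor_b: tuple) -> float:
    """Linearly interpolate a physical position (kb) between two cloned genes.

    Each anchor is (genetic position in m.u., physical position in kb). Assumes
    recombination per kb is uniform between the anchors, which is only roughly true.
    """
    (g1, p1), (g2, p2) = anchor_a, anchor_b
    if g1 == g2:
        raise ValueError("anchors need different genetic positions")
    fraction = (genetic_pos - g1) / (g2 - g1)
    return p1 + fraction * (p2 - p1)


# Made-up numbers, NOT real map positions: lin-11 at 5.0 m.u. / 8,000 kb,
# unc-101 at 13.0 m.u. / 9,600 kb, unc-75 mapped genetically at 7.0 m.u.
print(estimate_physical_position(7.0, (5.0, 8000.0), (13.0, 9600.0)))   # 8400.0
```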
23. Transgenic C. elegans: rescue of mutant phenotype.
DNA is injected into the gonads of adult hermaphrodites.
The injected DNA forms large heritable DNA molecules termed "free arrays".
24. Phenotypic Rescue
1. Inject cosmid into the mutant.
2. Observe transgenic progeny for phenotypic rescue.
3. Subclone individual genes from cosmid.
4. Observe transgenic progeny for phenotypic rescue.
Figure: a cosmid sequence and the genes it carries, injected into unc-75 mutant worms.
25. Positional cloning of genes defined by mutation.
Figure: the genetic and physical maps of chromosome I, as on slide 20.
Attempt phenotypic rescue with cosmids.
• The standard route to clone C. elegans genes defined by mutation.
• The more genes are cloned the easier it becomes to clone others.
26. We can’t make transgenic humans, but the same positional information is used to identify human disease genes.
27. RNA Interference (RNAi)
RNAi: sequence-specific inactivation of gene function by either double-stranded RNA (dsRNA) or siRNA.
Since its discovery in C. elegans, it has been found to work in many organisms, e.g. cultured vertebrate cells, plants, trypanosomes and Drosophila.
28. Mediators of RNAi: short interfering RNAs (siRNAs)
21-23 nt dsRNA duplexes.
DICER: a highly conserved family of RNase III enzymes.
It targets double-stranded RNA, cleaving it into siRNAs.
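The sequence specificity of RNAi can be illustrated with a toy Python model. It chops a long dsRNA into 21-nt pieces, roughly as Dicer does, and treats the antisense (guide) strand of each piece as silencing only an mRNA it pairs with perfectly. Real Dicer products carry 2-nt 3’ overhangs, and real targeting tolerates some mismatches. Neither feature is modelled. Both sequences are hypothetical.

```python
PAIR = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(rna: str) -> str:
    return rna.translate(PAIR)[::-1]


def dicer_fragments(dsrna_sense: str, size: int = 21) -> list:
    """Toy Dicer: cut the sense strand of a long dsRNA into consecutive size-nt pieces.
    Real Dicer products are duplexes with 2-nt 3' overhangs; that is not modelled."""
    return [dsrna_sense[i:i + size] for i in range(0, len(dsrna_sense) - size + 1, size)]


def is_target(mrna: str, guide: str) -> bool:
    """Sequence specificity: the guide (antisense) strand must pair perfectly with the mRNA."""
    return reverse_complement_rna(guide) in mrna


target = "AUGGCUAGCUUCGGAUCCGAU" + "AGCUAGGCUUAACGGAUCGAU"   # hypothetical mRNA
unrelated = "A" * 42                                          # hypothetical mRNA
guides = [reverse_complement_rna(piece) for piece in dicer_fragments(target)]
print(all(is_target(target, g) for g in guides))      # True
print(any(is_target(unrelated, g) for g in guides))   # False
```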
30. RNAi in C. elegans
Inject dsRNA; observe the phenotype of the F1 offspring.
It was noticed that the site of injection did not matter: even injection into the intestine works.
How could that affect embryos?
Systemic RNAi
31. Bacterial feeding method in C. elegans
Express dsRNA of a cloned C. elegans gene in a strain of E. coli.
Worms eat the bacteria as food.
RNAi of the gene occurs both in the worms that feed on the dsRNA-expressing bacteria and in the F1 progeny of these worms.
32. sid-1 mutants are defective in systemic RNAi
Figure: the SID-1 protein.
Transport of dsRNA into cells by the transmembrane protein SID-1. Science 301, 1545 (2003).
33. RNAi as a tool for genetic analysis
Loss of function phenotype can be estimated by RNAi.
RNAi by feeding method – whole genome RNAi projects.
Clones of 16,757 predicted genes were tested in a genome-wide screen.
10.3% gave an obvious phenotype.
Redundancy between genes.
RNAi can inactivate more than one gene at a time.
Permits analysis of functionally redundant genes.
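In absolute numbers, the screen figures above mean roughly 1,700 genes with an obvious RNAi phenotype and about 15,000 without one. Redundancy between genes is one reason for the large second group.

```python
screened = 16_757
with_phenotype = round(screened * 0.103)           # 10.3% gave an obvious phenotype
print(with_phenotype, screened - with_phenotype)   # 1726 15031
```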
34. Summary, C. elegans Genomics
Permits comparisons with human genes.
Most human disease genes have C. elegans homologues.
Powerful genetic tools – experiments on genes.
Detailed anatomy – relate gene to function.
Examples of processes investigated.
Programmed cell death.
Signalling.
Cell adhesion.
Axonal guidance.
Oncogene function.
Insulin Pathway
Ageing
35. How did genes evolve and what are gene/protein families?
36. Early genomes
– Early genomes made of RNA
• RNA world: no cells (in the modern sense), just RNA, starting with one gene
• Ribonucleotide polymerase activity: catalysing its own synthesis
• Later on, translation: encoded information for the production of proteins
– Involves nucleic acids ‘coding for’ proteins
– Later emergence of DNA as the information store: genome stability (DNA is less labile)
– Modern functions of nucleic acids
• coding - proteins via mRNA
• catalytic – ribozymes
• structural – rRNA, tRNA *
• regulatory - miRNAs
Figure: scheme linking nucleotides, an inorganic surface, RNA, DNA, mRNA, tRNA/rRNA and protein.
37. Where did our genome come from?….
‘Tree of Life’: tree of all animals
Common ancestor => common genome
• Each species’ genome descended with modification from the genome of its ancestor.
Can we reconstruct a picture of the ‘ancestral genome’?
Comparative genomics tells us about the state of the ancestor and the changes along each branch.
38. Genes and Genome evolution
• What processes lead to genome evolution?
– Initial ligation to form early chromosomes
– Inversion
– Duplication/deletion
– Accumulation of point mutations
– Invasion: horizontal gene transfer and transposable elements
39. Structure of a typical eukaryotic gene
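Figure: a typical eukaryotic gene with its promoter, transcription start site (TSS), ATG and stop codon, and Exons 1 to 4 separated by introns (Intron 1 labelled). Below it are the mRNA, with 5’-UTR, 3’-UTR and poly-A tail, and the encoded protein, with Domain 1 and Domain 2.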
What features of all genes are missing from this diagram….?
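The path from gene to mRNA to protein in the diagram can be sketched in Python with a hypothetical two-exon gene. Splicing removes the intron. Translation then runs from the first ATG to the first in-frame stop codon, leaving the UTRs untranslated. Starting at the first ATG is a simplification of how the real start codon is chosen.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}   # standard genetic code


def splice(pre_mrna: str, exons: list) -> str:
    """Join exon intervals (0-based, end-exclusive) in order; introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def first_orf_protein(mrna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon."""
    start = mrna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical gene: 5'-UTR + exon 1 | intron (GT...AG) | exon 2 + 3'-UTR
exon1, intron, exon2 = "CCATGGCT", "GTAAGTTTTAG", "GAATGGTAACCC"
pre_mrna = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]
print(first_orf_protein(splice(pre_mrna, exons)))   # MAEW
print(first_orf_protein(pre_mrna))                  # reading through the intron gives a different protein
```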