Human genetic variation and its contribution to complex traits
CSHL
1.
Comparative Analysis of Human Chromosome 22q11.1-q12.3 with Syntenic Regions in the Chimpanzee, Baboon, Bovine, Mouse, Pufferfish and Zebrafish Genomes
Dr. Bruce A. Roe
George Lynn Cross Research Professor
Advanced Center for Genome Technology
Department of Chemistry and Biochemistry
University of Oklahoma
broe@ou.edu www.genome.ou.edu
LXVIII CSHL Symposium
“The Genome of Homo Sapiens”
May 28 - June 3, 2003
2.
“The joy of science is the people you meet along the way and how they influence your life”
Jochanan Stenesh and Lilian Myers at Western Michigan University
and Bernie Dudock at SUNY Stony Brook
Bart Barrell and Alan Coulson, originally at the MRC, Hills Road, Cambridge, and Ian Dunham, both now at the Sanger Institute
Watson and Crick
Fred Sanger
Bev Emanuel at Children’s Hospital of Philadelphia
4.
Human Chromosome 22 Sequence Features
• 39% of the sequence is occupied by genes, including their introns and their 5’ and 3’ untranslated regions.
• 3% of the complete sequence encodes the protein products of these genes.
• 42% of the sequence is composed of repetitive sequences, compared to 46% for the entire genome.
• Only slightly over half of the genes predicted for human chromosome 22 can be experimentally validated.*
* Shoemaker DD, et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922-927 (2001).
5.
An Individual’s Genome
Differs from the DNA of:
• Siblings by 1 to 2 million bases, ~99.98% identical, with
coding regions 99.99999% identical
• Unrelated humans by 6 million bases, ~99.8% identical
overall, with coding regions 99.9999% identical
• Chimpanzees by about 100 million base pairs, ~98% identical
• Baboons by about 300 million base pairs, ~92% identical
• Mice by about 2.8 billion bases, but coding regions are
~90% identical
• Leaf spinach by about 2.9 billion bases, but coding
regions are ~40% identical
7.
Human Chromosome 22
Single Nucleotide Polymorphisms*
Number of overlaps 335
Size of overlaps 13,203,147 bp
Number of SNPs 11,116 (~1/1000 bp)
Number of substitutions 9,123 (82%)
Number of ins/del 1,993 (18%)
Only 48 of the 11,116 SNPs were in coding regions, a density ~10-fold lower than in non-coding regions
* Dawson E, et al. A SNP resource for human chromosome 22: extracting dense clusters of SNPs from the genomic sequence. Genome Res 11, 170-178 (2001).
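The summary figures above can be re-derived with a few lines of arithmetic. This is a minimal sketch that uses only the counts on this slide. The insertion/deletion count is computed as total SNPs minus substitutions, which gives 1,993 and agrees with the quoted 18% share. The density works out to roughly one SNP per 1,190 bp of overlap, consistent with the ~1/1000 bp estimate.

```python
# Summary statistics for the chromosome 22 SNP survey on this slide.
# Input numbers are copied from the slide; the indel count is derived
# as total SNPs minus substitutions.

TOTAL_OVERLAP_BP = 13_203_147   # total size of the 335 clone overlaps
TOTAL_SNPS = 11_116
SUBSTITUTIONS = 9_123


def snp_summary(overlap_bp: int, snps: int, substitutions: int) -> dict:
    """Return SNP density and the substitution / indel split."""
    indels = snps - substitutions
    return {
        "bp_per_snp": overlap_bp / snps,
        "indels": indels,
        "pct_substitutions": 100 * substitutions / snps,
        "pct_indels": 100 * indels / snps,
    }


if __name__ == "__main__":
    s = snp_summary(TOTAL_OVERLAP_BP, TOTAL_SNPS, SUBSTITUTIONS)
    print(f"1 SNP per {s['bp_per_snp']:.0f} bp")
    print(f"{s['indels']} ins/del ({s['pct_indels']:.0f}%), "
          f"{s['pct_substitutions']:.0f}% substitutions")
```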
8.
“We each are like a different symphony orchestra”
“All playing the same instruments slightly differently”
9.
Good news and Bad news
• Good news: <40,000 genes (counting dark space?)
• Bad news:
• 2-4 times as many proteins as other species due to extensive alternative splicing in humans.
• We only know the function of about half the predicted genes.
• Likely >1 million different gene products based on alternative splicing and post-translational modifications.
10.
Where we stand now
• We essentially have the ‘dictionary’ with all the words (genes) spelled correctly, but only slightly more than half of the words (genes) have definitions.
• Through comparative genomic sequencing, we can annotate the human genome based on evolutionarily conserved gene sequences and use model systems to study gene expression.
• Slightly over half of the genes predicted for human chromosome 22 have been experimentally validated.
12.
Chimpanzee and Baboon
Genomic Sequencing
• Medically important model eukaryotic organisms
• The chimpanzee is our nearest evolutionary
relative with a genome that has ~98 %
sequence identity with the human genome
• The baboon genome has ~92 % sequence
identity with the human genome
13.
PIP plot of a region of human chr22 compared to syntenic regions of baboon and mouse (annotations mark human-specific repeat regions and a questionable gene present in primates but not in rodents)
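A PIP (percent identity plot) shows the percent identity of aligned segments along the human sequence. The sketch below is a simplified stand-in and does not reproduce the algorithm of PipMaker or any other tool. It assumes the human and baboon (or mouse) sequences have already been aligned elsewhere, and it reports identity over ungapped columns in sliding windows. The window and step sizes are hypothetical defaults.

```python
from typing import List, Tuple


def windowed_identity(human: str, other: str, window: int = 100,
                      step: int = 50) -> List[Tuple[int, float]]:
    """Percent identity in sliding windows over a pairwise alignment.

    `human` and `other` are rows of the same alignment (equal length, '-'
    marks a gap). Columns where either row has a gap are skipped, so the
    value is identity over aligned (ungapped) positions only. Returns
    (alignment_column_start, percent_identity) pairs; windows with no
    ungapped columns are omitted.
    """
    if len(human) != len(other):
        raise ValueError("aligned rows must have equal length")
    results = []
    # If the alignment is shorter than one window, score it as a single window.
    for start in range(0, max(len(human) - window, 0) + 1, step):
        cols = zip(human[start:start + window], other[start:start + window])
        aligned = [(a, b) for a, b in cols if a != "-" and b != "-"]
        if aligned:
            matches = sum(a.upper() == b.upper() for a, b in aligned)
            results.append((start, 100.0 * matches / len(aligned)))
    return results
```

For example, `windowed_identity("ACGTACGTAC", "ACGTTCG-AC", window=5, step=5)` returns `[(0, 80.0), (5, 100.0)]`. The gap column in the second window is ignored rather than counted as a mismatch. A real PIP would also plot each value against human coordinates and suppress low-identity segments.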
14.
Variations in the regions syntenic to the
human chr 22 immunoglobulin light chain
region from chimp, baboon, rat and mouse
18.
Conclusions from the analysis of
vertebrate genomic sequences
• Approximately 40% of the genome is transcribed into hnRNA, which is processed into mature mRNA about 10-fold smaller, with extensive alternative splicing (1 gene --> multiple proteins).
• Approximately 40% repeat sequence density.
• Divergence of coding sequences, promoters, enhancers and exon spacing is approximately proportional to evolutionary distance from a common ancestor.
• Additional endogenous retroviral and Alu sequences in the human genome, and some regions not present in other vertebrates.
• Sequence drift in duplicated gene families.
• About half of the predicted genes have yet to be assigned
any known function.
19.
“Zebrafish are small people that swim
in the water and breathe through gills”
Han Wang, Dept. Zoology and Director of the
University of Oklahoma Zebrafish Facility
20.
How much of the ~1.7 Gbp zebrafish genome has been sequenced so far?
The whole-genome shotgun project now comprises roughly 11.6 million traces. With an average quality-clipped trace length of 517 bp, this adds up to 6 Gb in total, so the genome is covered about 3.5 times.
The new assembly, Zv2, is built on 11.7 million traces with an average trace length of 651 bp, adding up to 7.64 Gbp (4.5x coverage).
The current Sanger Institute in-house statistics for the clone sequencing are:
* 322,712,747 bp unfinished
* 112,494,895 bp finished
* 435,207,642 bp total
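The coverage figures on this slide follow from total sequenced bases divided by genome size. This is a minimal sketch that uses the slide's numbers. The `fraction_covered` function is an added, idealized Lander-Waterman (Poisson) estimate that does not appear on the slide. It ignores repeats and cloning or sequencing bias, so it overstates how much of a real genome is covered.

```python
import math

GENOME_BP = 1.7e9  # approximate zebrafish genome size quoted on the slide


def sequenced_bp(n_traces: float, mean_trace_len: float) -> float:
    """Total bases sequenced = number of traces x mean (clipped) trace length."""
    return n_traces * mean_trace_len


def fold_coverage(total_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Fold coverage c = total sequenced bases / genome size."""
    return total_bp / genome_bp


def fraction_covered(c: float) -> float:
    """Idealized Poisson estimate of the fraction of the genome hit by at
    least one read, 1 - exp(-c); assumes uniformly random read placement."""
    return 1.0 - math.exp(-c)


if __name__ == "__main__":
    wgs = sequenced_bp(11.6e6, 517)     # whole-genome shotgun traces
    zv2 = sequenced_bp(11.7e6, 651)     # traces used for assembly Zv2
    for name, bp in [("WGS", wgs), ("Zv2", zv2)]:
        c = fold_coverage(bp)
        print(f"{name}: {bp / 1e9:.2f} Gb, {c:.1f}x, "
              f"~{100 * fraction_covered(c):.0f}% of bases covered (ideal)")

    unfinished, finished = 322_712_747, 112_494_895   # clone sequencing
    total = unfinished + finished
    print(f"clone total {total:,} bp, {100 * finished / total:.1f}% finished")
```

With these inputs, the sketch reproduces the ~3.5x and ~4.5x coverage values. The Zv2 product comes to about 7.62 Gb, slightly under the slide's 7.64 Gbp, likely because the trace count is rounded.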
21.
Zebrafish Developmental Stages (hpf = hours post-fertilization)
Zygote Period (0-3/4 h): The newly fertilized egg is in the zygote period until the first cleavage occurs.
Cleavage Period (3/4 - 2 1/4 h): After the first cleavage, blastomeres divide at approximately 15-minute intervals.
Blastula Period (2 1/4 - 5 1/4 h): Begins at the 128-cell stage, or 8th zygotic cell cycle. The embryo enters the midblastula transition (MBT), the onset of zygotic transcription. The period ends at the onset of gastrulation.
Gastrula Period (5 1/4 - 10 1/3 h): Morphogenetic cell movements of involution, convergence and extension occur, producing the primary germ layers and the embryonic axis.
Segmentation Period (10 1/3 - 24 h): Somites develop, the rudiments of the primary organs become visible, the tail bud becomes more prominent and the embryo elongates. The first cells differentiate morphologically, and the first body movements appear.
Pharyngula Period (24-48 h): The embryo develops to the phylotypic stage, when it possesses the classic vertebrate body plan (bauplan). The posterior lateral line primordium migrates, and rapid organogenesis continues.
Hatching Period (48-72 h): Individuals within a single developing clutch hatch sporadically during the whole period.
Kimmel CB, et al. Stages of embryonic development of the zebrafish. Dev Dyn 203, 253-310 (1995).
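The staging table can be encoded as a sorted list of period start times and searched by binary search. This is useful when labeling embryos collected at 24, 48 and 72 hpf. This is a minimal sketch, and `period_at` is a hypothetical helper. The cleavage-period boundaries use the 3/4 h and 2 1/4 h values of the neighboring periods.

```python
from bisect import bisect_right
from typing import Optional

# Period start times in hours post-fertilization (hpf), from the staging
# table (after Kimmel et al., 1995). Each period runs until the next one
# starts; the hatching period ends at 72 hpf.
PERIOD_STARTS = [0.0, 0.75, 2.25, 5.25, 10 + 1 / 3, 24.0, 48.0]
PERIOD_NAMES = ["Zygote", "Cleavage", "Blastula", "Gastrula",
                "Segmentation", "Pharyngula", "Hatching"]
LAST_PERIOD_END = 72.0


def period_at(hpf: float) -> Optional[str]:
    """Return the developmental period containing `hpf`, or None outside
    0-72 hpf. A time equal to a period's start belongs to that period, and
    72 hpf itself is counted as the end of the hatching period."""
    if hpf < 0 or hpf > LAST_PERIOD_END:
        return None
    return PERIOD_NAMES[bisect_right(PERIOD_STARTS, hpf) - 1]
```

With this boundary convention, an embryo fixed at exactly 24 hpf is labeled Pharyngula and one at 48 hpf is labeled Hatching. That choice of convention should be stated whenever such labels are reported.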
22.
Gene Expression in Zebrafish
• Created and sequenced 10,000 clones from a zebrafish brain and eye cDNA library.
• After a BLAST search against human chromosome 22, obtained the set of zebrafish cDNA clones corresponding to several predicted human chromosome 22 genes.
• Picked an EST whose expression profile matched that of a hypothetical protein represented by an EST from a human fetal brain library.
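The clone-selection step amounts to keeping the best BLAST hit for each zebrafish clone. This is a minimal sketch that assumes the searches were written in BLAST+ tabular format (`-outfmt 6`, default 12 columns). The e-value cutoff and the example IDs are hypothetical; the slide does not state which thresholds were used.

```python
import csv
import io
from typing import Dict, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tab: str,
              max_evalue: float = 1e-10) -> Dict[str, Tuple[str, float]]:
    """Map each query (e.g. a zebrafish cDNA clone) to its best-scoring
    subject (e.g. a predicted chr22 gene), keeping only hits with
    evalue <= max_evalue (a hypothetical cutoff).
    Returns {qseqid: (sseqid, bitscore)}."""
    best: Dict[str, Tuple[str, float]] = {}
    rows = csv.DictReader(io.StringIO(blast_tab), fieldnames=FIELDS,
                          delimiter="\t")
    for row in rows:
        if float(row["evalue"]) > max_evalue:
            continue  # too weak to call a correspondence
        score = float(row["bitscore"])
        q = row["qseqid"]
        if q not in best or score > best[q][1]:
            best[q] = (row["sseqid"], score)
    return best
```

Ranking by bit score rather than e-value avoids ties among hits with e-values that underflow to zero. A stricter pipeline would also require reciprocal best hits before calling a clone orthologous.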
23.
Gene Expression in Zebrafish (cont)
• An antisense RNA hybridization probe was generated by in vitro transcription in the presence of dig-UTP after cloning into an expression vector.
• Whole-mount in situ hybridization was performed on 24-, 48- and 72-hour post-fertilization zebrafish embryos.
• Hybridization was detected with an anti-dig antibody.
1b6: AP000557.1.mRNA chr22 position:18495442-18504448 KIAA1020 hypothetical protein matches EST b6n20zf
Figure panels (Probe1 b6): 24 hpf, 48 hpf, 72 hpf
Probe1 b6 shows hybridization in the brain from 24 hours onward and in the eye
from 48 hours onward.
24.
Exon-specific gene expression in zebrafish embryos during development that is amenable to automation
Incorporated mouse in situ methods for zebrafish that:
• shorten probes from 1000 bp to 100 bp, making them exon-specific,
• perform hybridizations in a 96-well multiplex microtiter plate format,
• generate digoxigenin-labeled ssDNA probes by asymmetric, single-primer amplification of PCR products (eliminating sub-cloning of each PCR product into T3/T7 expression vectors), and
• eliminate spurious labeling of the eye by using glycine as the reagent of choice to rapidly inhibit the proteinase K used to increase the permeability of the embryos.
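Short exon-specific probes can be derived directly from the exon sequence. This is a minimal sketch that assumes the exon is given 5'→3' on the sense (mRNA) strand. Under that assumption, the antisense probe is the reverse complement of a 100-bp window. The helper names and the GC check are illustrative and are not part of the published protocol.

```python
# Complement table for DNA bases (N maps to N), upper- and lower-case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_probe(exon: str, start: int = 0, length: int = 100) -> str:
    """Antisense probe (5'->3') complementary to exon[start:start+length],
    where `exon` is the sense/mRNA strand written 5'->3'."""
    window = exon[start:start + length]
    if len(window) < length:
        raise ValueError("exon too short for a probe of this length here")
    return reverse_complement(window)


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases, a quick check on probe hybridization behavior."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0
```

In an asymmetric single-primer reaction, the single primer would be the reverse (antisense-orientation) primer. The labeled strand it extends from the PCR template then corresponds to this reverse complement. A sense-probe control would be the window itself.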
25.
Whole-mount in situ hybridization with an ssDNA digoxigenin-labeled probe made from a PCR product, showing brain-specific expression of this mRNA during embryonic development.
26.
Figure panels: anti-sense probe, sense probe, no probe (72-hour post-fertilization embryo)
Typically only the anti-sense probe hybridizes and is therefore stained by the anti-dig antibody, with some probe-independent staining in the eye.
A “no probe” antibody-staining control is important to determine whether any probe-independent antibody staining occurs in the lens.
27.
Use a probe to the unique 3’ UTR if there are multiple paralogs
One last experiment with a surprise ending
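A probe region in the 3' UTR that is unique among paralogs can be screened for computationally. This is a minimal sketch that assumes the paralog transcripts are available in sense orientation. It returns the first 100-bp window of the UTR that shares no exact k-mer with any paralog. The k = 20 cutoff is a hypothetical stand-in, and exact k-mer sharing is only a crude proxy for cross-hybridization. A real design would also check alignment-based similarity.

```python
from typing import Iterable, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_window(utr: str, paralogs: Iterable[str], length: int = 100,
                  k: int = 20) -> Optional[int]:
    """Start of the first `length`-bp window of `utr` sharing no k-mer with
    any paralog sequence (same strand only), or None if none exists."""
    shared: Set[str] = set()
    for p in paralogs:
        shared |= kmers(p, k)
    utr = utr.upper()
    for start in range(len(utr) - length + 1):
        if not kmers(utr[start:start + length], k) & shared:
            return start
    return None
```

Checking only the sense strand is enough here because the antisense probe would cross-react with paralog mRNAs, which are themselves sense-strand transcripts.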
29.
Figure panels: anti-sense probe, sense probe, no probe
Both the anti-sense and sense probes hybridized to the 72-hour post-fertilization embryonic brain, indicating RNA transcribed from the opposite, non-coding strand?
One too many controls sometimes results in a surprise observation.
30.
What’s next for our Genome Center?
• Participate in sequencing the mouse, chimp, baboon, lemur, bovine, dog, cat, chicken and zebrafish genomes, concentrating on:
• Regions of high biological interest and
• Regions orthologous to human chromosome 22
• Sequence the Medicago truncatula (barrel medic, a close relative of alfalfa) genome using a mapped BAC-based approach concentrating on coding regions
• Continue sequencing selected pathogenic bacteria
• Investigate the function of the predicted genes with unknown function in the zebrafish system, first by whole-mount in situ hybridization and then by expression knock-down experiments with morpholino oligos.
31.
Laboratory Organization
Bruce Roe, PI
Informatics
Support Teams
Production
Administration
Jim White
Steve Kenton
Hongshing Lai
Sean Qian***
Rose Morales-Diaz*
Mounir Elharam*
Steve Shaull**
Doug White
Work-study Undergraduate students**
KayLynn Hale
Dixie Wishnuck
Tami Womack
Mary Catherine Williams
DNA Synthesis
Phoebe Loh*
Sulan Qi
Bart Ford*
Reagents & Equip. Maint.
Mounir Elharam*
Doug White
Clayton Powell**
Axin Hua***
Weihong Xu****
Yanhong Li
Jami Milam****
Sara Downard**
Ging Sobhraksha**
Limei Yang
Angie Prescott*
Audra Wendt**
Mandi Aycock**
Ziyun Yao***
Steve Shaull*
Youngju Yoon****
Trang Do
Anh Do
Lily Fu
Yang Ye**
Tessa Manning**
Fu Ying
Liping Zhou
Ruihua Shi****
Junjie Wu****
Stephan Deschamps***
Shelly Oommen****
Christopher Lau****
Research Teams
Doris Kupfer
Julia Kim*
Sun So
Graham Wiley**
Lin Song****
Ying Ni
Huarong Jiang
ShaoPing Lin***
Honggui Jia
Hongming Wu
Baifang Qin
Peng Zhang
Shuling Li
Fares Najar***
Chunmei Qu
Keqin Wang
Funding from the NHGRI, Noble Foundation, DOE, NSF (pending)
- Collaborators at Sanger, CWRU, CHOP, Keio, UIUC and Riken
* Previous undergraduate research student
** Present undergraduate research student
*** Previous graduate student
**** Present graduate student