2. Lab for Bioinformatics and
computational genomics
10 “genome hackers”
mostly engineers (statistics)
42 scientists
technicians, geneticists, clinicians
>100 people
hardware engineers,
mathematicians, molecular biologists
3. Overview
Personalized Medicine,
Biomarkers …
… Molecular Profiling
First Generation Molecular Profiling
Next Generation Molecular Profiling
Next Generation Epigenetic Profiling
Concluding Remarks
10. Personalized Medicine
• The use of diagnostic tests (aka biomarkers) to identify in advance
which patients are likely to respond well to a therapy
• The benefits of this approach are to
– avoid adverse drug reactions
– improve efficacy
– adjust the dose to suit the patient
– differentiate a product in a competitive market
– meet future legal or regulatory requirements
• Potential uses of biomarkers
– Risk assessment
– Initial/early detection
– Prognosis
– Prediction/therapy selection
– Response assessment
– Monitoring for recurrence
11. Biomarker
First used in 1971 … An objective and
« predictive » measure … at the molecular
level … of normal and pathogenic processes
and responses to therapeutic interventions
Characteristic that is objectively measured and
evaluated as an indicator of normal biologic
or pathogenic processes or pharmacologic
response to a drug
A biomarker is valid if:
– It can be measured in a test system with well
established performance characteristics
– Evidence for its clinical significance has been
established
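One common way to summarize the "performance characteristics" of a test system is through sensitivity, specificity and predictive values. The sketch below is illustrative only: the function name and the counts are hypothetical and do not come from the slides.

```python
def performance(tp, fp, tn, fn):
    """Return basic performance characteristics from confusion-matrix counts.

    tp/fp/tn/fn = true positives, false positives, true negatives, false negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true responders the test detects
        "specificity": tn / (tn + fp),  # fraction of non-responders correctly excluded
        "ppv": tp / (tp + fp),          # probability that a positive test is correct
        "npv": tn / (tn + fn),          # probability that a negative test is correct
    }

# Hypothetical counts, for illustration only.
stats = performance(tp=80, fp=10, tn=90, fn=20)
print(stats)
```

Such figures describe only how well the biomarker can be measured; the clinical significance still has to be established separately.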
12. Rationale 1:
Why now? The regulatory path is becoming clearer
There is more at stake than efficient drug development.
FDA « critical path initiative »
Pharmacogenomics guideline
Biomarkers are the foundation of « evidence-based medicine »:
who should be treated, how, and with what.
Without biomarkers, advances in targeted therapy will be limited
and treatment will remain largely empirical. It is imperative that
biomarker development be accelerated along with therapeutics.
13. Why now?
First-generation and maturing second-generation molecular
profiling methodologies make it possible to stratify clinical
trial participants: include those most likely to benefit from
the drug candidate and exclude those who likely will not.
Pharmacogenomics-based clinical trials should attain more
specific results with smaller numbers of patients. Smaller
numbers mean lower costs (by a factor of 2-10).
An additional benefit for trial participants and
institutional review boards (IRBs) is that
stratification, given the correct biomarker, may
reduce or eliminate adverse events.
14. Molecular Profiling
The study of specific patterns (fingerprints) of proteins,
DNA, and/or mRNA and how these patterns correlate
with an individual's physical characteristics or
symptoms of disease.
15. Generic Health advice
• Exercise (Hypertrophic Cardiomyopathy)
• Drink your milk (MCM6 Lactose intolerance)
• Eat your green beans (glucose-6-phosphate
dehydrogenase Deficiency)
• & your grains (HLA-DQ2 – Celiac disease)
• & your iron (HFE - Hemochromatosis)
• Get more rest (HLA-DR2 - Narcolepsy)
16. Generic Health advice (UNLESS)
• Exercise (Hypertrophic Cardiomyopathy)
• Drink your milk (MCM6 Lactose intolerance)
• Eat your green beans (glucose-6-phosphate
dehydrogenase Deficiency)
• & your grains (HLA-DQ2 – Celiac disease)
• & your iron (HFE - Hemochromatosis)
• Get more rest (HLA-DR2 - Narcolepsy)
28. First Generation Molecular Profiling
• Flow cytometry correlates surface markers,
cell size and other parameters
• Circulating tumor cell (CTC) assays
quantify the number of tumor cells in the
peripheral blood.
• Exosomes are 30-90 nm vesicles secreted by
a wide range of mammalian cell types.
• Immunohistochemistry (IHC) measures
protein expression, usually on the cell
surface.
32. First Generation Molecular Profiling
• Gene sequencing for mutation detection
• Microarray for mRNA message detection
• RT-PCR for gene expression
• FISH analysis for gene copy number
• Comparative Genomic Hybridization (CGH) for
gene copy number
33. Basics of the "old" technology
• Clone the DNA.
• Generate a ladder of labeled (colored)
molecules that are different by 1 nucleotide.
• Separate mixture on some matrix.
• Detect fluorochrome by laser.
• Interpret peaks as string of DNA.
• Strings are 500 to 1,000 letters long
• 1 machine generates 57,000 nucleotides/run
• Assemble all strings into a genome.
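To put the throughput figure in perspective, a rough back-of-the-envelope calculation helps. The sketch below takes the 57,000 nucleotides/run from the slide; the genome size of about 3.1 billion bases and the helper name are assumptions added for illustration.

```python
import math

NUCLEOTIDES_PER_RUN = 57_000    # from the slide
GENOME_SIZE = 3_100_000_000     # assumption: approximate human genome size in bases

def runs_needed(genome_size, per_run, coverage=1):
    """Number of machine runs needed to read the genome `coverage` times over."""
    return math.ceil(genome_size * coverage / per_run)

runs_1x = runs_needed(GENOME_SIZE, NUCLEOTIDES_PER_RUN)
print(runs_1x)  # 54386 runs for a single pass, before any redundancy
```

Real projects need several-fold coverage to assemble overlapping strings, so the actual number of runs is a multiple of this.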
42. Basics of the "new" technology
• Get DNA.
• Attach it to something.
• Extend and amplify signal with some color
scheme.
• Detect fluorochrome by microscopy.
• Interpret series of spots as short strings of
DNA.
• Strings are 30-300 letters long
• Multiple images are interpreted as 0.4 to 1.2
Gb/run (1,200,000,000 letters/day).
• Map or align strings to one or many genomes.
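The last step, mapping short strings to a genome, can be sketched with a simple seed index. This is a minimal illustration that assumes exact matches only; real aligners tolerate mismatches and use far more compact indexes. The function names and the toy reference are hypothetical.

```python
from collections import defaultdict

def build_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index

def map_read(read, genome, index, k):
    """Return all positions where the read matches the genome exactly.

    The first k letters of the read are used as a seed to find candidates,
    then the full read is compared at each candidate position.
    """
    hits = []
    for pos in index.get(read[:k], []):
        if genome[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits

genome = "ACGTTGCAAGGCTTACGATCGGA"   # toy reference, for illustration only
index = build_index(genome, k=4)
positions = map_read("GGCTTACG", genome, index, k=4)
print(positions)  # [9]
```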
49. % of Paired K-mers with Uniquely Assignable Location
Read Length is Not As Important For Resequencing
[Figure: percentage of paired k-mers with a uniquely assignable location (0% to 100%) versus length of k-mer reads (8 to 20 bp), with one curve for E. coli and one for human. Source: Jay Shendure.]
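The idea behind this figure can be explored on a small scale by counting how many k-mers of a sequence occur only once. The sketch below is a simplification: it uses single (not paired) k-mers and a made-up toy sequence, so it only illustrates the trend that uniqueness rises with k.

```python
from collections import Counter

def unique_kmer_fraction(sequence, k):
    """Fraction of k-mer start positions whose k-mer occurs exactly once."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)

toy = "ACGTACGTTTACGGACGT"   # hypothetical sequence with a few repeats
fractions = {k: unique_kmer_fraction(toy, k) for k in (2, 4, 6)}
for k, fraction in fractions.items():
    print(k, round(fraction, 2))
```

On a real genome the same calculation needs a memory-efficient k-mer counter, but the logic is unchanged.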
59. Second Generation DNA profiling
• Enrichment Sequencing
• ChIP-Seq (Chromatin Immunoprecipitation
followed by sequencing)
• A substitute for ChIP-chip
• E.g. to find the binding sequences of
proteins (transcription factor binding sites, TFBS)
60. Paired End Reads are Important!
[Diagram: Read 1 and Read 2 of a pair lie a known distance apart, one in repetitive DNA and one in unique DNA. A single read from the repeat maps to multiple positions.]
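The argument for paired ends can be sketched in a few lines: a read from a repeat has several candidate positions, and the known distance to its uniquely placed mate selects the right one. This is a minimal sketch with exact matching, a made-up toy genome and hypothetical function names; strand orientation is ignored.

```python
def find_all(genome, read):
    """Return every start position at which the read matches exactly."""
    return [i for i in range(len(genome) - len(read) + 1)
            if genome[i:i + len(read)] == read]

def place_pair(genome, read1, read2, distance, tolerance=0):
    """Return (pos1, pos2) placements where read2 starts `distance` bases after read1."""
    hits2 = find_all(genome, read2)
    return [(p1, p2)
            for p1 in find_all(genome, read1)
            for p2 in hits2
            if abs((p2 - p1) - distance) <= tolerance]

# Toy genome: the repeat "GAGAGA" occurs twice, "CTTAC" occurs once.
genome = "GAGAGATTTTCCCCGAGAGAAACTTACGG"

single = find_all(genome, "GAGAGA")                       # ambiguous: [0, 14]
paired = place_pair(genome, "GAGAGA", "CTTAC", distance=8)  # resolved: [(14, 22)]
print(single, paired)
```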
62. Second Generation DNA profiling
• Exome Sequencing (also known as
targeted exome capture) is an
efficient strategy to selectively
sequence the coding regions of the
genome to identify novel genes
associated with rare and common
disorders.
• 160K exons
67. Second Generation RNA profiling
Besides the 6000 protein-coding genes …
140 ribosomal RNA genes
275 transfer RNA genes
40 small nuclear RNA genes
>100 small nucleolar RNA genes
Function of RNA genes
pRNA in the φ29 rotary packaging motor (Simpson
et al. Nature 408:745-750, 2000)
Cartilage-hair hypoplasia mapped to an RNA
(Ridanpää et al. Cell 104:195-203, 2001)
The human Prader-Willi critical region (Cavaille
et al. PNAS 97:14035-7, 2000)
68. Second Generation RNA profiling
RNA genes can be hard to detect
UGAGGUAGUAGGUUGUAUAGU
C. elegans let-7; 21 nt
(Pasquinelli et al. Nature 408:86-89, 2000)
Often small
Sometimes multicopy and redundant
Often not polyadenylated
(not represented in ESTs)
Immune to frameshift and nonsense
mutations
No open reading frame, no codon bias
Often evolving rapidly in primary sequence
69. Second Generation RNA profiling
Although details of the methods vary, the concept
behind RNA-seq is simple:
• isolate all mRNA
• convert to cDNA using reverse transcriptase
• sequence the cDNA
• map sequences to the genome
The more times a given sequence is detected, the
more abundantly transcribed it is. If enough
sequences are generated, a comprehensive and
quantitative view of the entire transcriptome of an
organism or tissue can be obtained.
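The counting idea can be sketched as follows: once reads are mapped, abundance is estimated from how many reads fall in each gene, corrected for gene length. The gene coordinates, read positions and function names below are hypothetical and serve only as an illustration.

```python
# Hypothetical gene annotation: name -> (start, end), with end exclusive.
genes = {"geneA": (0, 1000), "geneB": (1000, 3000)}

def count_reads(read_positions, genes):
    """Count mapped read start positions that fall inside each gene interval."""
    counts = {name: 0 for name in genes}
    for pos in read_positions:
        for name, (start, end) in genes.items():
            if start <= pos < end:
                counts[name] += 1
    return counts

def reads_per_kb(counts, genes):
    """Normalize counts by gene length so long genes are not overrated."""
    return {name: counts[name] / ((end - start) / 1000)
            for name, (start, end) in genes.items()}

read_positions = [10, 250, 999, 1500, 2999, 5000]   # toy mapped read starts
counts = count_reads(read_positions, genes)
density = reads_per_kb(counts, genes)
print(counts, density)
```

Comparing samples additionally requires normalizing for the total number of reads sequenced in each run.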
70. Second Generation RNA profiling
• Comparison with microarrays
– Microarray
• Closed technology: prior knowledge required
• Affected by pseudogenes (homologs of real genes)
• Low sensitivity
– RNA-Seq
• Open technology: no prior knowledge required
• Not affected by pseudogenes because the exact
sequence is measured
• Yields additional information (SNPs, alternative
splicing)
73. Mapping Structural Variation in Humans
CNVs (>1 kb segments)
- Thought to be common: 12% of the genome
(Redon et al. 2006)
- Likely involved in phenotype
variation and disease
- Until recently most methods for
detection were low resolution
(>50 kb)
81. Second Generation Protein profiling
• Proteomics MS-MS-based
exclusively in discovery mode
• Automate diagnostics assay
generation (next generation
proteomics)
• Aptamers as alternative to antibodies
• ImmunoPCR
85. Genome-wide methylation
…. by next generation sequencing
[Figure: number of markers (3 000 000, 6 000, 50, 5) plotted against number of samples (<50, only models and fresh frozen; >50) across the Discovery, Verification and Validation phases; platforms labelled MethylCap_Seq, EpiHealth and Deep_Seq.]
93. Bioinformatics, a life science discipline … management of expectations
[Diagram: Bioinformatics shown among overlapping fields: Math, Theoretical Biology, Computer Science, Informatics, Computational Biology and (Molecular) Biology. Labelled areas: NP, Datamining, AI, Image Analysis, structure prediction (HTX), Discovery Informatics – Computational Genomics, Interface Design, Expert Annotation, Sequence Analysis.]
94. Translational Medicine: An inconvenient truth
• 1% of genome codes for proteins, however
more than 90% is transcribed
• Less than 10% of protein experimentally
measured can be "explained" from the
genome
• 1 genome ? Structural variation
• > 200 Epigenomes ??
• Space/time continuum …
101. Wobblebase Mission
provide tools to both specialists (researchers,
bioinformaticians, health care providers) and
individual consumers that unlock the power of
genomic data to the USER
enable personalized genomics today by simplifying
the way we organize, visualize and manage
genomic data.
102. PGM: Personal Genomics Manifesto
Everybody who wants to get his genome sequenced has the human right to do so.
No third party can own your genetic data; your genetic data is exclusively yours.
Nobody can be forced to get his genome analyzed or to reveal his genome to a
third party.
Your genome should always be treated as confidential, private information.
People should be advised not to share their identity AND their entire genome on a
public forum.
People should be advised to use secure technologies that maximally
protect phenotypic and/or genotypic data.
People should be able to actively explore, manage and get updated interpretation
on their genomic data.
104. Choosing the Red Pill
The Technical Feasibility Argument
The Quality Argument
The Price Argument
The Logistics Argument (around the sample and
how to manage the data)
The Ethical debate
The Privacy/Security concern