Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Stephen Friend HHMI-Penn 2011-05-27
1. Use of Integrated Genomic Networks to Build
Better Maps of Disease
Stephen Friend MD PhD
Sage Bionetworks (Non-Profit Organization)
Seattle/ Beijing/ San Francisco
University of Pennsylvania
May 27th, 2011
2. why consider the fourth paradigm- data intensive science
thinking beyond the narrative, beyond pathways
advantages of an open innovation compute space
it is more about why than what
3. Alzheimers Diabetes
Treating Symptoms v.s. Modifying Diseases
Depression Cancer
Will it work for me?
4. The Current Pharma Model is Broken:
• In 2010, the pharmaceutical industry spent ~$100B for R&D
• Half of the 2010 R&D spend ($50B) covered pre-PH III activities
• Half of the pre-PH III costs ($25B) were for program targets that at
least one other pharmaceutical company was actively pursuing
• Only 8% of pharma company small molecule PCCs make it to PH III
• In 2010, only 21 new medical entities were approved by FDA
April
16-‐17,
2011
4
San
Francisco
9. “Data Intensive Science” - Fourth Scientific Paradigm
Equipment capable of generating
massive amounts of data
IT Interoperability
Open Information System
Evolving Models hosted in a
Compute Space- Knowledge expert
10.
11.
12.
13. WHY NOT USE
“DATA INTENSIVE” SCIENCE
TO BUILD BETTER DISEASE MAPS?
14. “Data Intensive Science”- “Fourth Scientific Paradigm”
For building: “Better Maps of Human Disease”
Equipment capable of generating
massive amounts of data
IT Interoperability
Open Information System
Evolving Models hosted in a
Compute Space- Knowledge Expert
15.
16. It is now possible to carry out comprehensive
monitoring of many traits at the population level
Monitor disease and molecular traits in
populations
Putative causal gene
Disease trait
17. what will it take to understand disease?
DNA RNA PROTEIN (dark matter)
MOVING BEYOND ALTERED COMPONENT LISTS
19. How is genomic data used to understand biology?
RNA amplification
Tumors
Microarray hybirdization
Tumors
Gene Index
Standard GWAS Approaches Profiling Approaches
Identifies Causative DNA Variation but Genome scale profiling provide correlates of disease
provides NO mechanism Many examples BUT what is cause and effect?
Provide unbiased view of
molecular physiology as it
relates to disease phenotypes
trait
Insights on mechanism
Provide causal relationships
and allows predictions
19
Integrated Genetics Approaches
20. Integration of Genotypic, Gene Expression & Trait Data
Schadt et al. Nature Genetics 37: 710 (2005)
Millstein et al. BMC Genetics 10: 23 (2009)
Causal Inference
“Global Coherent Datasets”
• population based
• 100s-1000s individuals
Chen et al. Nature 452:429 (2008) Zhu et al. Cytogenet Genome Res. 105:363 (2004)
Zhang & Horvath. Stat.Appl.Genet.Mol.Biol. 4: article 17 (2005) Zhu et al. PLoS Comput. Biol. 3: e69 (2007)
21. Gene Co-Expression Network Analysis
Define a Gene Co-expression Similarity
Define a Family of Adjacency Functions
Determine the AF Parameters
Define a Measure of Node Distance
Identify Network Modules (Clustering)
Relate the Network Concepts to
External Gene or Sample Information
Zhang B, Horvath S. Stat Appl Genet Mol Biol 2005
22. Constructing Co-expression Networks
Start with expression measures for genes most variant genes across 100s ++ samples
1 2 3 4 Note: NOT a gene
expression heatmap
1
1 0.8 0.2 -0.8
Establish a 2D correlation matrix 2
for all gene pairs
expression
0.8 1 0.1 -0.6
3
0.2 0.1 1 -0.1
4
-0.8 -0.6 -0.1 1
Brain sample
Correlation Matrix
Define Threshold
eg >0.6 for edge
1 2 4 3 1 2 3 4
1 1
1 4 1 1 1 0 1 1 0 1
2 2
1 1 1 0 1 1 0 1
1 1 1 0 Hierarchically 3
Identify modules 4 0 0 1 0
2 3 cluster
4
3 0 0 0 1 1 1 0 1
Network Module Clustered Connection Matrix Connection Matrix
sets of genes for which many
pairs interact (relative to the
total number of pairs in that
set)
23. Preliminary Probabalistic Models- Rosetta /Schadt
Networks facilitate direct identification of
genes that are causal for disease
Evolutionarily tolerated weak spots
Gene symbol Gene name Variance of OFPM Mouse Source
explained by gene model
expression*
Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics
Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics
Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg
Mirochnitchenko (University of
Medicine and Dentistry at New
Jersey, NJ) [12]
Lactb Lactamase beta 52% tg Constructed using BAC transgenics
Me1 Malic enzyme 1 52% ko Naturally occurring KO
Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple
(UCLA) [13]
Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg
(Columbia University, NY) [11]
C3ar1 Complement component 46% ko Purchased from Deltagen, CA
3a receptor 1
Tgfbr2 Transforming growth 39% ko Purchased from Deltagen, CA
Nat Genet (2005) 205:370 factor beta receptor 2
24. Genomic Literature Complexes
Transcriptional
Signaling
The Evolution of Systems Biology
Mol. Profiles Structure
Model Evolution
Disease Models
Physiologic /
Model Topology Pathologic
Phenotype
Model Dynamics Regulation
25. List of Influential Papers in Network Modeling
50 network papers
http://sagebase.org/research/resources.php
27. Recognition that the benefits of bionetwork based molecular
models of diseases are powerful but that they require
significant resources
Appreciation that it will require decades of evolving
representations as real complexity emerges and needs to be
integrated with therapeutic interventions
28. Sage Mission
Sage Bionetworks is a non-profit organization with a vision to
create a commons where integrative bionetworks are evolved by
contributor scientists with a shared vision to accelerate the
elimination of human disease
Building Disease Maps Data Repository
Commons Pilots Discovery Platform
Sagebase.org
29. Engaging Communities of Interest
NEW MAPS
Disease Map and Tool Users-
( Scientists, Industry, Foundations, Regulators...)
PLATFORM
Sage Platform and Infrastructure Builders-
( Academic Biotech and Industry IT Partners...)
RULES AND GOVERNANCE
Data Sharing Barrier Breakers-
(Patients Advocates, Governance
M
and Policy Makers, Funders...)
APS
FOR
M
NEW TOOLS
PLAT
NEW
Data Tool and Disease Map Generators-
(Global coherent data sets, Cytoscape,
RULES GOVERN Clinical Trialists, Industrial Trialists, CROs…)
PILOTS= PROJECTS FOR COMMONS
Data Sharing Commons Pilots-
(Federation, CCSB, Inspire2Live....)
30. Platform Commons Research
Cancer
Neurological Disease
Metabolic Disease
Curation/Annotation
Building
Data Disease
Repository Maps
CTCAP
Public Data Pfizer
Merck Data Outposts Merck
TCGA/ICGC Federation Takeda
CCSB Astra Zeneca
CHDI
Commons Gates
NIH
Pilots
LSDF-WPP
Inspire2Live
Hosting Data POC
Hosting Tools Bayesian Models
Co-expression Models
Hosting Models
Discovery Tools &
Platform Methods
KDA/GSVA
LSDF
30
32. Bin Zhang
Integration of Multiple Networks for Jun Zhu
Pathway and Target Identification
CNV
Data
Gene
Expression
Clinical
Traits
Co-Expression
Bayesian Network
Network
Integration of Coexp. &
Bayesian Networks 32
Key Driver Analysis
32
33. Bin Zhan
Key Driver Analysis Jun Zhu
Justin Guinney
http://sagebase.org/research/tools.php 33
35. Bin Zhang
Model of Breast Cancer: Co-expression Xudong Dai
Jun Zhu
A) Miller 159 samples B) Christos 189 samples
NKI: N Engl J Med. 2002 Dec 19;347(25):1999.
Wang: Lancet. 2005 Feb 19-25;365(9460):671.
Miller: Breast Cancer Res. 2005;7(6):R953.
Christos: J Natl Cancer Inst. 2006 15;98(4):262.
C) NKI 295 samples
E) Super modules
Cell cycle
Pre-mRNA
ECM
D) Wang 286 samples Blood vessel
Immune
response
35
Zhang B et al., Towards a global picture of breast cancer (manuscript).
36. Bin Zhang
Model of Breast Cancer: Integration Xudong Dai
Jun Zhu
Conserved Super-modules
mRNA proc.
= predictive
Breast Cancer Bayesian Network
Chromatin
of survival
Extract gene:gene relationships for selected super-modules from BN and define Key Drivers
Pathways & Regulators
(Key drivers=yellow; key drivers validated in siRNA screen=green)
Cell Cycle (Blue) Chromatin Modification (Black) Pre-mRNA proc. (Brown) mRNA proc. (red)
36
Zhang B et al., Key Driver Analysis in Gene Networks (manuscript)
37. Bin Zhang
Model of Breast Cancer: Mining Xudong Dai
Jun Zhu
Co-expression sub-networks predict survival; KDA identifies drivers
Co-expression Map to Bayesian
Define Key
modules correlate Network
Drivers
with survival
37
38. Bin Zhang
Model of Alzheimer’s Disease Jun Zhu
AD
normal
AD
normal
AD
normal
Cell
cycle
http://sage.fhcrc.org/downloads/downloads.php
39. Liver Cytochrome P450 Regulatory Network Xia Yang
Bin Zhang
Models Jun Zhu
http://sage.fhcrc.org/downloads/downloads.php
Regulators of P450 network
39
Yang et al. Systematic genetic and genomic analysis of cytochrome P450 enzyme activities in human liver. 2010. Genome Research 20:1020.
40. Anders
New Type II Diabetes Disease Models Rosengren
Global expression data 340 genes in islet-specific
from 64 human islet donors open chromatin regions
Blue module: 3000 genes
Associated with
Type 2 diabetes
Elevated HbA1c
Reduced insulin secretion
168 overlapping genes, which have
• Higher connectivity
• Markedly stronger association with
• Type 2 diabetes
• Elevated HbA1c
• Reduced insulin secretion
• Enrichment for beta-cell transcription
factors and exocytotic proteins 40
41. New Type II Diabetes Disease Models Anders
Rosengren
• Search across 1300 datasets in MetaGEO at Sage for similar expression profiles
Top hit: Islet dedifferentiation study where the 168 genes were upregulated in
mature islets and downregulated in dedifferentiated islets (Kutlu et al., Phys Gen 2009)
• Analyses of expression-SNPs and clinical SNPs as well as Causal Inference Test
• Identification of candidate key genes affecting beta-cell differentiation and chromatin
Working hypothesis:
Normal beta-cell: open chromatin in islet-specific regions,
high expression of beta-cell transcription factors,
differentiated beta-cells and normal insulin secretion
Diabetic beta-cell: lower expression of beta-cell transcription
factors affecting the identified module, dedifferentiation,
reduced insulin secretion and hyperglycemia
Next steps: Validation of hypothesis and suggested key genes in human islets 41
42. Brig Mecham
Validating Prostate Cancer Models Xudong Dai
Pete Nelson
Rich Klingoffer
classification
Gene Expression Data on
>1000 prostate cancer
samples
(GEO)
CNV
Gene Expression & CNV Data
Data ~200 prostate cancers Gene
Expression
(Taylor et al)
Clinical
Traits
Co-Expression
Bayesian Network
Network
Integration of Coexp. &
Gene Expression & CNV Bayesian Networks
42
Data ~120 rapid autopsy
Mets
(Nelson)
Key Driver Analysis
Integrated network analysis
Gene Expression & CNV
Data ~30 prostate Key Drivers Matched to
xenografts Xenografts for validations
(Nelson)
with Presage Technology
siRNA Screen Data
(Nelson)
42
43. Lara Mangravite
Systems biology approach to pharmacogenomics
Ron Krauss
Molecular simvastatin response Integrative Ongoing:
Genomic Analysis
Cellular validation of
novel genes and SNPs
involved in statin efficacy
and cellular cholesterol
homeostasis
Clinical simvastatin response
100 -41 %
80
60
40
20
0
-100 -80 -60 -40 -20 0 20
Percent change LDLC
Simon et al, Am J Cardiol 2006
43
44. Clinical Trial Comparator Arm
Partnership (CTCAP)
Description: Collate, Annotate, Curate and Host Clinical Trial Data
with Genomic Information from the Comparator Arms of Industry and
Foundation Sponsored Clinical Trials: Building a Site for Sharing
Data and Models to evolve better Disease Maps.
Public-Private Partnership of leading pharmaceutical companies,
clinical trial groups and researchers.
Neutral Conveners: Sage Bionetworks and Genetic Alliance
[nonprofits].
Initiative to share existing trial data (molecular and clinical) from
non-proprietary comparator and placebo arms to create powerful
new tool for drug development.
45. CTCAP Workstreams
Uncurated GCD
Curated GCD
• Single common identifier to link datatypes
• Gender mismatches removed
Public Sage
Domain
GCDs Curated & QC d GCD Curated GCD
• Gene expression data corrected for batch
Uncurated effects, etc
GCD Curated & QC’d GCD
Collaborators Database
GCDs (Sage) Network Models
• Public
• Collaboration
• Internal
Co-
Private expressio
Domain n Public Databases
GCDs Network dbGAP
Analysis Integrate
d
Network
Bayesian Analysis
Network
Analysis
48. Examples: The Sage Federation
• Founding Lab Groups
– Seattle- Sage Bionetworks
– New York- Columbia: Andrea Califano
– Palo Alto- Stanford: Atul Butte
– San Diego- UCSD: Trey Ideker
– San Francisco: UCSF/Sage: Eric Schadt
• Initial Projects
– Aging
– Diabetes
– Warburg
• Goals: Share all datasets, tools, models
Develop interoperability for human data
49. Human Aging Project
Data Transformations Machine Learning
Brain A
(n=363)
Interactome Elastic Net
Brain B
(n=145)
Brain C TF Activity Profile Age
(n=400) Network Prior Model
Models
Blood A
(n=~1000) Gene Set / Pathway
Variation Analysis
Blood B Tree Classifiers
(n=~1000)
Adipose
(n=~700)
50. Preliminary Results
Adipose Age Prediction
multivariate logistic regression model predicting
age in human adipose data
Master Regulator Analysis
(MARINa)
from Califano's lab.
51. Federation s Genome-wide Network and
Modeling Approach
Califano group at Columbia Sage Bionetworks Butte group at Stanford
53. Genes Associated with Poor Prognosis are disproportionally
found among the networks regulating the glycolysis Genes
P-Value<0.005 Size of the node proportional to -log10 P value for recurrence free survival.
Inferred regulatory module for GGMSE Inferred regulatory module for Oxidative
Phosphorylation and Sphingolipid
>5 fold enrichment of recurrence free prognostic genes with
Metabolism genes
the Glycolysis BN module than random selection (p<1e-100)
60. a
=Sweave Vignette
Sage Lab
R code + PDF(plots + text + code snippets)
narrative
HTML
Data objects
Califano Lab Ideker Lab Submitted
Paper
61. Reproducible science==shareable science
Sweave: combines programmatic analysis with narrative
Dynamic generation of statistical reports
using literate data analysis
Sweave.Friedrich Leisch. Sweave: Dynamic generation of statistical reports
using literate data analysis. In Wolfgang Härdle and Bernd Rönz,editors, Compstat 2002 –
Proceedings in Computational Statistics,pages 575-580.
Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
79. How free are you to gather all the data you need?
How interoperable and accessible are the tools
to build models of disease?
How fast could we get to cures if patients began
contributing their data while demanding sharing?
How aware do you think patients are of current
reward structures in academia?
“”Ul9mate
excellence
lies
not
in
winning
every
ba@le
but
in
defea9ng
that
which
should
not
be
allowed
without
ever
figh9ng.”
80. why consider the fourth paradigm- data intensive science
thinking beyond the narrative, beyond pathways
advantages of an open innovation compute space
it is more about why than what