The document discusses using "data intensive science" and network models to better understand human disease. It describes how large datasets from equipment that can generate massive amounts of data, combined with open information systems and evolving computational models, can be used to build better maps of human disease. This "fourth paradigm" of data-driven science is presented as an advantage over traditional reductionist approaches for accelerating disease elimination through open innovation.
Stephen Friend National Heart Lung & Blood Institute 2011-07-19
1. From Gene networks to bioinformatics networks
Stephen Friend MD PhD
Sage Bionetworks (Non-Profit Organization)
Seattle/ Beijing/ San Francisco
NHLBI
July 18th, 2011
2. why consider the fourth paradigm- data intensive science
thinking beyond the narrative, beyond pathways
advantages of an open innovation compute space
it is more about how than what
3. Treating Symptoms vs. Modifying Diseases
COPD, Diabetes, Pulmonary Fibrosis, Obesity
Will it work for me?
8. WHY NOT USE “DATA INTENSIVE” SCIENCE TO BUILD BETTER DISEASE MAPS?
9. “Data Intensive Science”- “Fourth Scientific Paradigm”
For building: “Better Maps of Human Disease”
Equipment capable of generating
massive amounts of data
IT Interoperability
Open Information System
Evolving Models hosted in a
Compute Space- Knowledge Expert
10. It is now possible to carry out comprehensive
monitoring of many traits at the population level
Monitor disease and molecular traits in
populations
Putative causal gene
Disease trait
11. what will it take to understand disease?
DNA, RNA, PROTEIN (dark matter)
MOVING BEYOND ALTERED COMPONENT LISTS
13. How is genomic data used to understand biology?
RNA amplification
Tumors
Microarray hybridization
Tumors
Gene Index
"Standard" GWAS Approaches: Identifies causative DNA variation but provides NO mechanism
Profiling Approaches: Genome scale profiling provides correlates of disease. Many examples BUT what is cause and effect?
"Integrated" Genetics Approaches:
• Provide unbiased view of molecular physiology as it relates to disease phenotypes
• Insights on mechanism
• Provide causal relationships and allows predictions
14. Integration of Genotypic, Gene Expression & Trait Data
Schadt et al. Nature Genetics 37: 710 (2005)
Millstein et al. BMC Genetics 10: 23 (2009)
Causal Inference
“Global Coherent Datasets”
• population based
• 100s-1000s individuals
Chen et al. Nature 452:429 (2008)
Zhu et al. Cytogenet Genome Res. 105:363 (2004)
Zhang & Horvath. Stat.Appl.Genet.Mol.Biol. 4: article 17 (2005)
Zhu et al. PLoS Comput. Biol. 3: e69 (2007)
15. Constructing Co-expression Networks
Start with expression measures for the most variant genes across 100s ++ samples
(Note: NOT a gene expression heatmap; figure axes: expression, brain sample)
Establish a 2D correlation matrix for all gene pairs
Correlation Matrix
      1     2     3     4
1     1    0.8   0.2  -0.8
2    0.8    1    0.1  -0.6
3    0.2   0.1    1   -0.1
4   -0.8  -0.6  -0.1    1
Define Threshold, eg >0.6 for edge
Connection Matrix
    1  2  3  4
1   1  1  0  1
2   1  1  0  1
3   0  0  1  0
4   1  1  0  1
Hierarchically cluster
Clustered Connection Matrix
    1  2  4  3
1   1  1  1  0
2   1  1  1  0
4   1  1  1  0
3   0  0  0  1
Identify modules
Network Module: sets of genes for which many pairs interact (relative to the total number of pairs in that set)
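The thresholding step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method used to build the original figure. It makes two assumptions: an edge is drawn when the absolute correlation is at least 0.6 (the slide says ">0.6", but its connection matrix links genes 2 and 4, whose correlation is -0.6), and modules are found as connected components, a simple stand-in for the hierarchical clustering named on the slide. The function names are hypothetical.

```python
# Correlation matrix copied from the slide (genes 1 to 4).
corr = [
    [ 1.0,  0.8,  0.2, -0.8],
    [ 0.8,  1.0,  0.1, -0.6],
    [ 0.2,  0.1,  1.0, -0.1],
    [-0.8, -0.6, -0.1,  1.0],
]

# Assumption: an edge exists when |r| >= 0.6, which matches the slide's matrix.
THRESHOLD = 0.6


def connection_matrix(corr, threshold):
    """Turn a correlation matrix into a 0/1 connection (adjacency) matrix."""
    n = len(corr)
    return [
        [1 if abs(corr[i][j]) >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def find_modules(adj):
    """Group genes into connected components using a depth-first search.

    This is a simplification swapped in for hierarchical clustering.
    """
    n = len(adj)
    seen = set()
    found = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in range(n):
                if adj[node][nxt] and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        found.append(sorted(component))
    return found


adj = connection_matrix(corr, THRESHOLD)

# Report modules with the slide's 1-based gene labels.
gene_modules = [[g + 1 for g in module] for module in find_modules(adj)]

# Reorder rows and columns so genes in the same module sit next to each other.
order = [g for module in gene_modules for g in module]
clustered = [[adj[i - 1][j - 1] for j in order] for i in order]

print("modules:", gene_modules)
print("gene order:", order)
for row in clustered:
    print(row)
```

On real data with hundreds of samples, the correlation matrix would be computed from the expression measurements first, and a proper hierarchical clustering would usually replace the connected-components step.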
16. Preliminary Probabilistic Models- Rosetta /Schadt
Networks facilitate direct
identification of genes that are
causal for disease
Evolutionarily tolerated weak spots
Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
Nat Genet (2005) 205:370
17. List of Influential Papers in Network Modeling
50 network papers
http://sagebase.org/research/resources.php
19. Recognition that the benefits of bionetwork based molecular
models of diseases are powerful but that they require
significant resources
Appreciation that it will require decades of evolving
representations as real complexity emerges and needs to be
integrated with therapeutic interventions
20. Sage Mission
Sage Bionetworks is a non-profit organization with a vision to
create a commons where integrative bionetworks are evolved by
contributor scientists with a shared vision to accelerate the
elimination of human disease
Building Disease Maps; Data Repository; Commons Pilots; Discovery Platform
Sagebase.org
22. Engaging Communities of Interest
NEW MAPS
Disease Map and Tool Users-
( Scientists, Industry, Foundations, Regulators...)
PLATFORM
Sage Platform and Infrastructure Builders-
( Academic Biotech and Industry IT Partners...)
RULES AND GOVERNANCE
Data Sharing Barrier Breakers-
(Patients Advocates, Governance and Policy Makers, Funders...)
NEW TOOLS
Data Tool and Disease Map Generators-
(Global coherent data sets, Cytoscape, Clinical Trialists, Industrial Trialists, CROs…)
PILOTS= PROJECTS FOR COMMONS
Data Sharing Commons Pilots-
(Federation, CCSB, Inspire2Live....)
23. Platform Commons Research
Research areas: Cancer, Neurological Disease, Metabolic Disease
[Diagram panels: Building Disease Maps; Data Repository; Commons Pilots; Discovery Platform; Tools & Methods]
Other labels in the diagram: Curation/Annotation; CTCAP; Public Data; Merck Data; TCGA/ICGC; Outposts; Federation; CCSB; LSDF-WPP; Inspire2Live; POC; Pfizer; Merck; Takeda; Astra Zeneca; CHDI; Gates; NIH; Hosting Data; Hosting Tools; Hosting Models; Bayesian Models; Co-expression Models; KDA/GSVA; LSDF
24. Model of Breast Cancer: Co-expression
Bin Zhang, Xudong Dai, Jun Zhu
A) Miller 159 samples
B) Christos 189 samples
C) NKI 295 samples
D) Wang 286 samples
E) Super modules: Cell cycle, Pre-mRNA, ECM, Blood vessel, Immune response
NKI: N Engl J Med. 2002 Dec 19;347(25):1999.
Wang: Lancet. 2005 Feb 19-25;365(9460):671.
Miller: Breast Cancer Res. 2005;7(6):R953.
Christos: J Natl Cancer Inst. 2006 15;98(4):262.
Zhang B et al., Towards a global picture of breast cancer (manuscript).
25. Model of Alzheimer’s Disease
Bin Zhang, Jun Zhu
[Figure: AD vs. normal network comparisons; labeled module: Cell cycle]
http://sage.fhcrc.org/downloads/downloads.php
26. New Type II Diabetes Disease Models
Anders Rosengren
Global expression data from 64 human islet donors
340 genes in islet-specific open chromatin regions
Blue module: 3000 genes, associated with
• Type 2 diabetes
• Elevated HbA1c
• Reduced insulin secretion
168 overlapping genes, which have
• Higher connectivity
• Markedly stronger association with Type 2 diabetes, elevated HbA1c and reduced insulin secretion
• Enrichment for beta-cell transcription factors and exocytotic proteins
27. New Type II Diabetes Disease Models
Anders Rosengren
• Search across 1300 datasets in MetaGEO at Sage for similar expression profiles
Top hit: Islet dedifferentiation study where the 168 genes were upregulated in
mature islets and downregulated in dedifferentiated islets (Kutlu et al., Phys Gen 2009)
• Analyses of expression-SNPs and clinical SNPs as well as Causal Inference Test
• Identification of candidate key genes affecting beta-cell differentiation and chromatin
Working hypothesis:
Normal beta-cell: open chromatin in islet-specific regions,
high expression of beta-cell transcription factors,
differentiated beta-cells and normal insulin secretion
Diabetic beta-cell: lower expression of beta-cell transcription
factors affecting the identified module, dedifferentiation,
reduced insulin secretion and hyperglycemia
Next steps: Validation of hypothesis and suggested key genes in human islets
28. Liver Cytochrome P450 Regulatory Network Models
Xia Yang, Bin Zhang, Jun Zhu
http://sage.fhcrc.org/downloads/downloads.php
Regulators of P450 network
Yang et al. Systematic genetic and genomic analysis of cytochrome P450 enzyme activities in human liver. 2010. Genome Research 20:1020.
29. Clinical Trial Comparator Arm
Partnership (CTCAP)
Description: Collate, Annotate, Curate and Host Clinical Trial Data
with Genomic Information from the Comparator Arms of Industry and
Foundation Sponsored Clinical Trials: Building a Site for Sharing
Data and Models to evolve better Disease Maps.
Public-Private Partnership of leading pharmaceutical companies,
clinical trial groups and researchers.
Neutral Conveners: Sage Bionetworks and Genetic Alliance
[nonprofits].
Initiative to share existing trial data (molecular and clinical) from
non-proprietary comparator and placebo arms to create powerful
new tool for drug development.
30. Examples: The Sage Federation
• Founding Lab Groups
– Seattle- Sage Bionetworks
– New York- Columbia: Andrea Califano
– Palo Alto- Stanford: Atul Butte
– San Diego- UCSD: Trey Ideker
– San Francisco: UCSF/Sage: Eric Schadt
• Initial Projects
– Aging
– Diabetes
– Warburg
• Goals: Share all datasets, tools, models
Develop interoperability for human data
31. Federation's Genome-wide Network and Modeling Approach
Califano group at Columbia Sage Bionetworks Butte group at Stanford
32. Human Aging Project
Data: Brain A (n=363), Brain B (n=145), Brain C (n=400), Blood A (n=~1000), Blood B (n=~1000), Adipose (n=~700)
Transformations: Interactome, TF Activity Profile, Gene Set / Pathway Variation Analysis
Machine Learning Models: Elastic Net, Network Prior Model, Tree Classifiers
Predicted trait: Age
35. … the world is becoming too
fast, too complex, and too networked
for any company to have
all the answers inside
Y. Benkler, The Wealth of Networks
36. Is the Industry managing itself into irrelevance?
$130 billion of patented drug
sales will face generics in the
2011-2016 decade (55% of
2009 US sales)
Sales exposed to generics
will double in 2012 (to $33
billion)
98% of big pharma sales
come from products 5 years
and older (avg patent life =
11 years)
6 big pharmas were lost in
the last 10 years
37. Largest Attrition For Pioneer Targets is at
Clinical POC (Ph II)
Stages: Target ID/Discovery | Hit/Probe/Lead ID | Clinical Candidate ID | Toxicology/Pharmacology | Phase I | Phase IIa/IIb
Attrition 50% 10% 30% 30% 90%
This is killing drug discovery
We can generate effective and safe molecules in animals, but
they do not have sufficient efficacy and/or safety in the chosen
patient group.
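To make the cost of this attrition concrete, the five rates listed on this slide can be compounded. This is a rough sketch only. It assumes the rates apply one after another and independently, which the slide does not state, and it does not assign individual rates to named stages. Under those assumptions about 2.2% of starting targets survive every stage.

```python
from math import prod

# Attrition rates as listed on the slide, in order; the last one is Phase II POC.
attrition = [0.50, 0.10, 0.30, 0.30, 0.90]


def cumulative_survival(rates):
    """Fraction of projects that pass every stage.

    Assumes the stages are sequential and independent.
    """
    return prod(1.0 - rate for rate in rates)


survival = cumulative_survival(attrition)

# Number of starting targets needed, on average, for one to pass all stages.
targets_per_success = 1.0 / survival

print(f"overall survival: {survival:.4f}")
print(f"targets per success: {targets_per_success:.1f}")
```

The 90% loss at the final stage dominates the result: without it, roughly 22% of targets would survive the first four stages under the same assumptions.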
38. The current pharma model is redundant
[Figure: the same development pipeline, repeated seven times in parallel]
Stages: Target ID/Discovery | Hit/Probe/Lead ID | Clinical Candidate ID | Toxicology/Pharmacology | Phase I | Phase IIa/IIb
Attrition 50% 10% 30% 30% 90%
Negative POC information is not shared
39. Let's imagine…
• A pool of dedicated, stable funding
• A process that attracts top scientists and clinicians
• A process in which regulators can fully collaborate to solve key
scientific problems
• An engaged citizenry that promotes science and acknowledges
risk
• Mechanisms to avoid bureaucratic and administrative barriers
• Sharing of knowledge to more rapidly achieve understanding of
human biology
• A steady stream of targets whose links to disease have been
validated in humans
40. Arch2POCM
A globally distributed public private partnership (PPP) committed to:
• Generate more clinically validated targets by sharing data
• Deliver more new drugs for patients by using compounds to understand disease biology
41. Arch2POCM: what's in a name?
Arch: as in archipelago and referring to the
distributed network of academic labs, pharma
partners and clinical sites that will contribute to
Arch2POCM programs
POCM: Proof Of Clinical Mechanism: demonstration in a Ph II setting that the mechanism of the selected disease target can be safely and usefully modulated.
43. Arch2POCM
Mission
To establish a pre-competitive stream of drug development
data and POCM candidates that:
1. Will focus on high risk/high opportunity targets
2. Will inform the industry regarding those targets that are validated for
clinical proof of concept mechanism (POCM) and those that are not
3. Will drive down redundant efforts in discovery and early development
4. Will lead to substantial cost avoidance (est. $12.5 B annually)
(HOW DOES THIS COMPLEMENT NIH TRANSLATIONAL CENTER)
PARTNERS/ WHO DOES WHAT/ NO IP /CROWDSOURCING
April 16-17, 2011, San Francisco
44. Federation Projects: Building a Compute Space
Combining analysis + narrative = Sweave Vignette
Inputs: R code + narrative
Outputs: PDF (plots + text + code snippets), HTML, data objects, submitted paper
Labs: Sage Lab, Califano Lab, Ideker Lab
Shared Data Repository
JIRA: Source code repository & wiki
45. Reproducible science==shareable science
Sweave: combines programmatic analysis with narrative
Dynamic generation of statistical reports
using literate data analysis
Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 – Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
61. why consider the fourth paradigm- data intensive science
thinking beyond the narrative, beyond pathways
advantages of an open innovation compute space
it is more about how than what
62. OPPORTUNITIES FOR LUNG COMMUNITY
Data sets, Tools and Models for Lung Biology/Pathophysiology
Broad Institute cell line panels enriched in lung cancer
Change reward structures for sharing data
(patients and academics)
Several Pharma partners interested in building models
of respiratory disease- 2 public /3 Industry (Ron Crystal)