Friend EORTC 2012-11-08
1. Integrating Cancer Networks and the Value
of Compute Spaces
Stephen H Friend
November 8, 2012
EORTC/NCI
Dublin
2. Oncogenes only make good targets in particular molecular
contexts: EGFR story
[Pathway diagram: ERBB2, EGFR (EGFRi), BCR/ABL, KRAS, NRAS, BRAF, MEK1/2, Proliferation/Survival]
• EGFR Pathway commonly mutated/activated in Cancer
• 30% of all epithelial cancers
• Blocking Abs approved for treatment of metastatic colon cancer
• Subsequently found that RASMUT tumors don’t respond
– “Negative Predictive Biomarker”
• However still EGFR+ / RASWT patients who don’t respond? – need “Positive Predictive Biomarker”
• And in Lung Cancer not clear that RASMUT status is useful biomarker
Predicting treatment response to known oncogenes is complex and requires detailed understanding of how different genetic backgrounds function
4. Preliminary Probabilistic Models - Rosetta
Networks facilitate direct
identification of genes that are
causal for disease
Evolutionarily tolerated weak spots
Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
Nat Genet (2005) 205:370
5. Extensive Publications now Substantiating Scientific Approach
Probabilistic Causal Bionetwork Models
>80 Publications from Rosetta Genetics/ Sage Bionetworks
Metabolic Disease "Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp.Biology, etc
CVD "Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
…… Plus 5 additional papers in Genome Res., Genomics, Mamm.Genome
Bone "Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)
"..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)
Methods "An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)
"Increasing the power to detect causal associations…" PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
…… Plus 3 additional papers in PLoS Genet., BMC Genet.
8. Iterative Networked Approaches
To Generating, Analyzing, and Supporting New Models
[Diagram: Data, Analysis, Biological System]
Uncouple the automatic linkage between the
data generators, analyzers, and validators
9. An Alternative
Biomedicine
Information
Commons
Commons are resources that are owned in common or shared among
communities.
-David Bollier
10. Sage Bionetworks
A non-profit organization with a vision to enable networked team
approaches to building better models of disease
BIOMEDICINE INFORMATION COMMONS INCUBATOR
Technology Platform
Governance
Impactful Models
Better Models of
Disease:
INFORMATION
COMMONS
Challenges
14. Networked Approaches
BioMedicine Information Commons
[Diagram: Patients/Citizens, Data Generators, Data Analysts, Clinicians, and Experimentalists connected through SYNAPSE, which holds RAW DATA, CURATED DATA, TOOLS/METHODS, and ANALYSES/MODELS]
15. FOUR PILOTS IN THE SAGE BIONETWORKS COMMONS INCUBATOR
• Provide a “compute space” for hosting and sharing models
– SYNAPSE (to complement data storage and tools provided by Sanger, Broad…)
• Co-generate models of drivers for Cell Line/Clinical Sensitivity
• Host Challenges and other approaches that will maximize the number of
people providing and sharing their insights as quickly as possible
– https://synapse.sagebase.org/ - BCCOverview:0
• Engage citizens as partners in gathering information and insights and
funds
16. Two approaches to building common scientific
and technical knowledge
Approach 1:
• Text summary of the completed project
• Assembled after the fact
Approach 2 (Social Coding):
• Every code change versioned
• Every issue tracked
• Every project the starting point for new work
• All evolving and accessible in real time
17. “Synapse is a compute platform
for transparent, reproducible, and
modular collaborative research.”
18. Synapse is GitHub for Biomedical Data
GitHub:
• Every code change versioned
• Every issue tracked
• Every project the starting point for new work
• Social/Interactive Coding
Synapse:
• Data and code versioned
• Analysis history captured in real time
• Work anywhere, and share the results with anyone
• Social/Interactive Science
24. Download analysis and meta-analysis
Download another Cluster Download Evaluation and view more
Result stats
• Perform Model averaging
• Compare/contrast models
• Find consensus clusters
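As a rough illustration of the "model averaging" bullet, the sketch below averages per-sample predictions from several models with equal weights. It is written in Python with hypothetical data; the function name and the equal weighting are assumptions, not the behavior of the Synapse tool shown on the slide.

```python
from statistics import mean


def average_predictions(model_predictions):
    """Unweighted average of per-sample predictions across models.

    model_predictions: list of lists, one inner list per model,
    all covering the same samples in the same order.
    """
    n_samples = len(model_predictions[0])
    if any(len(p) != n_samples for p in model_predictions):
        raise ValueError("all models must predict the same samples")
    # zip(*...) regroups the values so each tuple holds one sample's predictions
    return [mean(sample) for sample in zip(*model_predictions)]


# Hypothetical predictions from three models for three samples
preds = [
    [0.2, 0.8, 0.5],
    [0.4, 0.6, 0.5],
    [0.3, 0.7, 0.8],
]
averaged = average_predictions(preds)
```

Equal weights are the simplest choice; weighting each model by its cross-validated accuracy would be a natural variation.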
25. Synapse infrastructure for sharing, searching,
and analyzing TCGA data
• Comparison of many modeling approaches applied to the same data.
• Models transparently shared and reusable through Synapse.
• Displayed is comparison of 6 modeling approaches to predict sensitivity to 130 drugs.
• Extending pipeline to evaluate prediction of TCGA phenotypes.
• Hosting of collaborative competitions to compare models from many groups.
[Figure: workflow from Expression, Copy number, Mutation, and Phenotype data through predictive model generation to performance assessment; plot of prediction accuracy (R2) across 130 drugs]
26. Synapse transparent, reproducible, versioned machine
learning infrastructure for method comparison
1) Automated, standardized workflows for curation, QC and hosting of large-scale datasets (Brig Mecham).
2) Programmatic APIs to load standardized objects, e.g. R ExpressionSets (Matt Furia):
Load cell line feature and response data:
> ccleFeatureData <- getEntity(ccleFeatureDataId)
> ccleResponseData <- getEntity(ccleResponseDataId)
Load TCGA feature and phenotype data (in same format as cell line data):
> tcgaFeatureData <- getEntity(tcgaFeatureDataId)
> tcgaResponseData <- getEntity(tcgaResponseDataId)
3) Pluggable API to implement predictive modeling algorithms.
User implements customTrain() and customPredict() functions.
Support for all commonly used machine learning methods (for automated benchmarking against new methods)
4) Statistical performance assessment across models.
5) Output of candidate biomarkers and feature evaluation (e.g. GSEA, pathway analysis)
[Figure: workflow from Expression, Copy number, Mutation, and Phenotype data through predictive model generation (custom model 1, custom model 2, ... custom model N) to performance assessment]
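The pluggable train/predict idea from this slide can be sketched as follows. This is a Python illustration only, not the R API shown on the slide: the class names, method signatures, and the two toy models are all hypothetical, chosen to show how a common interface lets one loop benchmark many models on the same data.

```python
from abc import ABC, abstractmethod
from statistics import mean


class PredictiveModel(ABC):
    """Hypothetical interface mirroring the customTrain()/customPredict() idea."""

    @abstractmethod
    def custom_train(self, features, response):
        """Fit the model on rows of features and a list of responses."""

    @abstractmethod
    def custom_predict(self, features):
        """Return one prediction per row of features."""


class MeanBaseline(PredictiveModel):
    """Always predicts the mean training response."""

    def custom_train(self, features, response):
        self.mean_ = mean(response)

    def custom_predict(self, features):
        return [self.mean_] * len(features)


class SingleFeatureLinear(PredictiveModel):
    """Ordinary least squares on one feature column."""

    def __init__(self, column=0):
        self.column = column

    def custom_train(self, features, response):
        x = [row[self.column] for row in features]
        mx, my = mean(x), mean(response)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, response))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = my - self.slope_ * mx

    def custom_predict(self, features):
        return [self.intercept_ + self.slope_ * row[self.column] for row in features]


def run_models(models, train_x, train_y, test_x):
    """Train every model on the same data and collect its test predictions."""
    results = {}
    for name, model in models.items():
        model.custom_train(train_x, train_y)
        results[name] = model.custom_predict(test_x)
    return results


# Hypothetical toy data: response is exactly twice the single feature
results = run_models(
    {"baseline": MeanBaseline(), "linear": SingleFeatureLinear()},
    train_x=[[1.0], [2.0], [3.0]],
    train_y=[2.0, 4.0, 6.0],
    test_x=[[4.0]],
)
```

Because every model exposes the same two methods, adding a new method to the comparison only requires a new subclass; the benchmarking loop stays unchanged.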
27. Objective assessment of factors influencing model
performance (>1 million predictions evaluated)
Prediction accuracy improved by…
• Not discretizing data
• Including expression data
• Elastic net regression
[Figure: cross validation prediction accuracy (R2) for Sanger (130 compounds) and CCLE (24 compounds)]
In Sock Jang
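For readers unfamiliar with the metric on this slide, here is a minimal Python sketch of cross-validated R2. It assumes R2 means the coefficient of determination computed on held-out predictions (the slide does not define it), uses a simple interleaved k-fold split, and fits a one-variable least-squares line as a stand-in for the elastic net models; all names and data are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Least-squares line through (x, y); returns a predict function."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda xi: intercept + slope * xi


def cross_validated_r2(x, y, fit, k=5):
    """Predict each sample from a model trained without its fold, then score."""
    n = len(x)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    preds = [None] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = predict(x[i])
    return r_squared(y, preds)


# Hypothetical noiseless data, so held-out predictions are exact
xs = [float(i) for i in range(10)]
ys = [2.0 * xi + 1.0 for xi in xs]
cv_r2 = cross_validated_r2(xs, ys, fit_line, k=5)
```

Scoring only held-out predictions is what makes the comparison between modeling choices (discretization, data types, algorithm) objective rather than a measure of fit to the training data.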
28. Assessment of pathway enrichment of inferred
predictive feature sets
[Figure: pathway enrichment of predictive feature sets for Sanger and CCLE compounds; pathways from KEGG, REACTOME, BIOCARTA]
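One common way to test whether a predictive feature set is enriched for a pathway is a one-sided hypergeometric test. The slide does not say which test was used, so the Python sketch below is an assumption, with hypothetical argument names and counts.

```python
from math import comb


def hypergeom_enrichment_p(universe, pathway, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `universe` genes, of which `pathway` belong to the pathway."""
    total = comb(universe, selected)
    upper = min(pathway, selected)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes measured, 3 in the pathway,
# 3 selected as predictive features, all 3 of them in the pathway
p_value = hypergeom_enrichment_p(universe=10, pathway=3, selected=3, overlap=3)
```

When many pathways are tested per compound, as in the figure, the resulting p-values would normally also need a multiple-testing correction.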
29. Data Analysis with Synapse
Run Any Tool
On Any Platform
Record in Synapse
Share with Anyone
30. Why Stratifying Patients for Therapy Matters
Metastatic Colorectal Cancer (mCRC)
• 60% mCRC patients are RASwt
• 40% mCRC patients are RASmut
[Figure: pathway diagram (EGFR, KRAS, BRAF, MEK1/2, Proliferation/Survival); responder vs non-responder proportions under Chemotherapy and Chemotherapy + Cetuximab (values shown: 43%, 59%; 40%, 36%)]
But not all CRC patients that are RASwt respond to Cetuximab
In other cancers for which it is efficacious RAS status appears not to predict response (e.g. lung)
31. RAS Model using primary tumor data to predict KRAS mutation status
290 CRC samples:
• KRAS12 or KRAS13 (n=115) vs WT (n=175)
• Penalized regression model using ElasticNet and gene expression data
[Figure: ROC curves (true positive rate vs. false positive rate) showing robust external validation in CRC data sets: TCGA CRC, Khambata-Ford, Gaedcke]
Model specific to CRC: does not generalize to other KRAS dependent cancers
RAS signatures derived from CRC cohort can classify mutation status in CRC
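The ROC curves on this slide plot true positive rate against false positive rate as the classification threshold varies. A minimal Python sketch of that computation, with the area under the curve by the trapezoid rule, follows; the labels and scores are hypothetical (1 = KRAS mutant, 0 = wild type), not data from the slide.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by lowering the threshold step by step.

    labels: 1 for positive (e.g. mutant), 0 for negative (e.g. wild type).
    Samples with tied scores are added together in one step.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical model scores for four tumors
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
area = auc(roc_points(labels, scores))
```

An area of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is how curves from different validation data sets can be compared on one scale.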
33. Can we predict response to RAS Pathway Drugs in CRC Cell lines?
Correlate RASness Score with IC50 for drugs across 21 CRC cell lines from CCLE [1] panel
[Figure: P value of the RASness-IC50 correlation per drug, with PD-0325901 and AZD6244 labeled; pathway diagram: ERBB2, EGFR, BCR/ABL, KRAS, NRAS, BRAF, MEK1/2, Proliferation/Survival]
Note: KRAS and/or BRAF mutation status NOT predictive of response to MEK inhibitor
RASness Model Translates to predict response to RAS pathway drugs in CRC cell lines
1. Barretina et al. 2012 Nature. 483:603: The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.
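Correlating a signature score with drug IC50 across cell lines can be sketched with a rank correlation. The slide does not state which correlation was used, so Spearman's rho here is an assumption, and the scores and IC50 values are hypothetical. The p-value shown in the figure is not computed in this sketch.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: higher score goes with lower IC50 (more sensitive)
rasness = [0.1, 0.4, 0.6, 0.9]
ic50 = [8.0, 5.0, 2.0, 0.5]
rho = spearman(rasness, ic50)
```

A strongly negative rho for a drug would mean that cell lines with higher scores tend to be more sensitive to it.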
38. REDEFINING HOW WE WORK TOGETHER:
Sage/DREAM Breast Cancer Prognosis Challenge
39. What is the problem?
Our current models of disease biology are primitive and limit
doctors’ understanding and ability to treat patients
Current incentives reward those who
silo information and work in closed
systems
40. The Solution: Competitions to crowd-source research
in biology and other fields
Why competitions?
• Objective assessments
• Acceleration of progress
• Transparency
• Reproducibility
• Extensible, reusable models
Competitions in biomedical research
• CASP (protein structure)
• Foldit / EteRNA (protein / RNA structure)
• CAGI (genome annotation)
• Assemblathon / Alignathon (genome assembly / alignment)
• SBV Improver (industrial methodology benchmarking)
• DREAM (co-organizer of Sage/DREAM competition)
Generic competition platforms
• Kaggle, InnoCentive, MLComp
41. METABRIC
Anglo-Canadian collaboration
•Array-CGH
•Expression arrays
•Sequencing TP53 PIK3CA
•Amplified DNA and cDNA banks
•miRNA profiling
Gene sequencing (ICGC)
42. Sage/DREAM Challenge: Details and Timing
Phase 1: July thru end-Sep 2012
Training data: 2,000 breast cancer samples from METABRIC cohort
• Gene expression
• Copy number
• Clinical covariates
• 10 year survival
Supporting data: Other Sage-curated breast cancer datasets
• >1,000 samples from GEO
• ~800 samples from TCGA
• ~500 additional samples from Norway group
• Curated and available on Synapse, Sage’s compute platform
Data released in phases on Synapse from now through end-September
Will evaluate accuracy of models built on METABRIC data to predict survival in:
• Held out samples from METABRIC
• Other datasets
Phase 2: Oct 15 thru Nov 12, 2012
Evaluation of models in novel dataset.
Validation data: ~500 fresh frozen tumors from Norway group with:
• Clinical covariates
• 10 year survival
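Evaluating how well a model predicts survival requires a metric that handles censored follow-up. The slide does not name the scoring metric, so the concordance index below is an assumption, shown as a minimal Python sketch with hypothetical data: it measures how often the model assigns the higher risk to the patient who died earlier.

```python
def concordance_index(times, events, risk_scores):
    """Concordance index for right-censored survival data.

    times: follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: model output, higher meaning shorter expected survival

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. Ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of three patients, all with observed events
c_perfect = concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0])
c_reversed = concordance_index([1, 2, 3], [1, 1, 1], [1.0, 2.0, 3.0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives held-out and external cohorts a common scale for a leaderboard.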
43. Synapse transparent, reproducible, versioned machine
learning infrastructure for method comparison
[Figure: workflow from Expression, Copy number, Mutation, and Phenotype data through predictive model generation to performance assessment]
Custom models implement train() and predict() API.
Implementation of simple clinical-only survival model used as baseline predictor.
45. Sage-DREAM Breast Cancer Prognosis Challenge
one month of building better disease models together
breast cancer data
Challenge Launch: July 17
• 154 participants; 27 countries
August 17 Status
• 268 participants; 32 countries
• 290 models posted to Leaderboard
46. Summary of Breast Cancer Challenge #1
https://synapse.sagebase.org/ - BCCOverview:0
• Transparency, reproducibility
• Validation in novel dataset
• Publication in Science Translational Medicine
• Donation of Google-scale compute space.
[Figure: workflow from Expression, Copy number, Mutation, and Phenotype data through predictive model generation to performance assessment]
For the goal of promoting democratization of medicine…
Registration starting NOW…
sign up at: synapse.sagebase.org
47. FOUR PILOTS IN THE SAGE BIONETWORKS COMMONS INCUBATOR
• Provide a “compute space” for hosting and sharing models
– SYNAPSE (to complement data storage and tools provided by Sanger, Broad…)
• Co-generate models of drivers for Cell Line/Clinical Sensitivity
• Host Challenges and other approaches that will maximize the number of
people providing and sharing their insights as quickly as possible
– https://synapse.sagebase.org/ - BCCOverview:0
• Engage citizens as partners in gathering information and insights and
funds
48. Networked Approaches
BioMedicine Information Commons
[Diagram: Patients/Citizens, Data Generators, Data Analysts, Clinicians, and Experimentalists connected through SYNAPSE, which holds RAW DATA, CURATED DATA, TOOLS/METHODS, and ANALYSES/MODELS]
49. Upon this gifted age, in its dark hour,
Rains from the sky a meteoric shower
Of Facts…they lie unquestioned, uncombined.
Wisdom enough to leech us of our ill
Is daily spun; but there exists no loom
To weave it into fabric.
- Edna St. Vincent Millay