12. How to identify cancer drivers?
Find signs of positive selection across
re-sequenced tumour genomes
13. Frequency-based approaches to identify drivers
Assume that cancer drivers are mutated more frequently than expected from the background mutation rate in a cohort of tumours
[Figure: recurrence analysis on a samples × genes matrix (mutated / not mutated); driver genes stand out as frequently mutated]
MutSig (Broad Institute)
MuSiC-SMG (Washington University)
14. Frequency-based approaches to identify drivers
Main challenges of frequency-based approaches:
• Difficult to estimate background mutation rates correctly
• Cannot identify driver genes that are mutated at low recurrence
• Need raw data (e.g. BAM files) to assess sequencing coverage per region
• Computationally costly
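The following is a minimal sketch of a per-gene recurrence test that makes the frequency-based idea concrete. It is not MutSig or MuSiC-SMG. It assumes a single, uniform background mutation rate per base (the hypothetical parameter `background_rate`) and asks how likely it is to see at least the observed number of mutations in a gene under a binomial model. It also shows why the first challenge above matters: the p-value depends entirely on that assumed rate.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        term = math.exp(log_pmf)
        total += term
        # Past the mean the terms shrink quickly; stop once they no longer matter.
        if i > n * p and term <= total * 1e-16:
            break
    return min(total, 1.0)


def recurrence_pvalue(observed: int, coding_length: int, n_samples: int,
                      background_rate: float) -> float:
    """Hypothetical helper: chance of >= `observed` mutations in a gene when each
    coding base in each sample mutates independently at `background_rate`."""
    return binom_sf(observed, coding_length * n_samples, background_rate)


if __name__ == "__main__":
    # Illustrative numbers only: a 1,200 bp coding sequence, 500 tumours,
    # 1 mutation per megabase, and 25 observed mutations (0.6 expected).
    print(recurrence_pvalue(25, 1200, 500, 1e-6))
    # Doubling the assumed background rate changes the answer by orders of magnitude.
    print(recurrence_pvalue(25, 1200, 500, 2e-6))
```

In practice, mutation rates vary across genes, samples and sequence contexts, and coverage must be known per region. That is why real frequency-based tools need much more than this single-rate model.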
16. How to identify drivers across projects in a scalable way?
Ideally, computational methods that:
• Do not need large or protected data (e.g. work from a list of tumour somatic mutations)
• Are not computationally expensive
• Are robust to differences in mutation calling
17. How to identify drivers across projects in a scalable way?
We have developed two methods with these properties: OncodriveFM and OncodriveCLUST
18. Finding drivers using functional impact bias (FM bias)
Gonzalez-Perez and Lopez-Bigas. NAR 2012
Abel Gonzalez-Perez
Functional impact metrics:
• SIFT
• Mutation Assessor
• Polyphen2
[Figure: FI score distributions (low to high) of the mutations in Gene A and Gene B, as analysed by OncodriveFM]
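19. OncodriveFM method’s details
STEP 1: Assess the functional impact (FI) of all variants
1. Compute FI scores for nsSNVs (combining MutationAssessor, SIFT and Polyphen2)
2. Compute FI scores of other variants (STOP, synonymous and frameshift) using a set of rules:

Variant type   SIFT   Polyphen2   MutationAssessor
Synonymous     1      0           -2
STOP-gain      0      1           3.5
Frameshift     0      1           3.5

These rules give synonymous variants low-impact values and STOP-gain and frameshift variants high-impact values on each scale. SIFT scores decrease as impact increases, whereas Polyphen2 and MutationAssessor scores increase.
[Figure legend: not mutated; FI score from low to high]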
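The rules in STEP 1 amount to a lookup table. The sketch below assigns a (SIFT, Polyphen2, MutationAssessor) triple to each variant. nsSNVs keep the scores predicted by the three tools, and the other consequence types get the fixed values from the table.

Several parts of the sketch are assumptions, not something stated on the slide:
- The consequence labels and the `Variant` record are hypothetical names chosen for illustration.
- Flipping SIFT (1 - SIFT), so that all three scores grow with impact, is this sketch's own choice.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Fixed (SIFT, Polyphen2, MutationAssessor) scores for non-missense variants,
# copied from the rule table on this slide.
RULE_SCORES = {
    "synonymous": (1.0, 0.0, -2.0),
    "stop_gain": (0.0, 1.0, 3.5),
    "frameshift": (0.0, 1.0, 3.5),
}


@dataclass
class Variant:
    gene: str
    consequence: str  # hypothetical labels: "nsSNV", "synonymous", "stop_gain", "frameshift"
    predicted: Optional[Tuple[float, float, float]] = None  # tool scores, nsSNVs only


def fi_scores(variant: Variant) -> Tuple[float, float, float]:
    """Return (SIFT, Polyphen2, MutationAssessor) for one variant."""
    if variant.consequence == "nsSNV":
        if variant.predicted is None:
            raise ValueError("nsSNVs need predicted scores from the three tools")
        return variant.predicted
    return RULE_SCORES[variant.consequence]


def oriented(scores: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Assumption: flip SIFT so that, for all three metrics, higher means more damaging."""
    sift, pph2, ma = scores
    return (1.0 - sift, pph2, ma)


if __name__ == "__main__":
    v = Variant("GENE_X", "stop_gain")  # hypothetical gene name
    print(fi_scores(v), oriented(fi_scores(v)))
```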
20. OncodriveFM method’s details
STEP 2: Compute FM bias per gene
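[Figure: samples × genes matrix of mutations coloured by functional impact (low to high); OncodriveFM flags genes whose mutations are biased toward high functional impact as driver genes, while not-mutated cells are left blank]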
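STEP 2 compares how high the FI scores of a gene's mutations are with what would be expected by chance. This is a minimal sketch that rests on two illustrative choices, and both may differ from the published OncodriveFM implementation:
- The null expectation comes from repeatedly drawing random sets of the same size from the FI scores of all mutations in the cohort.
- The per-metric p-values (SIFT, Polyphen2, MutationAssessor) are combined with Fisher's method.

```python
import math
import random
from statistics import mean
from typing import Sequence


def fm_bias_pvalue(gene_scores: Sequence[float], cohort_scores: Sequence[float],
                   n_draws: int = 10000, seed: int = 0) -> float:
    """Empirical one-sided p-value: how often does a random set of mutations of the
    same size, drawn from the whole cohort, have a mean FI score at least as high?"""
    rng = random.Random(seed)
    k = len(gene_scores)
    observed = mean(gene_scores)
    hits = sum(mean(rng.sample(cohort_scores, k)) >= observed for _ in range(n_draws))
    # The +1 terms keep the estimate away from an impossible p-value of exactly 0.
    return (hits + 1) / (n_draws + 1)


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Fisher's method: -2 * sum(ln p) follows a chi-square with 2m degrees of
    freedom; for even degrees of freedom its upper tail has a closed form."""
    m = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # half of Fisher's statistic
    term, tail = 1.0, 1.0
    for j in range(1, m):
        term *= half / j
        tail += term
    return math.exp(-half) * tail


if __name__ == "__main__":
    cohort = [0.1] * 90 + [0.9] * 10  # hypothetical FI scores of all cohort mutations
    print(fm_bias_pvalue([0.9] * 5, cohort, n_draws=2000))
```

Run once per metric; `fisher_combine` then merges the three per-metric p-values into a single FM bias p-value for the gene.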
21. OncodriveFM method’s details
Compute FM bias per module
[Figure: mutations (coloured by FI score, low to high) in the genes of module 1 and module 2 across samples, with the FM q-value computed by OncodriveFM for each module]
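The same test can be applied to a module, such as a pathway, by pooling the FI scores of the mutations in all of its genes. The resulting p-values across modules can then be turned into q-values like the FM q-values shown on the slide.

The sketch below makes two illustrative assumptions:
- Pooling is the module-level strategy.
- The q-values come from the Benjamini-Hochberg procedure.

The module and gene names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def pool_module_scores(module_genes: Iterable[str],
                       scores_by_gene: Dict[str, List[float]]) -> List[float]:
    """Collect the FI scores of every mutation in the genes of one module."""
    return [s for gene in module_genes for s in scores_by_gene.get(gene, [])]


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg q-values, returned in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values never decrease with rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    scores = {"GENE_A": [0.8, 0.9], "GENE_B": [0.7], "GENE_C": [0.1]}  # hypothetical
    print(pool_module_scores(["GENE_A", "GENE_B"], scores))  # [0.8, 0.9, 0.7]
    print(bh_qvalues([0.01, 0.04, 0.03]))
```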
22. OncodriveFM main advantages
Main advantages of the FM bias approach:
• It does not depend on background mutation rates
• It only needs a list of somatic mutations
• It is computationally cheap
• It can identify driver genes that are mutated at low recurrence
26. PIK3CA is recurrently mutated in the same residue in breast tumours, yet this mutation (H1047L) is scored low by functional impact metrics
[Figure: protein-affecting mutations along PIK3CA by protein position, with a recurrent hotspot at H1047L]
27. Finding drivers using regional clustering of mutations
Tamborero et al., Under review
[Figure: protein-affecting mutations along KRAS by protein position, with the cluster around position 12 detected by OncodriveCLUST]
David Tamborero
28. OncodriveCLUST method’s details
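[Figure: OncodriveCLUST steps (I) to (VI), illustrated on Gene A and Gene B by plotting the number of mutations per amino acid position against a threshold Th. Clusters are identified along the protein (C1 in Gene A; C1 and C2 in Gene B). Each gene's clustering score is the sum of its cluster scores, $S_{geneA} = S_{C1}$ and $S_{geneB} = S_{C1} + S_{C2}$, and these scores are converted into Z-scores ($Z_A$, $Z_B$).]
Background model: obtained by calculating the clustering score per gene of the coding-silent mutations
Tamborero et al., Under review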
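The OncodriveCLUST steps can be sketched in a few functions. This is a simplified illustration, not the published OncodriveCLUST code, and it works as follows:
- Positions with at least `th` mutations seed clusters; this is our reading of the threshold Th.
- Seeds no more than a hypothetical `max_gap` residues apart are merged into one cluster.
- Each cluster is scored by the fraction of the gene's mutations at each of its positions, down-weighted by sqrt(2) per residue of distance from the cluster's most mutated position.
- A gene's score is the sum over its clusters, as in S_geneB = S_C1 + S_C2.

The Z-score compares a gene's score with background scores, which on the slide come from coding-silent mutations.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def find_clusters(counts: Dict[int, int], th: int, max_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) protein positions of clusters seeded by positions with >= th mutations."""
    seeds = sorted(pos for pos, n in counts.items() if n >= th)
    groups: List[List[int]] = []
    for pos in seeds:
        if groups and pos - groups[-1][-1] <= max_gap:
            groups[-1].append(pos)  # close enough to the previous seed: same cluster
        else:
            groups.append([pos])
    return [(g[0], g[-1]) for g in groups]


def cluster_score(counts: Dict[int, int], start: int, end: int, total: int) -> float:
    """Fraction of the gene's mutations at each position in the cluster,
    divided by sqrt(2) ** (distance to the most mutated position)."""
    positions = [pos for pos in counts if start <= pos <= end]
    peak = max(positions, key=lambda pos: counts[pos])
    return sum((counts[pos] / total) / math.sqrt(2) ** abs(pos - peak) for pos in positions)


def gene_score(counts: Dict[int, int], th: int = 3, max_gap: int = 5) -> float:
    """Clustering score of a gene: the sum of its cluster scores (0 if no cluster)."""
    total = sum(counts.values())
    return sum(cluster_score(counts, s, e, total) for s, e in find_clusters(counts, th, max_gap))


def clustering_zscore(score: float, background_scores: Sequence[float]) -> float:
    """Standardise a gene's score against background scores (e.g. from coding-silent mutations)."""
    return (score - mean(background_scores)) / stdev(background_scores)


if __name__ == "__main__":
    # Hypothetical protein-affecting mutation counts per amino acid position.
    counts = {12: 10, 13: 5, 61: 1}
    s = gene_score(counts)
    print(s, clustering_zscore(s, [0.1, 0.2, 0.3]))
```

In this example the isolated mutation at position 61 falls below the threshold, so it contributes nothing to the gene's score.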
29. OncodriveCLUST main advantages
Main advantages of the mutation clustering approach:
• It does not depend on background mutation rates
• It is computationally cheap
• It only needs a list of somatic mutations
• It is complementary to OncodriveFM
31. Combining methods with complementary principles helps to obtain a more comprehensive and reliable list of cancer drivers:
✓ Functional Impact Bias
✓ Mutation Clustering
✓ Mutation Frequency
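One simple way to act on this slide is to record, for each gene, which methods call it a driver. Genes can then be ranked by how many complementary lines of evidence support them. The sketch assumes each method has already produced a set of candidate genes, for example genes below some q-value cutoff. The method labels and gene sets are hypothetical, and this consensus rule is an illustration rather than the procedure used in IntOGen.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def combine_candidates(calls: Dict[str, Set[str]]) -> List[Tuple[str, List[str]]]:
    """Map each gene to the methods that flag it, sorted by number of supporting methods."""
    support: Dict[str, List[str]] = defaultdict(list)
    for method, genes in calls.items():
        for gene in genes:
            support[gene].append(method)
    # Most-supported genes first; ties broken alphabetically for a stable report.
    return sorted(((g, sorted(m)) for g, m in support.items()),
                  key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    calls = {  # hypothetical outputs of three complementary methods
        "FM bias": {"GENE_A", "GENE_B"},
        "clustering": {"GENE_A", "GENE_C"},
        "frequency": {"GENE_A"},
    }
    for gene, methods in combine_candidates(calls):
        print(gene, methods)
```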
32. The Mechanisms of tumorigenesis
[Diagram: Data → Computational methods → Analysis → Results]
33. IntOGen SM-Analysis pipeline
To interpret catalogs of cancer somatic mutations
[Diagram: Input data (catalogs of tumor somatic mutations) → Analysis pipeline (powered by Wok, a workflow management system) → Browser]
✓ Identify consequences of mutations (Ensembl VEP)
✓ Assess functional impact of nsSNVs (SIFT, PPH2, MA and TransFIC)
✓ Compute frequency of mutations per gene and pathway
✓ Identify candidate driver genes (OncodriveFM and OncodriveCLUST)
Christian Perez-Llamas
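The "frequency of mutations per gene and pathway" step can be illustrated as counting how many samples carry at least one mutation in a gene, or in any gene of a pathway. This is a minimal sketch, not the IntOGen pipeline's code. It assumes mutations arrive as (sample, gene) pairs and pathways as gene sets; the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def mutated_samples_per_gene(mutations: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Samples carrying at least one mutation in each gene (several hits count once)."""
    by_gene: Dict[str, Set[str]] = {}
    for sample, gene in mutations:
        by_gene.setdefault(gene, set()).add(sample)
    return by_gene


def frequencies(mutations: Iterable[Tuple[str, str]], n_samples: int,
                pathways: Dict[str, Set[str]]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Fraction of samples with a mutated gene, and with any mutated gene of each pathway."""
    by_gene = mutated_samples_per_gene(mutations)
    gene_freq = {g: len(s) / n_samples for g, s in by_gene.items()}
    pathway_freq = {}
    for name, genes in pathways.items():
        samples = set().union(*(by_gene.get(g, set()) for g in genes))
        pathway_freq[name] = len(samples) / n_samples
    return gene_freq, pathway_freq
```

Counting samples rather than mutations keeps a single hypermutated tumour from inflating a gene's frequency.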
36. IntOGen SM-Analysis pipeline
Currently: 27 projects, 12 cancer sites, 3329 tumours
http://beta.intogen.org
37. 27 cancer sequencing datasets analysed so far
Total = 3329 samples

Cancer site | Authors | Source | Number of samples
brain | TCGA | TCGA Data Portal | 248
brain | DKFZ | ICGC DCC | 114
brain | Johns Hopkins University | ICGC DCC | 88
breast | TCGA | TCGA Data Portal | 510
breast | Broad Institute | PubMed | 102
breast | WTSI | ICGC DCC | 100
breast | Washington University School of Medicine | PubMed | 75
breast | University of British Columbia | PubMed | 65
breast | Johns Hopkins University | ICGC DCC | 41
colon | TCGA | TCGA Data Portal | 105
colon | Johns Hopkins University | ICGC DCC | 34
corpus uteri | TCGA | TCGA Data Portal | 247
hematopoietic | CLL-ICGC | ICGC DCC | 109
hematopoietic | Dana-Farber Cancer Institute | PubMed | 90
kidney | TCGA | TCGA Data Portal | 298
liver and bile ducts | IACR | ICGC DCC | 24
lung and bronchus | TCGA | TCGA Data Portal | 177
lung and bronchus | Washington University School of Medicine | ICGC DCC | 156
lung and bronchus | Johns Hopkins University | PubMed | 43
lung and bronchus | Medical College of Wisconsin | PubMed | 31
lung and bronchus | University of Cologne | PubMed | 26
oropharynx | Broad Institute | PubMed | 74
ovary | TCGA | TCGA Data Portal | 337
pancreas | Johns Hopkins University | ICGC DCC | 113
pancreas | Queensland Centre for Medical Genomics | ICGC DCC | 67
pancreas | Ontario Institute for Cancer Research | ICGC DCC | 33
stomach | Pfizer Worldwide Research and Development | PubMed | 22
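The totals on this slide can be checked directly from the table. The sketch below transcribes the sample counts for each dataset, grouped by cancer site, and recomputes the number of datasets, cancer sites and samples.

```python
from collections import Counter

# (cancer site, samples) for each of the datasets listed in the table above.
DATASETS = [
    ("brain", 248), ("brain", 114), ("brain", 88),
    ("breast", 510), ("breast", 102), ("breast", 100), ("breast", 75), ("breast", 65), ("breast", 41),
    ("colon", 105), ("colon", 34),
    ("corpus uteri", 247),
    ("hematopoietic", 109), ("hematopoietic", 90),
    ("kidney", 298),
    ("liver and bile ducts", 24),
    ("lung and bronchus", 177), ("lung and bronchus", 156), ("lung and bronchus", 43),
    ("lung and bronchus", 31), ("lung and bronchus", 26),
    ("oropharynx", 74),
    ("ovary", 337),
    ("pancreas", 113), ("pancreas", 67), ("pancreas", 33),
    ("stomach", 22),
]

samples_per_site = Counter()
for site, n in DATASETS:
    samples_per_site[site] += n

if __name__ == "__main__":
    print(len(DATASETS), "datasets,", len(samples_per_site), "sites,", sum(samples_per_site.values()), "samples")
    # prints: 27 datasets, 12 sites, 3329 samples
```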
57. Use case 1: Cohort analysis
[Diagram: user's data (tumor somatic mutations per sample) → SM pipeline → user's private browser]
• View matrix of mutated genes per sample
• See predicted impact of mutations
• Find cancer driver genes
• Find FM-biased pathways
• Explore the results in the context of the knowledge accumulated in IntOGen
Use case 2: Single sample analysis
[Diagram: user's data (tumor somatic mutations in one tumor) → SM pipeline → user's private browser]
• See predicted impact of mutations
• Find which of the sample's mutations are recurrent in IntOGen
• Find mutations in candidate driver genes identified in IntOGen
60. The Mechanisms of tumorigenesis: PanCancer project
[Diagram: Data → Computational methods → Analysis → Results]
62. Visualization and analysis of genomic data using interactive heatmaps
http://www.gitools.org
Perez-Llamas and Lopez-Bigas. PLoS ONE 2011
Christian Perez-Llamas
63. Multidimensional heatmaps
Sort by mutually exclusive alterations
Michael P. Schroeder
Schroeder MP, Gonzalez-Perez A and Lopez-Bigas N. Visualizing multidimensional cancer genomics data. Genome Medicine. 2013, 5:9
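Sorting by mutually exclusive alterations arranges a samples × genes heatmap so that alterations form a staircase, which makes mutual exclusivity easy to spot. One common approach for this kind of plot works in two passes:
1. Order genes by how often they are altered.
2. Sort samples by their alteration pattern across those ordered genes.

This is not necessarily the Gitools implementation. The data layout below, a set of (gene, sample) pairs, is an assumption of the sketch.

```python
from typing import List, Set, Tuple


def mutual_exclusivity_order(altered: Set[Tuple[str, str]], genes: List[str],
                             samples: List[str]) -> Tuple[List[str], List[str]]:
    """Return (gene order, sample order) for a staircase-like heatmap layout."""
    # Most frequently altered genes go on top (stable sort keeps input order on ties).
    gene_order = sorted(genes, key=lambda g: -sum((g, s) in altered for s in samples))

    def sample_key(s: str) -> Tuple[int, ...]:
        # 0 sorts before 1, so samples altered in higher-ranked genes come first.
        return tuple(0 if (g, s) in altered else 1 for g in gene_order)

    return gene_order, sorted(samples, key=sample_key)


if __name__ == "__main__":
    altered = {("GENE_B", "s1"), ("GENE_A", "s2"), ("GENE_A", "s3")}  # hypothetical
    print(mutual_exclusivity_order(altered, ["GENE_A", "GENE_B"], ["s1", "s2", "s3"]))
    # prints: (['GENE_A', 'GENE_B'], ['s2', 's3', 's1'])
```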
64. Summary
• OncodriveFM and OncodriveCLUST are complementary methods to identify cancer drivers
• Oncodrive methods are scalable and robust
• IntOGen contains the results of analysing more than 3000 tumours to identify cancer drivers across sites
• The IntOGen SM pipeline is available to run your own projects
• TCGA PanCancer analysis is underway
• Gitools (interactive heatmaps) is useful to explore multidimensional cancer genomics data
65. Biomedical Genomics Lab
@bbglab
@nlbigas
Gunes Gundem
Christian Perez-Llamas
Jordi Deu-Pons
Michael Schroeder
Alba Jené-Sanz
Nuria Lopez-Bigas
David Tamborero
Abel Gonzalez-Perez
Alberto Santos
http://bg.upf.edu/blog