SlideShare une entreprise Scribd logo
1  sur  26
Télécharger pour lire hors ligne
Bioinformatics seminar: Finding
cancer driver mutations
Ilya Minkin
Saint Petersburg Academical University
8th December 2012
1 / 26
Motivation
It is widely accepted that tumorogenesis heavily
depends on accumulation of specic mutation
If we consider genome of a tumor cell, we may
see thousands of alterations
Which mutations cause functional changes that
enhance tumor cell proliferation, i.e.
drivermutations?
And which mutations are unrelated to this
process, i.e. passengermutations?
This questions is one of the most challenging in
cancer genetics
2 / 26
Motivation
Genes that mutate in wide range of tumors can
be condently classied as drivergenes
However, there are many drivergenes that
mutate in  1% of tumors
We can't use traditional methods (studies in
model organisms, gene KO, etc) to analyze
hundreds of gene candidates
Thus we need a high-throughput computational
method for nding drivermutations that is
independent on frequency
The papers are focused on missense mutations,
not nonsense/frameshift
3 / 26
CAN-Predict
Distinguishing Cancer-Associated Missense Mutations
from Common Polymorphisms, Cancer Research
(2007)
Joshua S. Kaminker, Yan Zhang, Allison Waugh,
Peter M. Haverty, Brock Peters, Dragan Sebisanovic,
Jeremy Stinson, William F. Forrest, J. Fernando
Bazan, Somasekar Seshagiri, and Zemin Zhang
4 / 26
Overview
There are few well-characterized classes of SNPs
in the human genome:
Common missense SNP
Mendelian disease SNP  variants that cause
disease and follow Mendelian pattern of
inheritance
Complex disease SNP  variants that are related
to complex disease caused by many genetic and
environmental factors
Cancer driving mutations
We will try to understand how to distinguish
them and consequently build a classier
5 / 26
Three characteristics
For each known mutation we obtain:
1. SIFT predicts whether an amino acid
substitution aects protein function and gives a
score
2. Align both variant/canonical protein against
Pfam database using HMMER. Take the best
E-scores and calculate log10(Evariant/Ecanonical)
3. The log-odds scores representing the relative
frequency with which a Gene Ontology (GO)
term was used to annotate cancer or noncancer
gene sets
6 / 26
7 / 26
8 / 26
9 / 26
Conclusions
We see that cancer-driving mutations are similar
to Mendelian diseases SNPs
These three features can be used for
classication
Random forests are used for building the
classier
10 / 26
Random forests
Random forest is an ensemble of decision trees. They
are powerful and fast. Each tree is grown as follows:
If the number of cases in the training set is N,
sample N cases at random - but with
replacement, from the original data. This sample
will be the training set for growing the tree
If there are M input variables, a number m¾M is
specied such that at each node, m variables are
selected at random out of the M and the best
split on these m is used to split the node
Each tree is grown to the largest extent possible
11 / 26
Validation
The Out Of Bag (OOB) error rate is an
important measure of accuracy and is calculated
by applying each individual classication tree to
a subset of training data points not used in the
construction of the tree
OOB = 3.19%
Cross validation: of 730 variants, only 10 of 581
(1.7%) normal variants were misclassied as
cancer and only 13 of 149 (8.7%) cancer
variants were misclassied as normal
12 / 26
An application of such classier is distinguishing
relevant cancer-associated mutations from the
expected polymorphic variants often identied during
sequencing projects
13 / 26
Discussion
There is a classier that distinguishes between
cancer-driving mutations and passenger
Cross-validation shows that it is pretty accurate
It was also shown that somatic mutations are
more likely to be predicted as cancer-driving
compared to common SNPs
They also tried to predict some novel
cancer-driving mutations, but I'm so bored and
won't tell about it
14 / 26
CHASM
Cancer-Specic High-Throughput Annotation of
Somatic Mutations: Computational Prediction of
Driver Missense Mutations, Cancer Research(2009)
Hannah Carter, Sining Chen, Leyla Isik, Svitlana
Tyekucheva, Victor E. Velculescu, Kenneth W.
Kinzler, Bert Vogelstein, and Rachel Karchin
15 / 26
Previous work
There were classiers developed earlier
Driver mutations are similar to mutations
associated with Mendelian disease and may be
identiable by setting constraints on amino acid
residues at mutated positions
Passengers are more similar to nonsynonymous
single nucleotide polymorphisms (nsSNP) with
high minor allele frequencies (MAF)
Random forests(CAN-Predict), SVM(Protein
kinase-specic classier)
16 / 26
Previous work
Previous work used common SNPs as negative
training examples
Existing computational methods could detect
dierences between somatic missense mutations
observed in cancers and high MAF nsSNPs
But these dierences might be less relevant to
the discrimination between driver and passenger
mutations that occur somatically in tumors
17 / 26
SNPs and passenger mutations
Although high MAF nsSNPs and passenger
mutations have properties in common, they also
have dierences
Passenger mutations may or may not have a
functional impact on proteins; by denition, they
are neutral with respect to cancer cell tness
In contrast, high MAF nsSNPs have become
xed in the human genome and must be
functionally neutral or have a mild functional
impact with respect to normal cell tness
Let's generate synthetic set of passenger
mutations and train a classier
18 / 26
Overview
Random Forest classier that was trained on 49
predictive features
Feature selection was done with a protocol
based on mutual information
Driver mutation data set  2,488 missense
mutations previously identied as playing a
functional role in oncogenic transformation
The synthetic passenger mutations were
generated by sampling from eight multinomial
distributions that depend on dinucleotide
context and tumor type
The nal score yielded for each mutation is the
fraction of trees that voted for the passenger
class. 19 / 26
Candidate features
In total 80 features, such as:
Change in charge
BLOSUM 62 substitution score
17way exon conservation
SNP Density (The number of genetic variants or
polymorphisms in the exon where the mutation
is located)
...
20 / 26
21 / 26
Probabilistic interpretation of random
forest classication scores
They used the trained Random Forest to
compute a classication score for each of 607
glioblastoma multiforme (GBM) missense
mutations
However, these scores are not probabilities and
the statistical behavior of the algorithm has not
been well-characterized
It is not evident where to set a trusted score
cuto for purposes of identifying driver
mutations
22 / 26
Probabilistic interpretation of random
forest classication scores
For each of the 607 GBM mutants, we test the
null hypothesis: the mutant is not functionally
related to the growth of the tumor (passenger),
versus the alternative hypothesis that it is
(driver)
We obtain a P value for a mutation by
comparing its score to the null distribution,
which consists of the scores of a ltered set of
synthetic passengers that were held out from
Random Forest training
23 / 26
Other methods
PolyPhen classies missense mutants as
Probably Damaging, Possibly Damaging or
Benign and also provides a continuous
measure of a mutation's functional impact, the
PSIC score
SIFT provides a score that ranges between 0 and
1 to report the probability that a missense
mutation will be tolerated
SIFT/PolyPhen consensus
CanPredict
KinaseSVM
24 / 26
25 / 26
26 / 26

Contenu connexe

Tendances

Opportunities and challenges provided by crosstalk between signalling pathway...
Opportunities and challenges provided by crosstalk between signalling pathway...Opportunities and challenges provided by crosstalk between signalling pathway...
Opportunities and challenges provided by crosstalk between signalling pathway...Anirudh Prahallad
 
3478-43782-3-PB
3478-43782-3-PB3478-43782-3-PB
3478-43782-3-PB东妮 郑
 
Kurrey_et_al-2009-STEM_CELLS
Kurrey_et_al-2009-STEM_CELLSKurrey_et_al-2009-STEM_CELLS
Kurrey_et_al-2009-STEM_CELLSSwati Jalgaonkar
 
CCRC Clonality Poster
CCRC Clonality PosterCCRC Clonality Poster
CCRC Clonality PosterNathan How
 
A physical sciences network characterization of non-tumorigenic and metastati...
A physical sciences network characterization of non-tumorigenic and metastati...A physical sciences network characterization of non-tumorigenic and metastati...
A physical sciences network characterization of non-tumorigenic and metastati...Shashaanka Ashili
 
Cell lines breast cancer-project
Cell lines breast cancer-project Cell lines breast cancer-project
Cell lines breast cancer-project Amitha Dasari
 
Intelligent Systems for Cancer Genomics (AIS305) - AWS re:Invent 2018
Intelligent Systems for Cancer Genomics (AIS305) - AWS re:Invent 2018Intelligent Systems for Cancer Genomics (AIS305) - AWS re:Invent 2018
Intelligent Systems for Cancer Genomics (AIS305) - AWS re:Invent 2018Amazon Web Services
 
Integrative Analysis of Gene Expression and Promoter Methylation during Repro...
Integrative Analysis of Gene Expression and Promoter Methylation during Repro...Integrative Analysis of Gene Expression and Promoter Methylation during Repro...
Integrative Analysis of Gene Expression and Promoter Methylation during Repro...Y-h Taguchi
 
Yang Poster 2015
Yang Poster 2015Yang Poster 2015
Yang Poster 2015Yang Cong
 
Proteogenomic analysis of human colon cancer reveals new therapeutic opportun...
Proteogenomic analysis of human colon cancer reveals new therapeutic opportun...Proteogenomic analysis of human colon cancer reveals new therapeutic opportun...
Proteogenomic analysis of human colon cancer reveals new therapeutic opportun...Gul Muneer
 
CHARACTERIZATION OF BIOCONJUGATED CHLOROTOXIN BINDING TO POPULATIONS OF NEURA...
CHARACTERIZATION OF BIOCONJUGATED CHLOROTOXIN BINDING TO POPULATIONS OF NEURA...CHARACTERIZATION OF BIOCONJUGATED CHLOROTOXIN BINDING TO POPULATIONS OF NEURA...
CHARACTERIZATION OF BIOCONJUGATED CHLOROTOXIN BINDING TO POPULATIONS OF NEURA...Allison Mitchell
 
AACR poster Yang
AACR poster YangAACR poster Yang
AACR poster YangYang Cong
 
BMC_Systems_Biology_2013_Udyavar
BMC_Systems_Biology_2013_UdyavarBMC_Systems_Biology_2013_Udyavar
BMC_Systems_Biology_2013_UdyavarAkshata Udyavar PhD
 

Tendances (20)

Opportunities and challenges provided by crosstalk between signalling pathway...
Opportunities and challenges provided by crosstalk between signalling pathway...Opportunities and challenges provided by crosstalk between signalling pathway...
Opportunities and challenges provided by crosstalk between signalling pathway...
 
GLIIFCA 22 final (1)
GLIIFCA 22 final (1)GLIIFCA 22 final (1)
GLIIFCA 22 final (1)
 
3478-43782-3-PB
3478-43782-3-PB3478-43782-3-PB
3478-43782-3-PB
 
Kurrey_et_al-2009-STEM_CELLS
Kurrey_et_al-2009-STEM_CELLSKurrey_et_al-2009-STEM_CELLS
Kurrey_et_al-2009-STEM_CELLS
 
CCRC Clonality Poster
CCRC Clonality PosterCCRC Clonality Poster
CCRC Clonality Poster
 
A physical sciences network characterization of non-tumorigenic and metastati...
A physical sciences network characterization of non-tumorigenic and metastati...A physical sciences network characterization of non-tumorigenic and metastati...
A physical sciences network characterization of non-tumorigenic and metastati...
 
Cell lines breast cancer-project
Cell lines breast cancer-project Cell lines breast cancer-project
Cell lines breast cancer-project
 
Intelligent Systems for Cancer Genomics (AIS305) - AWS re:Invent 2018
Intelligent Systems for Cancer Genomics (AIS305) - AWS re:Invent 2018Intelligent Systems for Cancer Genomics (AIS305) - AWS re:Invent 2018
Intelligent Systems for Cancer Genomics (AIS305) - AWS re:Invent 2018
 
Pdx project
Pdx project  Pdx project
Pdx project
 
Integrative Analysis of Gene Expression and Promoter Methylation during Repro...
Integrative Analysis of Gene Expression and Promoter Methylation during Repro...Integrative Analysis of Gene Expression and Promoter Methylation during Repro...
Integrative Analysis of Gene Expression and Promoter Methylation during Repro...
 
I1803056267
I1803056267I1803056267
I1803056267
 
Yang Poster 2015
Yang Poster 2015Yang Poster 2015
Yang Poster 2015
 
Proteogenomic analysis of human colon cancer reveals new therapeutic opportun...
Proteogenomic analysis of human colon cancer reveals new therapeutic opportun...Proteogenomic analysis of human colon cancer reveals new therapeutic opportun...
Proteogenomic analysis of human colon cancer reveals new therapeutic opportun...
 
VSP Poster
VSP PosterVSP Poster
VSP Poster
 
CHARACTERIZATION OF BIOCONJUGATED CHLOROTOXIN BINDING TO POPULATIONS OF NEURA...
CHARACTERIZATION OF BIOCONJUGATED CHLOROTOXIN BINDING TO POPULATIONS OF NEURA...CHARACTERIZATION OF BIOCONJUGATED CHLOROTOXIN BINDING TO POPULATIONS OF NEURA...
CHARACTERIZATION OF BIOCONJUGATED CHLOROTOXIN BINDING TO POPULATIONS OF NEURA...
 
AACR poster Yang
AACR poster YangAACR poster Yang
AACR poster Yang
 
International Journal of Stem Cells & Research
International Journal of Stem Cells & ResearchInternational Journal of Stem Cells & Research
International Journal of Stem Cells & Research
 
Russell
RussellRussell
Russell
 
EHA_Accepted_Abstract
EHA_Accepted_AbstractEHA_Accepted_Abstract
EHA_Accepted_Abstract
 
BMC_Systems_Biology_2013_Udyavar
BMC_Systems_Biology_2013_UdyavarBMC_Systems_Biology_2013_Udyavar
BMC_Systems_Biology_2013_Udyavar
 

En vedette

Identification of cancer drivers across tumor types
Identification of cancer drivers across tumor typesIdentification of cancer drivers across tumor types
Identification of cancer drivers across tumor typesNuria Lopez-Bigas
 
Nsclc slide deck
Nsclc slide deckNsclc slide deck
Nsclc slide deckPLMMedical
 
In Silico Prescription of Anticancer Drugs Reveals Targeting Opportunities
In Silico Prescription of Anticancer Drugs Reveals Targeting OpportunitiesIn Silico Prescription of Anticancer Drugs Reveals Targeting Opportunities
In Silico Prescription of Anticancer Drugs Reveals Targeting OpportunitiesNuria Lopez-Bigas
 
10 Key ASCO 2014 Presentations in Lung Cancer
10 Key ASCO 2014 Presentations in Lung Cancer10 Key ASCO 2014 Presentations in Lung Cancer
10 Key ASCO 2014 Presentations in Lung CancerH. Jack West
 
New developments of targeted therapy in nsclc
New developments of targeted therapy in nsclcNew developments of targeted therapy in nsclc
New developments of targeted therapy in nsclclihua jiao
 
Ppt lung carcinoma part1
Ppt lung carcinoma part1Ppt lung carcinoma part1
Ppt lung carcinoma part1Juned Khan
 

En vedette (6)

Identification of cancer drivers across tumor types
Identification of cancer drivers across tumor typesIdentification of cancer drivers across tumor types
Identification of cancer drivers across tumor types
 
Nsclc slide deck
Nsclc slide deckNsclc slide deck
Nsclc slide deck
 
In Silico Prescription of Anticancer Drugs Reveals Targeting Opportunities
In Silico Prescription of Anticancer Drugs Reveals Targeting OpportunitiesIn Silico Prescription of Anticancer Drugs Reveals Targeting Opportunities
In Silico Prescription of Anticancer Drugs Reveals Targeting Opportunities
 
10 Key ASCO 2014 Presentations in Lung Cancer
10 Key ASCO 2014 Presentations in Lung Cancer10 Key ASCO 2014 Presentations in Lung Cancer
10 Key ASCO 2014 Presentations in Lung Cancer
 
New developments of targeted therapy in nsclc
New developments of targeted therapy in nsclcNew developments of targeted therapy in nsclc
New developments of targeted therapy in nsclc
 
Ppt lung carcinoma part1
Ppt lung carcinoma part1Ppt lung carcinoma part1
Ppt lung carcinoma part1
 

Similaire à Slides 3

BRITEREU_finalposter
BRITEREU_finalposterBRITEREU_finalposter
BRITEREU_finalposterElsa Fecke
 
NY Prostate Cancer Conference - C.L. Sawyers - Session 1: Gene copy number a...
NY Prostate Cancer Conference - C.L. Sawyers - Session 1:  Gene copy number a...NY Prostate Cancer Conference - C.L. Sawyers - Session 1:  Gene copy number a...
NY Prostate Cancer Conference - C.L. Sawyers - Session 1: Gene copy number a...European School of Oncology
 
Assessing the clinical utility of cancer genomic and proteomic data across tu...
Assessing the clinical utility of cancer genomic and proteomic data across tu...Assessing the clinical utility of cancer genomic and proteomic data across tu...
Assessing the clinical utility of cancer genomic and proteomic data across tu...Gul Muneer
 
Developing a framework for for detection of low frequency somatic genetic alt...
Developing a framework for for detection of low frequency somatic genetic alt...Developing a framework for for detection of low frequency somatic genetic alt...
Developing a framework for for detection of low frequency somatic genetic alt...Ronak Shah
 
NY Prostate Cancer Conference - S. Stone - Session 1: Cell cycle progression ...
NY Prostate Cancer Conference - S. Stone - Session 1: Cell cycle progression ...NY Prostate Cancer Conference - S. Stone - Session 1: Cell cycle progression ...
NY Prostate Cancer Conference - S. Stone - Session 1: Cell cycle progression ...European School of Oncology
 
Detecting Somatic Mutation - Ensemble Approach
Detecting Somatic Mutation - Ensemble ApproachDetecting Somatic Mutation - Ensemble Approach
Detecting Somatic Mutation - Ensemble ApproachHong ChangBum
 
From reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingJoaquin Dopazo
 
Alain Toledano : Test and genomic profile : what future in breast cancer trea...
Alain Toledano : Test and genomic profile : what future in breast cancer trea...Alain Toledano : Test and genomic profile : what future in breast cancer trea...
Alain Toledano : Test and genomic profile : what future in breast cancer trea...breastcancerupdatecongress
 
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...IJTET Journal
 
Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Am...
Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Am...Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Am...
Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Am...Samuel Sattath
 
Mining Gene Expression Data Focusing Cancer Therapeutics: A Digest
Mining Gene Expression Data Focusing Cancer Therapeutics: A DigestMining Gene Expression Data Focusing Cancer Therapeutics: A Digest
Mining Gene Expression Data Focusing Cancer Therapeutics: A DigestKaashivInfoTech Company
 
Interrogating differences in expression of targeted gene sets to predict brea...
Interrogating differences in expression of targeted gene sets to predict brea...Interrogating differences in expression of targeted gene sets to predict brea...
Interrogating differences in expression of targeted gene sets to predict brea...Enrique Moreno Gonzalez
 
Liquid Biopsy Cancer Prevention and Monitoring
Liquid Biopsy Cancer Prevention and MonitoringLiquid Biopsy Cancer Prevention and Monitoring
Liquid Biopsy Cancer Prevention and MonitoringDavid Tjahjono,MD,MBA(UK)
 
2014-11-17 Taylor Broad Poster FINAL
2014-11-17 Taylor Broad Poster FINAL2014-11-17 Taylor Broad Poster FINAL
2014-11-17 Taylor Broad Poster FINALMichael Cuoco
 
Machine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer DiagnosisMachine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer DiagnosisPramod Sharma
 

Similaire à Slides 3 (20)

BRITEREU_finalposter
BRITEREU_finalposterBRITEREU_finalposter
BRITEREU_finalposter
 
NY Prostate Cancer Conference - C.L. Sawyers - Session 1: Gene copy number a...
NY Prostate Cancer Conference - C.L. Sawyers - Session 1:  Gene copy number a...NY Prostate Cancer Conference - C.L. Sawyers - Session 1:  Gene copy number a...
NY Prostate Cancer Conference - C.L. Sawyers - Session 1: Gene copy number a...
 
Assessing the clinical utility of cancer genomic and proteomic data across tu...
Assessing the clinical utility of cancer genomic and proteomic data across tu...Assessing the clinical utility of cancer genomic and proteomic data across tu...
Assessing the clinical utility of cancer genomic and proteomic data across tu...
 
Developing a framework for for detection of low frequency somatic genetic alt...
Developing a framework for for detection of low frequency somatic genetic alt...Developing a framework for for detection of low frequency somatic genetic alt...
Developing a framework for for detection of low frequency somatic genetic alt...
 
Cancer evolution
Cancer evolutionCancer evolution
Cancer evolution
 
NY Prostate Cancer Conference - S. Stone - Session 1: Cell cycle progression ...
NY Prostate Cancer Conference - S. Stone - Session 1: Cell cycle progression ...NY Prostate Cancer Conference - S. Stone - Session 1: Cell cycle progression ...
NY Prostate Cancer Conference - S. Stone - Session 1: Cell cycle progression ...
 
Detecting Somatic Mutation - Ensemble Approach
Detecting Somatic Mutation - Ensemble ApproachDetecting Somatic Mutation - Ensemble Approach
Detecting Somatic Mutation - Ensemble Approach
 
The Cancer Genome Atlas Update
The Cancer Genome Atlas UpdateThe Cancer Genome Atlas Update
The Cancer Genome Atlas Update
 
MCR_Article_JW
MCR_Article_JWMCR_Article_JW
MCR_Article_JW
 
From reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene finding
 
Alain Toledano : Test and genomic profile : what future in breast cancer trea...
Alain Toledano : Test and genomic profile : what future in breast cancer trea...Alain Toledano : Test and genomic profile : what future in breast cancer trea...
Alain Toledano : Test and genomic profile : what future in breast cancer trea...
 
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...
 
Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Am...
Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Am...Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Am...
Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Am...
 
poster
posterposter
poster
 
Mining Gene Expression Data Focusing Cancer Therapeutics: A Digest
Mining Gene Expression Data Focusing Cancer Therapeutics: A DigestMining Gene Expression Data Focusing Cancer Therapeutics: A Digest
Mining Gene Expression Data Focusing Cancer Therapeutics: A Digest
 
Interrogating differences in expression of targeted gene sets to predict brea...
Interrogating differences in expression of targeted gene sets to predict brea...Interrogating differences in expression of targeted gene sets to predict brea...
Interrogating differences in expression of targeted gene sets to predict brea...
 
Liquid Biopsy Cancer Prevention and Monitoring
Liquid Biopsy Cancer Prevention and MonitoringLiquid Biopsy Cancer Prevention and Monitoring
Liquid Biopsy Cancer Prevention and Monitoring
 
2014-11-17 Taylor Broad Poster FINAL
2014-11-17 Taylor Broad Poster FINAL2014-11-17 Taylor Broad Poster FINAL
2014-11-17 Taylor Broad Poster FINAL
 
Machine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer DiagnosisMachine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer Diagnosis
 
CRC CHP final PDF
CRC CHP final PDFCRC CHP final PDF
CRC CHP final PDF
 

Plus de BioinformaticsInstitute

Comparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphsComparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphsBioinformaticsInstitute
 
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
 Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес... Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...BioinformaticsInstitute
 
Вперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днкВперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днкBioinformaticsInstitute
 
"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр Предеус"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр ПредеусBioinformaticsInstitute
 
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...BioinformaticsInstitute
 
Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)BioinformaticsInstitute
 
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...BioinformaticsInstitute
 
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)BioinformaticsInstitute
 

Plus de BioinformaticsInstitute (20)

Graph genome
Graph genome Graph genome
Graph genome
 
Nanopores sequencing
Nanopores sequencingNanopores sequencing
Nanopores sequencing
 
A superglue for string comparison
A superglue for string comparisonA superglue for string comparison
A superglue for string comparison
 
Comparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphsComparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphs
 
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
 Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес... Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
 
Вперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днкВперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днк
 
Knime & bioinformatics
Knime & bioinformaticsKnime & bioinformatics
Knime & bioinformatics
 
"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр Предеус"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр Предеус
 
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
 
Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)
 
Плюрипотентность 101
Плюрипотентность 101Плюрипотентность 101
Плюрипотентность 101
 
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
 
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
 
Biodb 2011-everything
Biodb 2011-everythingBiodb 2011-everything
Biodb 2011-everything
 
Biodb 2011-05
Biodb 2011-05Biodb 2011-05
Biodb 2011-05
 
Biodb 2011-04
Biodb 2011-04Biodb 2011-04
Biodb 2011-04
 
Biodb 2011-03
Biodb 2011-03Biodb 2011-03
Biodb 2011-03
 
Biodb 2011-01
Biodb 2011-01Biodb 2011-01
Biodb 2011-01
 
Biodb 2011-02
Biodb 2011-02Biodb 2011-02
Biodb 2011-02
 
Ngs 3 1
Ngs 3 1Ngs 3 1
Ngs 3 1
 

Dernier

Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 

Dernier (20)

Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 

Slides 3

  • 1. Bioinformatics seminar: Finding cancer driver mutations Ilya Minkin Saint Petersburg Academical University 8th December 2012 1 / 26
  • 2. Motivation It is widely accepted that tumorogenesis heavily depends on accumulation of specic mutation If we consider genome of a tumor cell, we may see thousands of alterations Which mutations cause functional changes that enhance tumor cell proliferation, i.e. drivermutations? And which mutations are unrelated to this process, i.e. passengermutations? This questions is one of the most challenging in cancer genetics 2 / 26
  • 3. Motivation Genes that mutate in wide range of tumors can be condently classied as drivergenes However, there are many drivergenes that mutate in 1% of tumors We can't use traditional methods (studies in model organisms, gene KO, etc) to analyze hundreds of gene candidates Thus we need a high-throughput computational method for nding drivermutations that is independent on frequency The papers are focused on missense mutations, not nonsense/frameshift 3 / 26
  • 4. CAN-Predict Distinguishing Cancer-Associated Missense Mutations from Common Polymorphisms, Cancer Research (2007) Joshua S. Kaminker, Yan Zhang, Allison Waugh, Peter M. Haverty, Brock Peters, Dragan Sebisanovic, Jeremy Stinson, William F. Forrest, J. Fernando Bazan, Somasekar Seshagiri, and Zemin Zhang 4 / 26
  • 5. Overview There are few well-characterized classes of SNPs in the human genome: Common missense SNP Mendelian disease SNP variants that cause disease and follow Mendelian pattern of inheritance Complex disease SNP variants that are related to complex disease caused by many genetic and environmental factors Cancer driving mutations We will try to understand how to distinguish them and consequently build a classier 5 / 26
  • 6. Three characteristics For each known mutation we obtain: 1. SIFT predicts whether an amino acid substitution aects protein function and gives a score 2. Align both variant/canonical protein against Pfam database using HMMER. Take the best E-scores and calculate log10(Evariant/Ecanonical) 3. The log-odds scores representing the relative frequency with which a Gene Ontology (GO) term was used to annotate cancer or noncancer gene sets 6 / 26
  • 10. Conclusions We see that cancer-driving mutations are similar to Mendelian diseases SNPs These three features can be used for classication Random forests are used for building the classier 10 / 26
  • 11. Random forests Random forest is an ensemble of decision trees. They are powerful and fast. Each tree is grown as follows: If the number of cases in the training set is N, sample N cases at random - but with replacement, from the original data. This sample will be the training set for growing the tree If there are M input variables, a number m¾M is specied such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node Each tree is grown to the largest extent possible 11 / 26
  • 12. Validation The Out Of Bag (OOB) error rate is an important measure of accuracy and is calculated by applying each individual classication tree to a subset of training data points not used in the construction of the tree OOB = 3.19% Cross validation: of 730 variants, only 10 of 581 (1.7%) normal variants were misclassied as cancer and only 13 of 149 (8.7%) cancer variants were misclassied as normal 12 / 26
  • 13. An application of such classier is distinguishing relevant cancer-associated mutations from the expected polymorphic variants often identied during sequencing projects 13 / 26
  • 14. Discussion There is a classier that distinguishes between cancer-driving mutations and passenger Cross-validation shows that it is pretty accurate It was also shown that somatic mutations are more likely to be predicted as cancer-driving compared to common SNPs They also tried to predict some novel cancer-driving mutations, but I'm so bored and won't tell about it 14 / 26
  • 15. CHASM Cancer-Specic High-Throughput Annotation of Somatic Mutations: Computational Prediction of Driver Missense Mutations, Cancer Research(2009) Hannah Carter, Sining Chen, Leyla Isik, Svitlana Tyekucheva, Victor E. Velculescu, Kenneth W. Kinzler, Bert Vogelstein, and Rachel Karchin 15 / 26
  • 16. Previous work There were classiers developed earlier Driver mutations are similar to mutations associated with Mendelian disease and may be identiable by setting constraints on amino acid residues at mutated positions Passengers are more similar to nonsynonymous single nucleotide polymorphisms (nsSNP) with high minor allele frequencies (MAF) Random forests(CAN-Predict), SVM(Protein kinase-specic classier) 16 / 26
  • 17. Previous work Previous work used common SNPs as negative training examples Existing computational methods could detect dierences between somatic missense mutations observed in cancers and high MAF nsSNPs But these dierences might be less relevant to the discrimination between driver and passenger mutations that occur somatically in tumors 17 / 26
  • 18. SNPs and passenger mutations Although high MAF nsSNPs and passenger mutations have properties in common, they also have dierences Passenger mutations may or may not have a functional impact on proteins; by denition, they are neutral with respect to cancer cell tness In contrast, high MAF nsSNPs have become xed in the human genome and must be functionally neutral or have a mild functional impact with respect to normal cell tness Let's generate synthetic set of passenger mutations and train a classier 18 / 26
  • 19. Overview Random Forest classier that was trained on 49 predictive features Feature selection was done with a protocol based on mutual information Driver mutation data set 2,488 missense mutations previously identied as playing a functional role in oncogenic transformation The synthetic passenger mutations were generated by sampling from eight multinomial distributions that depend on dinucleotide context and tumor type The nal score yielded for each mutation is the fraction of trees that voted for the passenger class. 19 / 26
  • 20. Candidate features In total 80 features, such as: Change in charge BLOSUM 62 substitution score 17way exon conservation SNP Density (The number of genetic variants or polymorphisms in the exon where the mutation is located) ... 20 / 26
  • 22. Probabilistic interpretation of random forest classication scores They used the trained Random Forest to compute a classication score for each of 607 glioblastoma multiforme (GBM) missense mutations However, these scores are not probabilities and the statistical behavior of the algorithm has not been well-characterized It is not evident where to set a trusted score cuto for purposes of identifying driver mutations 22 / 26
  • 23. Probabilistic interpretation of random forest classication scores For each of the 607 GBM mutants, we test the null hypothesis: the mutant is not functionally related to the growth of the tumor (passenger), versus the alternative hypothesis that it is (driver) We obtain a P value for a mutation by comparing its score to the null distribution, which consists of the scores of a ltered set of synthetic passengers that were held out from Random Forest training 23 / 26
  • 24. Other methods PolyPhen classies missense mutants as Probably Damaging, Possibly Damaging or Benign and also provides a continuous measure of a mutation's functional impact, the PSIC score SIFT provides a score that ranges between 0 and 1 to report the probability that a missense mutation will be tolerated SIFT/PolyPhen consensus CanPredict KinaseSVM 24 / 26