Robust Prediction of Cancer Disease Using Pattern
Classification of Microarray Gene-Expression Data
Presented by-
Md. Mushfiqur Rahman
Researcher
Bioinformatics Lab.
Dept. of Statistics, R.U.
E-mail: mushfiq_194@yahoo.com
Md. Matiur Rahaman1,2, Md. Mushfiqur Rahman2, Md. Nurul Haque Mollah2 and Ming Chen1
1. Department of Bioinformatics, College of Life Sciences, Zhejiang University, Zijingang Campus, Hangzhou 310058, China.
2. Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh.
International Conference on Applied Statistics (ICAS)
The Institute of Statistical Research and Training (ISRT)
University of Dhaka, Dhaka 27-29 December 2014
Supported by HEQEP (CP-3603.R3.W2) and Bioinformatics Lab., Dept. of Statistics, RU.
Outline
1. Introduction to Gene-Expression Data.
2. Robust Classifier.
3. Performance Investigation of Robust Classifiers using Simulated Data.
4. Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease.
5. Performance Investigation using Real Gene-Expression Profile for Prediction of Cancer Disease.
6. Conclusion.
Introduction to Gene-Expression Data
• Gene-expression data are the expression levels of an individual's genes as measured by a microarray. Each data point produced by a DNA microarray hybridization experiment represents the ratio of the expression levels of a particular gene under two different experimental conditions.
[Figure: Gene Expression]
Microarray Technology and Gene Expression Data
Example of Gene-Expression Data
Rows: genes. Columns: mRNA samples.

Gene   sample1  sample2  sample3  sample4  sample5  ...
1         0.46     0.30     0.80     1.51     0.90  ...
2        -0.10     0.49     0.24     0.06     0.46  ...
3         0.15     0.74     0.04     0.10     0.20  ...
4        -0.45    -1.03    -0.79    -0.56    -0.32  ...
5        -0.06     1.06     1.35     1.09    -1.09  ...

Gene expression level of gene i in mRNA sample j = log(Red intensity / Green intensity)
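The log-ratio definition above can be sketched in a few lines of NumPy. This is only an illustration: the intensity values are made up, and base 2 is an assumption (the slide writes "Log" without stating the base).

```python
import numpy as np

def log_ratio(red, green, base=2.0):
    """Elementwise log(red / green) for a genes x samples array.

    base=2 is an assumption; the slide does not state the logarithm base.
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    return np.log(red / green) / np.log(base)

# Hypothetical intensities: 2 genes (rows) x 3 mRNA samples (columns).
red = np.array([[400.0, 100.0, 200.0],
                [50.0, 800.0, 100.0]])
green = np.array([[100.0, 100.0, 400.0],
                  [200.0, 100.0, 100.0]])

expr = log_ratio(red, green)
print(expr)
```

A positive entry means the red channel is brighter than the green one for that gene and sample, a negative entry the reverse, and zero means equal intensity.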
A Complete workflow for Gene-Expression data analysis
• Workflow for real microarray gene-expression data classification: [Figure: workflow diagram]
Different Classification

Unsupervised classification (Clustering)
  Hierarchical Clustering
    Divisive Methods (Top-Down)
    Agglomerative Methods (Bottom-Up)
      1. Single Linkage Clustering / Nearest Neighbor Technique
      2. Complete Linkage Clustering
      3. Average Linkage Clustering
      4. Ward's Hierarchical Clustering
      5. Centroid Method
      6. Median Method
      7. And so on
  Partition-Based Clustering
    1. K-Means Clustering
    2. Fuzzy Clustering
    3. Model-Based Clustering
    4. And so on

Supervised classification
  1. Bayes classifier
  2. Maximum likelihood classifier
  3. FLDA (Fisher Linear Discriminant Analysis)
  4. SVM (Support Vector Machines)
  5. Decision Trees
  6. K-NN (K-Nearest Neighbors)
  7. AdaBoost
  8. Robust Classifier (Proposed)
  9. And so on
Bayes Classifier
Bayes classifier: classifies an object into a class according to its posterior probability.
Foundation: based on Bayes' theorem.
A short note on the Bayes classifier under normal populations
• Let $\pi_1, \ldots, \pi_m$ be m normal populations.
• Let $\{x_i^{(k)} \sim N_p(\mu^{(k)}, V^{(k)}),\ i = 1, 2, \ldots, N_k;\ k = 1, 2, \ldots, m\}$ be the training data set.
• The objective is to classify a new data vector (or test data vector) x into one of the m populations $\pi_1, \ldots, \pi_m$.
• Let the prior probability of $x \in \pi_k$ be $q_k$, which is known.
• Then the posterior probability of $x \in \pi_k$ is defined by
$$P(\pi_k \mid x) = \frac{q_k f_k(x)}{\sum_{j=1}^{m} q_j f_j(x)},$$
where $f_k(x)$ is the pdf of $\pi_k$, that is, the $N_p(\mu^{(k)}, V^{(k)})$ density.
Bayes Classifier (Cont…)
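Then the classification region $R_k$ for classifying x to the population $\pi_k$ is defined as follows:
$$R_k = \{x : q_k f_k(x) \ge q_j f_j(x) \ \text{for all } j \ne k\}.$$
Discriminant function: taking logarithms and dropping the constant common to all populations, x is assigned to the population with the largest
$$\delta_k(x) = \ln q_k - \tfrac{1}{2} \ln |V^{(k)}| - \tfrac{1}{2} (x - \mu^{(k)})^T (V^{(k)})^{-1} (x - \mu^{(k)}).$$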
This is known as the Bayes classifier, which classifies an object x to the population $\pi_k$.
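As an illustration of the Bayes rule under normal populations, the following sketch computes a log discriminant score for each class and assigns x to the class with the largest score. It is a minimal sketch, not the authors' implementation: the function names, the two-class toy data and the equal priors are all assumptions, and the class parameters are estimated by the classical MLEs.

```python
import numpy as np

def log_discriminant(x, mean, cov, prior):
    """ln q_k - 0.5 ln|V_k| - 0.5 (x - mu_k)^T V_k^{-1} (x - mu_k)."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return np.log(prior) - 0.5 * logdet - 0.5 * maha

def bayes_classify(x, means, covs, priors):
    """Return the index of the most probable class and all posteriors."""
    scores = np.array([log_discriminant(x, m, v, q)
                       for m, v, q in zip(means, covs, priors)])
    post = np.exp(scores - scores.max())   # the common constant cancels
    post /= post.sum()
    return int(np.argmax(scores)), post

# Hypothetical training data: two bivariate normal classes.
rng = np.random.default_rng(0)
train = [rng.normal(loc=c, scale=1.0, size=(50, 2)) for c in (0.0, 4.0)]
means = [t.mean(axis=0) for t in train]                       # MLE of the mean
covs = [np.cov(t, rowvar=False, bias=True) for t in train]    # MLE of the covariance
priors = [0.5, 0.5]                                           # assumed known priors

label, post = bayes_classify(np.array([3.8, 4.1]), means, covs, priors)
print(label, post)
```

Working with log scores avoids numerical underflow of the normal densities when the dimension p is large, which is the usual situation with gene-expression data.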
Robust Bayes classifier
• The traditional Bayes procedure may produce misleading results in the presence of outliers in the training dataset, the test dataset, or both.
• To improve the results, one can replace the MLEs by robust estimators such as MVE (Rousseeuw et al., 1985), MCD (Rousseeuw et al., 1985) and OGK (Maronna and Zamar, 2002).
• However, these robust procedures do not perform well on high-dimensional datasets. They also may not control the influence of a contaminated test vector x.
• To overcome these problems, we robustify the Bayes procedure by the minimum β-divergence method (Mollah et al., 2007, 2010). This is our proposed method.
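Robust Bayes Classifier (Cont…)
• The minimum β-divergence estimators $\hat\mu_\beta^{(k)}$ and $\hat V_\beta^{(k)}$ of the mean vector $\mu^{(k)}$ and the covariance matrix $V^{(k)}$, respectively, are obtained iteratively as follows:
$$\mu_{r+1}^{(k)} = \frac{\sum_{i=1}^{n_k} \phi_\beta\big(x_i^{(k)}; \mu_r^{(k)}, V_r^{(k)}\big)\, x_i^{(k)}}{\sum_{i=1}^{n_k} \phi_\beta\big(x_i^{(k)}; \mu_r^{(k)}, V_r^{(k)}\big)}$$
and
$$V_{r+1}^{(k)} = \frac{\sum_{i=1}^{n_k} \phi_\beta\big(x_i^{(k)}; \mu_r^{(k)}, V_r^{(k)}\big)\, \psi\big(x_i^{(k)}; \mu_r^{(k)}\big)}{(\beta + 1)^{-1} \sum_{i=1}^{n_k} \phi_\beta\big(x_i^{(k)}; \mu_r^{(k)}, V_r^{(k)}\big)},$$
where
• $\phi_\beta\big(x_i^{(k)}; \mu_r^{(k)}, V_r^{(k)}\big) = \exp\Big\{-\frac{\beta}{2} \big(x_i^{(k)} - \mu_r^{(k)}\big)^T \big(V_r^{(k)}\big)^{-1} \big(x_i^{(k)} - \mu_r^{(k)}\big)\Big\}$ is the β-weight function and $\psi\big(x_i^{(k)}; \mu_r^{(k)}\big) = \big(x_i^{(k)} - \mu_r^{(k)}\big)\big(x_i^{(k)} - \mu_r^{(k)}\big)^T$.
• When β = 0, these estimators reduce to the classical non-iterative estimates.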
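The iterative update for the mean vector and covariance matrix can be sketched as below. This is a hedged illustration rather than the authors' code: the starting values (coordinate-wise median and sample covariance), the stopping rule, the value β = 0.2 and the toy data are all assumptions that the slides do not specify.

```python
import numpy as np

def beta_weights(X, mean, cov, beta):
    """phi_beta for every row of X: exp(-beta/2 * squared Mahalanobis distance)."""
    diff = X - mean
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return np.exp(-0.5 * beta * maha)

def min_beta_divergence(X, beta=0.2, max_iter=200, tol=1e-8):
    """Iterate the weighted mean and covariance updates for one class."""
    X = np.asarray(X, dtype=float)
    mean = np.median(X, axis=0)                  # assumed starting value
    cov = np.cov(X, rowvar=False, bias=True)     # assumed starting value
    for _ in range(max_iter):
        w = beta_weights(X, mean, cov, beta)
        diff = X - mean                          # psi uses the current mean mu_r
        new_mean = w @ X / w.sum()
        new_cov = (1.0 + beta) * (diff.T * w) @ diff / w.sum()
        change = max(np.abs(new_mean - mean).max(), np.abs(new_cov - cov).max())
        mean, cov = new_mean, new_cov
        if change < tol:                         # assumed stopping rule
            break
    return mean, cov

# Hypothetical data: 180 clean points around 0 and 20 outliers around 10.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(180, 2)),
               rng.normal(loc=10.0, size=(20, 2))])

mean_b, cov_b = min_beta_divergence(X, beta=0.2)
mean_0, cov_0 = min_beta_divergence(X, beta=0.0)
print("beta = 0.2:", mean_b)
print("beta = 0  :", mean_0)
```

With β = 0 every weight equals 1, so the iteration settles on the sample mean and the MLE of the covariance, which the outliers pull away from the clean cluster. With β > 0 the outliers receive weights close to zero and the estimate stays near the clean data.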
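Robust Bayes Classifier (Cont…)
• The β-weight function plays a significant role in the robustification of the Bayes classifier, as discussed below.
• Step 1: First, we calculate the β-weight of the test vector x for each population using the β-weight function
$$\phi_\beta\big(x; \hat\mu_\beta^{(k)}, \hat V_\beta^{(k)}\big) = \exp\Big\{-\frac{\beta}{2} \big(x - \hat\mu_\beta^{(k)}\big)^T \big(\hat V_\beta^{(k)}\big)^{-1} \big(x - \hat\mu_\beta^{(k)}\big)\Big\},$$
and then we construct a criterion to test whether the data vector is contaminated or not.
[Equation: contamination criterion]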
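A small sketch of how the β-weight of a test vector might be used as a contamination check is given below. The decision rule here is purely hypothetical: it flags x when its largest β-weight over all classes falls below a user-chosen threshold `delta`. The threshold, its default value and the function names are assumptions for illustration and are not the criterion from the slides.

```python
import numpy as np

def beta_weight(x, mean, cov, beta):
    """beta-weight of a single vector x with respect to one class."""
    diff = np.asarray(x, dtype=float) - mean
    return float(np.exp(-0.5 * beta * diff @ np.linalg.solve(cov, diff)))

def is_contaminated(x, means, covs, beta=0.2, delta=0.05):
    """Hypothetical rule: x is flagged when no class gives it a weight >= delta."""
    weights = [beta_weight(x, m, v, beta) for m, v in zip(means, covs)]
    return max(weights) < delta, weights

# Hypothetical robust estimates for two classes in two dimensions.
means = [np.zeros(2), np.full(2, 4.0)]
covs = [np.eye(2), np.eye(2)]

clean_flag, clean_w = is_contaminated([0.2, -0.1], means, covs)
bad_flag, bad_w = is_contaminated([0.2, 30.0], means, covs)
print(clean_flag, clean_w)
print(bad_flag, bad_w)
```

A vector close to some class centre receives a weight near 1 for that class, whereas a vector with a grossly corrupted component receives a weight near 0 for every class.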
Robust Bayes Classifier (Cont…)
Step 2: If the unclassified data vector x is contaminated by outliers, we calculate the absolute difference between the contaminated vector and each mean vector as
$$d_{ki} = \big| x_i - \hat\mu_{i,\beta}^{(k)} \big|, \quad i = 1, 2, \ldots, p.$$
Compute the sum of the smallest r components of $d_k$ as
$$S_k = d_{k(1)} + d_{k(2)} + \cdots + d_{k(r)},$$
where r = round(p/2). Then find the tentative class or population for the unclassified data vector x as
$$\hat{k} = \arg\min_k S_k.$$
Then some or all components of the unclassified contaminated data vector x corresponding to $d_{k(r+1)}, d_{k(r+2)}, \ldots, d_{k(p)}$ are assumed to be corrupted by outliers.
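Step 2 can be sketched as follows. The function name and the toy numbers are made up for illustration, and the sketch assumes that the suspect components are taken from the tentative class, which is one reading of the slide. Python's `round` rounds halves to the nearest even integer, so the value of r for odd p may differ from other rounding conventions.

```python
import numpy as np

def tentative_class(x, robust_means):
    """Return the tentative class, the sums S_k and the suspect components."""
    x = np.asarray(x, dtype=float)
    M = np.asarray(robust_means, dtype=float)   # shape (m, p), one row per class
    p = x.size
    r = int(round(p / 2))
    d = np.abs(x - M)                           # d[k, i] = |x_i - mu_i^(k)|
    order = np.argsort(d, axis=1)               # ascending order within each class
    S = np.take_along_axis(d, order[:, :r], axis=1).sum(axis=1)
    k_hat = int(np.argmin(S))
    suspect = np.sort(order[k_hat, r:])         # components with the p - r largest d
    return k_hat, S, suspect

# Hypothetical example: p = 6 genes, two classes, two grossly corrupted components.
means = [[0, 0, 0, 0, 0, 0],
         [3, 3, 3, 3, 3, 3]]
x = [0.1, -0.2, 50.0, 0.3, -40.0, 0.0]

k_hat, S, suspect = tentative_class(x, means)
print(k_hat, S, suspect)
```

Because only the r smallest differences enter $S_k$, the two corrupted components (indices 2 and 4) do not affect the choice of class. The suspect set also contains index 3, a clean component, which matches the slide's wording that some or all of the flagged components are corrupted.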
Performance Investigation of Robust Classifiers using Simulated Data
[Figure panels: No contamination; Both contamination]
Application of the Proposed Method for Gene Expression
Data Analysis
Gene Expression Data Generating Model
Nowak and Tibshirani (2008), Biostatistics, 9(3), 467-483
Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease
Two Class Gene Classification (Absence of Outliers)
Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease
Two Class Gene Classification (Presence of Outliers)
Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease
Three Class Gene Classification (Absence of Outliers)
Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease
Three Class Gene Classification (Absence of Outliers)
Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease
(No Contamination)
Box Plot For Cancer Individuals Classification
Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease
(Train Data Contamination)
Box Plot For Cancer Individuals Classification
Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease
(Test Data Contamination)
Box Plot For Cancer Individuals Classification
Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease
(Both Data Contamination)
Box Plot For Cancer Individuals Classification
Real Gene Expression Data Analysis
Head and Neck Cancer Data
(Kuriakose et al., 2004)
12,625 genes; 22 normal patients, 22 cancer patients
594 DE genes of the 12,625 genes, calculated by the EBarrays method
Training gene set: ½ of DE genes
Performance Investigation using Real Gene-Expression Profile for Prediction of Cancer Disease
(In absence of outlier)
Performance Investigation using Real Gene-Expression Profile for Prediction of Cancer Disease
(In Presence of outlier)
Conclusion
The Bayes procedure is a popular tool for classification. However, the traditional Bayes procedure is very sensitive to outliers, so we discuss a robustification of the Bayes procedure by the minimum β-divergence method (Mollah et al., 2007, 2010).
We compared the proposed method with some popular classification methods used for microarray gene-expression data analysis (SVM, KNN, AdaBoost) on simulated datasets, and observed that the proposed method performs better than all of these methods.
We checked the performance of the proposed method on both simulated and real gene-expression data. The simulation and real-data results show that the proposed method significantly improves performance over the traditional Bayes method in the presence of outliers; otherwise, it performs equally well.
References
Anderson, T.W. (2003): An Introduction to Multivariate Statistical Analysis, Wiley Interscience.
Johnson, R.A. and Wichern, D.W. (2007): Applied Multivariate Statistical Analysis, Sixth edition, Prentice-Hall.
Mollah, M.N.H., Minami, M. and Eguchi, S. (2007): Robust prewhitening for ICA by minimizing beta-divergence and its application to FastICA. Neural Processing Letters, 25(2), pp. 91-110.
Mollah, M.N.H., Sultana, N., Minami, M. and Eguchi, S. (2010): Robust extraction of local structures by the minimum β-divergence method. Neural Networks, 23, pp. 226-238.
Wang, S., Gui, J. and Li, X. (2008): Factor analysis for cross-platform tumor classification based on gene expression profiles. Journal of Circuits, Systems, and Computers, 19, pp. 243-258.
Wuju, L. and Momiao, X. (2002): Tumor classification system based on gene expression profile. Bioinformatics, 18(2), pp. 325-326.
Wright, G., Tan, B., Rosenwald, A., Hurt, E., Wiestner, A. and Staudt, L. (2003): A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci USA, 100, pp. 9991-9996.
Nowak, G. and Tibshirani, R. (2008): Complementary hierarchical clustering. Biostatistics, 9(3), pp. 467-483.
Veer, L.J. et al. (2002): Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, pp. 530-536.
Thank you for listening.

USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 

Dernier (20)

Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 

Robust Prediction of Cancer Disease Using Pattern Classification of Microarray Gene-Expression

  • 1. Welcome to the presentation on Robust Prediction of Cancer Disease Using Pattern Classification of Microarray Gene-Expression Data. Presented by Md. Mushfiqur Rahman, Researcher, Bioinformatics Lab., Dept. of Statistics, R.U. (e-mail: mushfiq_194@yahoo.com). Authors: Md. Matiur Rahaman (1,2), Md. Mushfiqur Rahman (2), Md. Nurul Haque Mollah (2) and Ming Chen (1). Affiliations: 1. Department of Bioinformatics, College of Life Sciences, Zhejiang University, Zijingang Campus, Hangzhou 310058, China; 2. Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh. Venue: International Conference on Applied Statistics (ICAS), The Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka, 27-29 December 2014. Supported by HEQEP (CP-3603.R3.W2) and Bioinformatics Lab., Dept. of Statistics, RU.
  • 2. Outline: 1. Introduction to Gene-Expression Data. 2. Robust Classifier. 3. Performance Investigation of Robust Classifiers using Simulated Data. 4. Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease. 5. Performance Investigation using Real Gene-Expression Profile for Prediction of Cancer Disease. 6. Conclusion.
  • 3. Introduction to Gene-Expression Data: The expression levels of genes in an individual, measured through a microarray, are called gene-expression data. Each data point produced by a DNA microarray hybridization experiment represents the ratio of the expression levels of a particular gene under two different experimental conditions.
  • 4. Microarray Technology and Gene Expression Data
  • 5. Example of Gene-Expression Data (rows: genes; columns: mRNA samples)
        Gene   sample1   sample2   sample3   sample4   sample5   …
        1        0.46      0.30      0.80      1.51      0.90    ...
        2       -0.10      0.49      0.24      0.06      0.46    ...
        3        0.15      0.74      0.04      0.10      0.20    ...
        4       -0.45     -1.03     -0.79     -0.56     -0.32    ...
        5       -0.06      1.06      1.35      1.09     -1.09    ...
      Gene expression level of gene i in mRNA sample j = Log(Red intensity / Green intensity)
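The log-ratio on slide 5 can be illustrated with a minimal sketch. The slide writes "Log" without naming a base; base 2 is assumed here because it is a common choice for two-colour arrays, and the function name `log_ratio` and the intensity values are hypothetical.

```python
import numpy as np

def log_ratio(red, green):
    """Expression level as log(red / green); base 2 is an assumption."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    return np.log2(red / green)

# Hypothetical intensities for three genes in one sample.
levels = log_ratio([200.0, 100.0, 50.0], [100.0, 100.0, 100.0])
print(levels)
```

A positive value means the gene is more strongly expressed in the red-labelled condition, a negative value the opposite, and zero means equal intensity in both channels.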
  • 6. A Complete Workflow for Gene-Expression Data Analysis: workflow for real microarray gene-expression data classification.
  • 7. Different Classification
      Unsupervised classification (clustering):
        Hierarchical clustering:
          Divisive methods (top-down)
          Agglomerative methods (bottom-up): 1. Single linkage clustering / nearest neighbor technique 2. Complete linkage clustering 3. Average linkage clustering 4. Ward's hierarchical clustering 5. Centroid method 6. Median method 7. And so on
        Partition-based clustering: 1. K-means clustering 2. Fuzzy clustering 3. Model-based clustering 4. And so on
      Supervised classification: 1. Bayes classifier 2. Maximum likelihood classifier 3. FLDA (Fisher Linear Discriminant Analysis) 4. SVM (Support Vector Machines) 5. Decision trees 6. K-NN (K-Nearest Neighbors) 7. AdaBoost 8. Robust classifier (proposed) 9. And so on
  • 8. Bayes Classifier Bayes classifier: Classify objects to a class with probability. Foundation: Based on Bayes Theorem. A short note on Bayes classifier under normal populations  Let π1 ,…, πm be m normal populations . Let {xi (k) ~ , i=1,2, …, Nk ; k=1,2, …, m} be the training data set. Objective is to classify a new data vector (or test data vector) x into one of k populations π1, … , πm . Let the prior probability of be qk which is known. Then the posterior probability of is defined by, Where, fk (x) = be the pdf of πk . ),( )( VN k p  kx  kx  ),( )( VN k p  Supported by HEQEP (CP-3603.R3-W2) and Bioinformatics Lab., Dept. of Statistics, RU. 8
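  • 9. Bayes Classifier (Cont…) Then the classification region $R_k$ for classifying x to the population $\pi_k$ is the set of points where $\pi_k$ has the largest posterior probability: $R_k = \{x : q_k f_k(x) \ge q_j f_j(x) \text{ for all } j \ne k\}$. Equivalently, with the discriminant function $d_k(x) = \ln q_k + \ln f_k(x)$, x is assigned to the population with the largest $d_k(x)$. This is known as the Bayes classifier for classifying an object x to the population $\pi_k$.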
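As an illustration of the Bayes rule on slides 8 and 9, the following is a minimal sketch for normal populations with known parameters. The function names, the two example populations and the equal priors are assumptions made for the example, not values from the presentation.

```python
import numpy as np

def log_gaussian_pdf(x, mu, V):
    """Log density of N_p(mu, V) at the point x."""
    p = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(V)
    maha = diff @ np.linalg.solve(V, diff)  # squared Mahalanobis distance
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + maha)

def bayes_posterior(x, means, covs, priors):
    """Posterior probabilities q_k f_k(x) / sum_j q_j f_j(x)."""
    log_joint = np.array([
        np.log(q) + log_gaussian_pdf(x, mu, V)
        for mu, V, q in zip(means, covs, priors)
    ])
    log_joint -= log_joint.max()  # guard against underflow
    post = np.exp(log_joint)
    return post / post.sum()

# Hypothetical two-population example with equal priors.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]

post_near_first = bayes_posterior(np.array([0.2, -0.1]), means, covs, priors)
post_midpoint = bayes_posterior(np.array([1.5, 1.5]), means, covs, priors)
predicted = int(np.argmax(post_near_first))  # 0-based index of the chosen population
```

In practice the means and covariances are replaced by estimates from the training data, which is where outliers can distort the classical rule.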
  • 10. Bayes Classifier (Cont…)
      • The traditional Bayes procedure may produce misleading results in the presence of outliers in the training dataset, the test dataset, or both.
      • To improve the results, one can replace the MLEs by robust estimators such as the MVE (Rousseeuw et al., 1985), MCD (Rousseeuw et al., 1985) and OGK (Maronna and Zamar, 2002) estimators.
      • However, the performance of these robust procedures is not so good for high-dimensional datasets. These estimators also may not control the influence of a contaminated test vector x.
      • To overcome this problem, an attempt is made to robustify the Bayes procedure by the minimum β-divergence method (Mollah et al., 2007, 2010), which is our proposed method.
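  • 11. Robust Bayes Classifier
      • The minimum β-divergence estimators $\hat{\mu}_\beta^{(k)}$ and $\hat{V}_\beta^{(k)}$ for the mean vector $\mu^{(k)}$ and the covariance matrix $V^{(k)}$, respectively, are obtained iteratively as follows:
        $$\mu_{r+1}^{(k)} = \frac{\sum_{i=1}^{N_k} \phi_\beta\left(x_i^{(k)}; \mu_r^{(k)}, V_r^{(k)}\right) x_i^{(k)}}{\sum_{i=1}^{N_k} \phi_\beta\left(x_i^{(k)}; \mu_r^{(k)}, V_r^{(k)}\right)}$$
        and
        $$V_{r+1}^{(k)} = \frac{\sum_{i=1}^{N_k} \phi_\beta\left(x_i^{(k)}; \mu_r^{(k)}, V_r^{(k)}\right) \psi\left(x_i^{(k)}; \mu_r^{(k)}\right)}{(\beta + 1)^{-1} \sum_{i=1}^{N_k} \phi_\beta\left(x_i^{(k)}; \mu_r^{(k)}, V_r^{(k)}\right)}$$
        where
      • $\phi_\beta\left(x_i^{(k)}; \mu_r^{(k)}, V_r^{(k)}\right) = \exp\left\{-\frac{\beta}{2}\left(x_i^{(k)} - \mu_r^{(k)}\right)^T \left(V_r^{(k)}\right)^{-1} \left(x_i^{(k)} - \mu_r^{(k)}\right)\right\}$ is the β-weight function and $\psi\left(x_i^{(k)}; \mu_r^{(k)}\right) = \left(x_i^{(k)} - \mu_r^{(k)}\right)\left(x_i^{(k)} - \mu_r^{(k)}\right)^T$.
      • When β = 0, these estimators reduce to the classical non-iterative estimates.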
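The iterative updates on slide 11 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the starting values (coordinate-wise median and sample covariance), the stopping rule, the value β = 0.2 and the simulated data are choices made for the example and are not taken from the presentation.

```python
import numpy as np

def beta_weights(X, mu, V, beta):
    """beta-weight phi_beta for every row of X (shape n x p)."""
    diff = X - mu
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(V), diff)
    return np.exp(-0.5 * beta * d2)

def min_beta_estimates(X, beta=0.2, max_iter=200, tol=1e-8):
    """Iterate the mean and covariance updates until they stabilise."""
    mu = np.median(X, axis=0)      # assumed starting value
    V = np.cov(X, rowvar=False)    # assumed starting value
    for _ in range(max_iter):
        w = beta_weights(X, mu, V, beta)
        diff = X - mu              # psi uses the current mean mu_r
        mu_new = w @ X / w.sum()
        V_new = (1.0 + beta) * np.einsum("i,ij,ik->jk", w, diff, diff) / w.sum()
        done = (np.abs(mu_new - mu).max() < tol
                and np.abs(V_new - V).max() < tol)
        mu, V = mu_new, V_new
        if done:
            break
    return mu, V

# Simulated data: 200 clean points around the origin plus 10 gross outliers.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 2))
outliers = np.full((10, 2), 10.0)
X = np.vstack([clean, outliers])

mu_robust, V_robust = min_beta_estimates(X, beta=0.2)
mu_classical, _ = min_beta_estimates(X, beta=0.0)
```

With β = 0 every weight equals one, so the mean update returns the ordinary sample mean, which is pulled toward the outliers; with β > 0 the outlying points receive weights close to zero and the estimate stays near the bulk of the data.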
  • 12. Robust Bayes Classifier (Cont…) The β-weight function plays a significant role in the robustification of the Bayes classifier, as follows. Step 1: First, we calculate the β-weight for the test vector x using the β-weight function, and then we construct a criterion to test whether the data vector is contaminated or not.
  • 13. Robust Bayes Classifier (Cont…)
  • 14. Robust Bayes Classifier (Cont…) Step 2: If the unclassified data vector x is contaminated by outliers, we calculate the absolute difference between the contaminated vector and each mean vector as $d_{ki} = \left|x_i - \mu_{i,\beta}^{(k)}\right|$, $i = 1, 2, \ldots, p$. Compute the sum of the smallest r components of $d_k$ as $S_k = d_{k(1)} + d_{k(2)} + \ldots + d_{k(r)}$, where r = round(p/2). Then find the tentative class or population for the unclassified data vector x as $\hat{k} = \arg\min_k S_k$. Then some or all components of the unclassified contaminated data vector x corresponding to $d_{k(r+1)}, d_{k(r+2)}, \ldots, d_{k(p)}$ for the tentative class are assumed to be corrupted by outliers.
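Step 2 can be sketched as follows, assuming the robust mean vectors have already been estimated. The function name `tentative_class` and the example means and test vector are hypothetical, and class indices are 0-based. Note that Python's built-in `round` rounds halves to the nearest even integer, which may differ from the slide's round(p/2) when p is odd.

```python
import numpy as np

def tentative_class(x, robust_means):
    """Return the tentative class, the sums S_k and the suspect components.

    x            : test vector of length p
    robust_means : array of shape (m, p), one robust mean vector per class
    """
    p = x.shape[0]
    r = int(round(p / 2))
    d = np.abs(x - robust_means)           # d[k, i] = |x_i - mu_i^(k)|
    order = np.argsort(d, axis=1)          # per class, components sorted by distance
    S = np.take_along_axis(d, order[:, :r], axis=1).sum(axis=1)
    k_hat = int(np.argmin(S))
    suspect = np.sort(order[k_hat, r:])    # components with the largest distances
    return k_hat, S, suspect

# Hypothetical example: two classes, p = 4, third component grossly corrupted.
robust_means = np.array([[0.0, 0.0, 0.0, 0.0],
                         [5.0, 5.0, 5.0, 5.0]])
x = np.array([0.1, -0.2, 50.0, 0.3])
k_hat, S, suspect = tentative_class(x, robust_means)
```

Because only the r smallest distances enter each sum, a few grossly corrupted components cannot dominate the choice of the tentative class.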
  • 15. Performance Investigation of Robust Classifiers using Simulated Data (figure panels: No contamination; Both contamination)
  • 16. Application of the Proposed Method for Gene Expression Data Analysis. Gene expression data generating model: Nowak and Tibshirani (2008), Biostatistics, 9(3), 467-483.
  • 17. Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease: Two Class Gene Classification (Absence of Outliers)
  • 18. Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease: Two Class Gene Classification (Presence of Outliers)
  • 19. Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease: Three Class Gene Classification (Absence of Outliers)
  • 20. Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease: Three Class Gene Classification (Absence of Outliers)
  • 21. Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease (No Contamination): Box Plot for Cancer Individuals Classification
  • 22. Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease (Train Data Contamination): Box Plot for Cancer Individuals Classification
  • 23. Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease (Test Data Contamination): Box Plot for Cancer Individuals Classification
  • 24. Performance Investigation using Simulated Gene-Expression Profile for Prediction of Cancer Disease (Both Data Contamination): Box Plot for Cancer Individuals Classification
  • 25. Real Gene Expression Data Analysis: Head and Neck Cancer Data (Kuriakose et al., 2004). 12,625 genes, 22 normal patients, 22 cancer patients. 594 differentially expressed (DE) genes out of the 12,625 genes, calculated by the EBarrays method. Training gene set: ½ of the DE genes.
  • 26. Performance Investigation using Real Gene-Expression Profile for Prediction of Cancer Disease (In Absence of Outliers)
  • 27. Performance Investigation using Real Gene-Expression Profile for Prediction of Cancer Disease (In Presence of Outliers)
  • 28. Conclusion Bayes procedure is a popular tool for classification. However, the traditional Bayes procedure is very much sensitive to outliers. So we discuss a robustification of Bayes procedure by β-divergence (Mollah et al., 2007, 2010). We compare our proposed method with some popular classification methods (SVM, KNN, AdaBoost, those are use for Microarray gene expression data analysis) using simulated datasets and we observe that the performance of our proposed method is better than all comparable methods as early mentioned. We have checked the performance of proposed method in simulated and real both gene-expression data analysis. From the above discussion simulation and real data results shows that the proposed method significantly improves the performance over the traditional Bayes methods in presence of outliers; otherwise, it keeps equal performance. Supported by HEQEP (CP-3603.R3-W2) and Bioinformatics Lab., Dept. of Statistics, RU 28
  • 29. References
      Anderson, T.W. (2003): An Introduction to Multivariate Statistical Analysis. Wiley Interscience.
      Johnson, R.A. and Wichern, D.W. (2007): Applied Multivariate Statistical Analysis, sixth edition. Prentice-Hall.
      Mollah, M.N.H., Minami, M. and Eguchi, S. (2007): Robust prewhitening for ICA by minimizing beta-divergence and its application to FastICA. Neural Processing Letters, 25(2), pp. 91-110.
      Mollah, M.N.H., Sultana, N., Minami, M. and Eguchi, S. (2010): Robust extraction of local structures by the minimum β-divergence method. Neural Networks, 23, pp. 226-238.
      Wang, S., Gui, J. and Li, X. (2008): Factor analysis for cross-platform tumor classification based on gene expression profiles. Journal of Circuits, Systems, and Computers, 19, pp. 243-258.
      Wuju, L. and Momiao, X. (2002): Tumor classification system based on gene expression profile. Bioinformatics, 18(2), pp. 325-326.
      Wright, G., Tan, B., Rosenwald, A., Hurt, E., Wiestner, A. and Staudt, L. (2003): A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci USA, 100, pp. 9991-9996.
      Nowak, G. and Tibshirani, R. (2008): Complementary hierarchical clustering. Biostatistics, 9(3), pp. 467-483.
      Veer, L.J. et al. (2002): Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, pp. 530-536.
  • 30. Thank you for listening.