Introduction
Study objectives
In this project, we develop a workflow based on Knowledge Discovery and Data Mining methodologies to propose advanced biomarker discovery solutions.
We propose to use machine learning algorithms, such as support vector machines and random forests, to analyze metabolomic datasets in order to identify predictive biomarkers of metabolic syndrome.
These methods will be compared, and the relevant features obtained by the supervised approaches will then be visualized graphically. Using formal concept analysis, a concept lattice will be constructed and association rules will be discovered between emerging features.
Problem statement
- Metabolomics generates complex and massive data that are:
  noisy, variable
  redundant (correlated)
  heterogeneous, scalable
  high-dimensional (many more variables than samples)
Knowledge Discovery based on Formal
Concept Analysis for biomarker
identification from metabolomic data
Dhouha Grissa1,2, Jérémie Bourseau2, Blandine Comte1, Amedeo Napoli2, Estelle Pujos-Guillot1,3
1 INRA, UMR1019, UNH-MAPPING, F-63000 Clermont-Ferrand, France
2 LORIA, B.P. 239, F-54506 Vandoeuvre-lès-Nancy, France
3 INRA, UMR1019, Plateforme d’Exploration du Métabolisme, F-63000 Clermont-Ferrand, France
Conclusion
Methods
References
- Data cleaning: removes noise and outliers, using signal filtering methods and PCA, respectively.
- Data transformations: remove systematic analytical variation, using zero-mean normalization and unit-variance (UV) scaling.
- Feature ranking: for each group obtained by the MST-kNN technique, a ranked list of ions is produced using classification algorithms:
1. Support Vector Machines (SVM) [Vapnik et Chervonenkis, 1964]:
SVMs are supervised learning models that use kernel functions and a decision boundary to separate data into discrete classes. In this study, we are interested in the weight assigned to each variable.
2. Random Forests (RF) [B. Leo, 2001]:
An RF is a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. RF provide a natural way to rank the importance of variables in a regression or classification problem.
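The weight-based ranking described for SVM can be illustrated with a minimal sketch. It assumes a linear kernel, autoscaled (zero-mean, unit-variance) intensities, and a plain batch subgradient-descent fit; the toy matrix, the function names, and the hyperparameters `lam`, `lr` and `epochs` are hypothetical and are not taken from the poster.

```python
import statistics

def autoscale(X):
    """Zero-mean, unit-variance (UV) scaling of each column (ion) of X.
    Assumes no column is constant (standard deviation > 0)."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # sample standard deviation
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

def linear_svm_weights(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM (mean hinge loss + L2 penalty) by batch subgradient
    descent and return the weight vector. Labels must be +1 / -1."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin: hinge subgradient
                for j in range(p):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w

def rank_features(w):
    """Indices of variables sorted by decreasing absolute weight."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)

# Hypothetical intensity matrix: 6 samples x 3 ions; only ion 0 separates the classes.
X = [[10.0, 5.0, 1.0],
     [11.0, 4.0, 1.2],
     [10.5, 6.0, 0.9],
     [2.0, 5.5, 1.1],
     [1.5, 4.5, 1.0],
     [2.5, 5.0, 0.8]]
y = [1, 1, 1, -1, -1, -1]

w = linear_svm_weights(autoscale(X), y)
ranking = rank_features(w)
print(ranking)
```

Ranking by absolute weight is only meaningful when the variables share a common scale, which is why the sketch autoscales before fitting.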
To extract knowledge from biological datasets and help experts find relevant information, we propose an approach that we applied to metabolomic data.
A combination of several techniques was found to be essential to discover knowledge from complex metabolomic data, from data pre-processing through to the visualization and validation of the extracted features, which are candidate biomarkers of metabolic syndrome.
Currently, we are applying this workflow to a test dataset that is well known to the biology experts, and preliminary results are promising. The most relevant metabolites were first identified by MST-kNN in the first two groups, and then highlighted by the supervised approaches, Random Forest and SVM, where the weight of each feature is considered. A new matrix of reduced dimension, containing only the best-ranked metabolites, is then built so that FCA can be applied to discover and visualize relationships among biomarker candidates, in addition to extracting association rules between emerging patterns.
In the future, we will follow the same process to identify predictive biomarkers of metabolic syndrome / type 2 diabetes.
[Ganter & Wille, 1999]: B. Ganter and R. Wille. Springer, 1999.
[Arefin et al., 2014]: A. S. Arefin, R. Vimieiro, C. Riveros, H. Craig, P. Moscato. doi:10.1371/journal.pone.0111445, 2014.
[B. Leo, 2001]: L. Breiman. Machine Learning 45 (1): 5–32, 2001. doi:10.1023/A:1010933404324.
[Vapnik et Chervonenkis, 1964]: V. Vapnik and A. Chervonenkis. Automation and Remote Control, 25, 1964.
[Agrawal et al., 1993]: R. Agrawal, T. Imieliński and A. Swami. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD '93), p. 207, 1993.
1: Diet-health Interaction Along life – Predictive
biomarkers of life trAnSitiON outcome linked to
retirement.
Contact: dhouha.grissa@clermont.inra.fr
Ions = 1195 variables
- Variables/Ions clustering:
Using the MST-kNN algorithm [Arefin et al., 2014]
MST-kNN is a graph-based partitioning algorithm that can use different distance measures: Euclidean, Manhattan, Jensen-Shannon divergence (JSD), etc.
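The graph-based partitioning idea can be sketched as follows: build a minimum spanning tree (MST) and a k-nearest-neighbour (kNN) graph over the same points, keep only the MST edges that also appear in the kNN graph, and read the clusters off the connected components. This is a simplified single-pass illustration, not the published algorithm: the fixed `k`, the rule that an edge belongs to the kNN graph when either endpoint lists the other among its neighbours, the Euclidean distance, and the toy points are all assumptions made here.

```python
import math

def mst_edges(points, dist):
    """Prim's algorithm; returns the MST as a set of (i, j) pairs with i < j."""
    n = len(points)
    in_tree = {0}
    edges = set()
    while len(in_tree) < n:
        # cheapest edge leaving the current tree
        _, i, j = min((dist(points[i], points[j]), i, j)
                      for i in in_tree for j in range(n) if j not in in_tree)
        in_tree.add(j)
        edges.add((min(i, j), max(i, j)))
    return edges

def knn_edges(points, dist, k):
    """Undirected kNN graph: edge (i, j) if j is among the k nearest of i."""
    n = len(points)
    edges = set()
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist(points[i], points[j]))[:k]
        for j in nearest:
            edges.add((min(i, j), max(i, j)))
    return edges

def components(n, edges):
    """Connected components of an undirected graph, via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

def mst_knn_partition(points, k, dist=math.dist):
    """Keep MST edges that are also kNN edges; clusters are the components."""
    kept = mst_edges(points, dist) & knn_edges(points, dist, k)
    return components(len(points), kept)

# Hypothetical ion profiles forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
clusters = mst_knn_partition(points, k=1)
print(clusters)
```

Passing a different `dist` function (for example a Manhattan distance) changes the geometry without touching the partitioning logic.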
Step 1.1: Pre-processing of data
Step 1.2: Reduction and Feature selection
Step 2.1: Unsupervised approach
Step 2.2: Supervised approach
Step 3: Visualization
- Formal Concept Analysis (FCA) [Ganter & Wille, 1999]:
Extraction of relationships among data with FCA.
The data consist of a matrix containing the most relevant features, deduced from the previous step.
- Association rules between emerging
features [Agrawal et al., 1993]:
...
...
- Data reduction:
Correlated data, filtering
- Feature selection:
Support Vector Machine-Recursive Feature Elimination (SVM-RFE), Random Forest, t-test
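Of the feature selection options listed, the t-test is the simplest to illustrate. The sketch below ranks variables by the absolute value of a two-sample t-statistic; the choice of Welch's unequal-variance form, the function names, and the toy data are assumptions, since the poster does not say which variant was used.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t-statistic (unequal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def rank_by_t(X, labels):
    """Rank columns of X by decreasing |t| between label-1 and label-0 samples."""
    scores = []
    for j, col in enumerate(zip(*X)):
        g1 = [v for v, lab in zip(col, labels) if lab == 1]
        g0 = [v for v, lab in zip(col, labels) if lab == 0]
        scores.append((abs(welch_t(g1, g0)), j))
    return [j for _, j in sorted(scores, reverse=True)]

# Hypothetical intensity matrix: 6 samples x 3 ions.
X = [[10.0, 5.0, 1.0],
     [11.0, 4.0, 1.2],
     [10.5, 6.0, 0.9],
     [2.0, 5.5, 1.1],
     [1.5, 4.5, 1.0],
     [2.5, 5.0, 0.8]]
labels = [1, 1, 1, 0, 0, 0]

t_ranking = rank_by_t(X, labels)
print(t_ranking)
```

A univariate filter like this ignores correlations between ions, which is one reason to compare it with multivariate rankings such as SVM-RFE or random forest importance.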
Figure 4: Basic Working of Random Forests
$I = \{i_1, i_2, \ldots, i_m\}$: a set of items;
Transaction database $T$: a set of transactions $T = \{t_1, t_2, \ldots, t_n\}$
An association rule is an implication of the form $X \rightarrow Y$, where $X, Y \subset I$ and $X \cap Y = \emptyset$
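For a rule X → Y over disjoint itemsets, the two usual quality measures are support (the fraction of transactions containing both X and Y) and confidence (the fraction of transactions containing X that also contain Y). The sketch below computes both; the ion names and transactions are hypothetical, and the poster does not state which measures or thresholds were applied.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """Estimated P(Y in t | X in t) for the rule X -> Y."""
    return support(X | Y, transactions) / support(X, transactions)

# Hypothetical transactions: the set of ions flagged in each sample.
transactions = [frozenset(t) for t in (
    {"ion_A", "ion_B", "ion_C"},
    {"ion_A", "ion_B"},
    {"ion_A", "ion_C"},
    {"ion_B", "ion_C"},
)]

X, Y = frozenset({"ion_A"}), frozenset({"ion_B"})
rule_support = support(X | Y, transactions)
rule_confidence = confidence(X, Y, transactions)
print(rule_support, rule_confidence)
```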
Figure 3: Basic Working of Support Vector Machines
Figure 5: Example of Concept Lattice
http://www.thebookmyproject.com/wp-content/uploads/Intrusion-Detection-Technique-by-using-K-means-Fuzzy-Neural-Network-and-SVM-classifiers.jpg
Formal Context Specification and Extraction
Construction of Concept Lattices
$K = (Ind, Ion, I)$, with $I \subseteq Ind \times Ion$
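Given a formal context K = (Ind, Ion, I), a formal concept is a pair (extent, intent) in which the extent is exactly the set of individuals sharing all ions of the intent, and the intent is exactly the set of ions shared by all individuals of the extent. The sketch below enumerates the concepts of a tiny binary context by closing every subset of individuals; this brute-force approach only suits small contexts, and the context itself is hypothetical (the poster does not describe how ion intensities are turned into a binary relation).

```python
from itertools import combinations

def common_attributes(objects, context, attributes):
    """Attributes shared by every object in `objects` (all of them if empty)."""
    return frozenset(m for m in attributes
                     if all(m in context[g] for g in objects))

def common_objects(attrs, context):
    """Objects that have every attribute in `attrs`."""
    return frozenset(g for g, ms in context.items() if attrs <= ms)

def formal_concepts(context):
    """All (extent, intent) pairs of a context given as {object: attributes}."""
    attributes = frozenset().union(*context.values())
    objects = list(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = common_attributes(subset, context, attributes)
            extent = common_objects(intent, context)  # closure of `subset`
            concepts.add((extent, intent))
    return concepts

# Hypothetical binary context: which ions are "present" for each individual.
context = {
    "ind1": frozenset({"ion_A", "ion_B"}),
    "ind2": frozenset({"ion_A"}),
    "ind3": frozenset({"ion_B", "ion_C"}),
}

concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Ordering the concepts by inclusion of their extents yields the concept lattice.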
Figure 2: Process of Knowledge Extraction from
metabolomics raw data
Samples
Ions = variables
Ion intensities
Figure 1: Metabolomics is a powerful phenotyping tool in nutrition research to better understand the biological mechanisms involved in the pathophysiological processes and identify biomarkers of metabolic deviations.
→ Need ways to extract relevant information and ignore random variation (noise)
Poster_JOBIM_v4.2
