Representation of metabolomic data with wavelets
Nathalie Villa-Vialaneix
http://www.nathalievilla.org
Toulouse School of Economics
Workgroup BioPuces, INRA de Castanet
June 5th, 2009
BioPuces (05/06/09) Nathalie Villa Metabolomic data 1 / 16
Outline
1 Database presentation
2 Wavelet representation
3 Perspective of work
Database presentation
Basics about the database
The database was provided by Alain Paris (INRA) and consists of metabolomic recordings (H NMR) from the urine of mice.
950 variables from 0.505 ppm to 9.995 ppm.
The baseline has been removed and the peaks have been aligned.
Purpose of the work
Study the effects of the ingestion of Hypochoeris radicata (HR) on the metabolism: the inflorescences of this plant are known to be responsible for a horse disease, Australian stringhalt.
As it is hard to obtain several dozen horses to sacrifice, the experiments were conducted on 72 mice.
Description of the experiment
72 mice from:
• 2 sexes: 36 males, 36 females
• 3 kinds of HR doses: 0 (control): 24 mice; 3%: 24 mice; 9%: 24 mice
• 3 sacrifice dates: 8th day: 24 mice; 15th day: 24 mice; 21st day: 24 mice
⇒ 2 × 3 × 3 = 18 groups.
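The group count can be checked by enumerating the factor combinations. This is a minimal sketch; the label strings are illustrative assumptions, only the factor levels come from the slide.

```python
from itertools import product

# Factor levels as listed on the slide (label strings are illustrative).
sexes = ["male", "female"]
doses = ["0% (control)", "3%", "9%"]
sacrifice_days = [8, 15, 21]

# One experimental group per (sex, dose, sacrifice day) combination.
groups = list(product(sexes, doses, sacrifice_days))
n_groups = len(groups)
print(n_groups)  # 18
```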
Measurement days
The urine was collected on the following days:
Day                  0   1   4   8  11  15  18  21
Nb of observations  68  68  68  66  46  44  19  18
For each mouse, from 2 to 22 measurements were made.
In total, 397 observations for 950 variables.
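The total can be recovered from the per-day counts. A short check, assuming the resulting data matrix has one row per observation and one column per variable (the variable names are illustrative):

```python
# Per-day observation counts copied from the table above.
days = [0, 1, 4, 8, 11, 15, 18, 21]
n_obs = [68, 68, 68, 66, 46, 44, 19, 18]
obs_per_day = dict(zip(days, n_obs))

total = sum(n_obs)
n_variables = 950

# Assumed layout of the data matrix: observations x variables.
shape = (total, n_variables)
print(shape)  # (397, 950)
```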
Wavelet representation
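Basic principle of wavelets
For a given integer J, the spectra can be expressed at level J as:
\[
f(x) = \sum_{k} \alpha_k \, 2^{-J/2} \, \Psi\left(2^{-J} x - k\right) + \sum_{j=1}^{J} \sum_{k} \beta_{jk} \, 2^{-j/2} \, \Phi\left(2^{-j} x - k\right)
\]
• First sum: trend, based on the father wavelet Ψ.
• Second sum: details at levels 1, ..., J, based on the mother wavelet Φ.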
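To make the trend/details split concrete, here is a minimal sketch using the Haar wavelet. The choice of Haar is an assumption made for illustration (the slides do not say which wavelet family was used), and the function names are hypothetical.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: returns (trend, details)."""
    s = 1.0 / math.sqrt(2.0)
    trend = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return trend, details

def haar_decompose(signal, J):
    """Return (trend at level J, [details at level 1, ..., details at level J])."""
    if len(signal) % (2 ** J) != 0:
        raise ValueError("signal length must be divisible by 2**J")
    trend, all_details = list(signal), []
    for _ in range(J):
        # Only the trend is split again at the next level.
        trend, details = haar_step(trend)
        all_details.append(details)
    return trend, all_details

def haar_reconstruct(trend, all_details):
    """Invert haar_decompose, going from the coarsest level back to the data."""
    s = 1.0 / math.sqrt(2.0)
    current = list(trend)
    for details in reversed(all_details):
        finer = []
        for a, d in zip(current, details):
            finer.extend([(a + d) * s, (a - d) * s])
        current = finer
    return current

f = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
trend, details = haar_decompose(f, 2)
print(len(trend), [len(d) for d in details])  # 2 [4, 2]
```

The trend plays the role of the α coefficients and each list in `details` holds the β coefficients of one level; keeping all of them allows an exact reconstruction of the original samples.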
Hierarchical decomposition
We add 74 zero values at the end of each spectrum to obtain a dyadic discrete sampling (950 + 74 = 1024).
Original data: f observed at t1, ..., t1024, equally spaced
↓
Level 1: Trend | Details
↓
Level 2: Trend | Details
. . .
↓
Level 9: Trend | Details
⇒ At level 9 (maximum level with a discrete sampling of length 1024), we obtain 1025 coefficients.
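The zero-padding step can be sketched as follows. The helper name `pad_to_dyadic` and the placeholder spectrum are assumptions for illustration; only the lengths (950 and 1024) come from the slides.

```python
def pad_to_dyadic(spectrum):
    """Append zeros so that the length becomes the next power of two."""
    n = len(spectrum)
    target = 1
    while target < n:
        target *= 2
    return list(spectrum) + [0.0] * (target - n)

# Placeholder spectrum with 950 points (the real intensities are not shown here).
spectrum = [1.0] * 950
padded = pad_to_dyadic(spectrum)
print(len(padded), len(padded) - len(spectrum))  # 1024 74
```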
Examples
[Figure: two panels, "Trend" and "Details"]
Denoising
For the detail coefficients at levels greater than J (with J large enough), a filter is applied:
\[
c^{*} = \begin{cases} 0 & \text{if } |c| < 2 \log 10 \, \hat{\sigma} \\ c & \text{if } |c| \geq 2 \log 10 \, \hat{\sigma} \end{cases}
\]
(Donoho and Johnstone)
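The filter above is a hard-thresholding rule. A minimal sketch follows, with the threshold left as a parameter `lam`. The noise estimate based on the median absolute value and the universal threshold σ̂√(2 log n) in the usage lines are assumptions commonly associated with Donoho and Johnstone, not constants taken from the slide; all function names are hypothetical.

```python
import math
import statistics

def hard_threshold(coeffs, lam):
    """Set to zero every coefficient whose absolute value is below lam."""
    return [c if abs(c) >= lam else 0.0 for c in coeffs]

def mad_sigma(finest_details):
    """Robust noise estimate: median absolute value / 0.6745 (assumption)."""
    return statistics.median([abs(d) for d in finest_details]) / 0.6745

def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 log n) (assumption, see lead-in)."""
    return sigma * math.sqrt(2.0 * math.log(n))

details = [0.1, -0.2, 3.0, 0.05, -2.5, 0.15, 0.0, -0.1]
print(hard_threshold(details, 1.0))
# [0.0, 0.0, 3.0, 0.0, -2.5, 0.0, 0.0, 0.0]

# Data-driven threshold under the assumptions stated above.
lam = universal_threshold(mad_sigma(details), len(details))
filtered = hard_threshold(details, lam)
```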
Two parameters have to be tuned:
• Which wavelet should be used?
• Which J should be used?
The aim is a trade-off between the quality of the reconstruction (what are the values of the functions rebuilt from the filtered coefficients?) and the number of non-zero coefficients.
Minimization of an empirical (self-created) quality criterion:
\[
\frac{1}{n} \sum_{i} \frac{1}{D} \sum_{j} \left( f_i(t_j) - \hat{f}_i(t_j) \right)^2 + \frac{\text{Nb of non-zero coefficients}}{\text{Nb of coefficients}}
\]
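The criterion can be computed as sketched below. It is assumed here that n is the number of spectra, D the number of sampling points per spectrum, and that the sparsity ratio is pooled over all spectra; none of these conventions is stated on the slide, and the function name is hypothetical.

```python
def quality_criterion(originals, reconstructions, coefficients):
    """Mean squared reconstruction error plus the fraction of non-zero coefficients.

    originals, reconstructions: one list of sampled values per spectrum.
    coefficients: one list of filtered wavelet coefficients per spectrum.
    """
    n = len(originals)
    error = 0.0
    for f, f_hat in zip(originals, reconstructions):
        D = len(f)
        error += sum((a - b) ** 2 for a, b in zip(f, f_hat)) / D
    error /= n
    n_coef = sum(len(c) for c in coefficients)
    n_nonzero = sum(1 for c in coefficients for v in c if v != 0.0)
    return error + n_nonzero / n_coef

originals = [[1.0, 2.0, 3.0, 4.0]]
reconstructions = [[1.0, 2.0, 3.0, 2.0]]
coefficients = [[5.0, 0.0, 0.0, 1.0]]
print(quality_criterion(originals, reconstructions, coefficients))  # 1.5
```

Evaluating this value for each candidate wavelet and each J, then keeping the pair with the smallest value, is one way to carry out the tuning described above.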
Final reconstruction of the data
[Figure: reconstructed data, 274 non-zero coefficients]
Boxplots
[Figure: boxplots of the original coefficients]
[Figure: boxplots of the scaled coefficients (centered by the mean and divided by the standard deviation)]
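The scaling of the coefficients can be sketched as a column-wise standardization. The use of the sample standard deviation and the handling of constant columns are assumptions, and `scale_columns` is a hypothetical helper.

```python
import statistics

def scale_columns(rows):
    """Center each column by its mean and divide by its standard deviation.

    rows: one list of coefficients per observation.
    Constant columns (zero standard deviation) are mapped to 0.0.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]

rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
scaled = scale_columns(rows)
print(scaled)  # [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
```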
Perspective of work
Using random forests
The idea is to use random forests to make predictions and also to extract the main coefficients that explain the target variables.
Proposed regression: the scaled coefficients will be the explanatory variables. The variable of interest could be:
• the dose (either as a number or as a class, leading to a classification problem);
• the total dose ingested (i.e., the dose multiplied by the number of days of ingestion);
• any other interesting idea?
The idea is then to rebuild the individuals from the main coefficients (setting the others to zero) to see which peaks differ from one group to the others.
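The last step, keeping only the main coefficients before rebuilding a spectrum, can be sketched as a simple masking operation. The importance scores below are made-up illustrative values (for instance, they could come from a random forest), and `keep_main_coefficients` is a hypothetical helper.

```python
def keep_main_coefficients(coeffs, importances, k):
    """Keep the k coefficients with the largest importance, set the others to zero."""
    ranked = sorted(range(len(coeffs)), key=lambda i: importances[i], reverse=True)
    keep = set(ranked[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

# Illustrative values only: one observation with 4 wavelet coefficients.
coeffs = [2.0, -1.0, 0.5, 3.0]
importances = [0.10, 0.60, 0.05, 0.25]
print(keep_main_coefficients(coeffs, importances, 2))  # [0.0, -1.0, 0.0, 3.0]
```

Applying the inverse wavelet transform to the masked coefficients would then give the rebuilt individual whose peaks can be compared across groups.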
BioPuces (05/06/09) Nathalie Villa Metabolomic data 16 / 16

Contenu connexe

En vedette

Multiple kernel Self-Organizing Maps
Multiple kernel Self-Organizing MapsMultiple kernel Self-Organizing Maps
Multiple kernel Self-Organizing Mapstuxette
 
Mining co-expression network
Mining co-expression networkMining co-expression network
Mining co-expression networktuxette
 
Reading revue of "Inferring Multiple Graphical Structures"
Reading revue of "Inferring Multiple Graphical Structures"Reading revue of "Inferring Multiple Graphical Structures"
Reading revue of "Inferring Multiple Graphical Structures"tuxette
 
Graph mining with kernel self-organizing map
Graph mining with kernel self-organizing mapGraph mining with kernel self-organizing map
Graph mining with kernel self-organizing maptuxette
 
Interpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional dataInterpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional datatuxette
 
Several nonlinear models and methods for FDA
Several nonlinear models and methods for FDASeveral nonlinear models and methods for FDA
Several nonlinear models and methods for FDAtuxette
 
Large network analysis : visualization and clustering
Large network analysis : visualization and clusteringLarge network analysis : visualization and clustering
Large network analysis : visualization and clusteringtuxette
 
FDA and Statistical learning theory
FDA and Statistical learning theoryFDA and Statistical learning theory
FDA and Statistical learning theorytuxette
 
A short introduction to statistical learning
A short introduction to statistical learningA short introduction to statistical learning
A short introduction to statistical learningtuxette
 
Topographic graph clustering with kernel and dissimilarity methods
Topographic graph clustering with kernel and dissimilarity methodsTopographic graph clustering with kernel and dissimilarity methods
Topographic graph clustering with kernel and dissimilarity methodstuxette
 

En vedette (12)

Multiple kernel Self-Organizing Maps
Multiple kernel Self-Organizing MapsMultiple kernel Self-Organizing Maps
Multiple kernel Self-Organizing Maps
 
Mining co-expression network
Mining co-expression networkMining co-expression network
Mining co-expression network
 
Reading revue of "Inferring Multiple Graphical Structures"
Reading revue of "Inferring Multiple Graphical Structures"Reading revue of "Inferring Multiple Graphical Structures"
Reading revue of "Inferring Multiple Graphical Structures"
 
Graph mining with kernel self-organizing map
Graph mining with kernel self-organizing mapGraph mining with kernel self-organizing map
Graph mining with kernel self-organizing map
 
Interpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional dataInterpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional data
 
Several nonlinear models and methods for FDA
Several nonlinear models and methods for FDASeveral nonlinear models and methods for FDA
Several nonlinear models and methods for FDA
 
Large network analysis : visualization and clustering
Large network analysis : visualization and clusteringLarge network analysis : visualization and clustering
Large network analysis : visualization and clustering
 
FDA and Statistical learning theory
FDA and Statistical learning theoryFDA and Statistical learning theory
FDA and Statistical learning theory
 
A short introduction to statistical learning
A short introduction to statistical learningA short introduction to statistical learning
A short introduction to statistical learning
 
Topographic graph clustering with kernel and dissimilarity methods
Topographic graph clustering with kernel and dissimilarity methodsTopographic graph clustering with kernel and dissimilarity methods
Topographic graph clustering with kernel and dissimilarity methods
 
Classroom arrangement
Classroom arrangementClassroom arrangement
Classroom arrangement
 
Classroom arrangement
Classroom arrangementClassroom arrangement
Classroom arrangement
 

Similaire à Representation of metabolomic data with wavelets

Metabolomic data: combining wavelet representation with learning approaches
Metabolomic data: combining wavelet representation with learning approachesMetabolomic data: combining wavelet representation with learning approaches
Metabolomic data: combining wavelet representation with learning approachestuxette
 
Étude du pathobiome respiratoire chez les jeunes bovins atteints de bronchopn...
Étude du pathobiome respiratoire chez les jeunes bovins atteints de bronchopn...Étude du pathobiome respiratoire chez les jeunes bovins atteints de bronchopn...
Étude du pathobiome respiratoire chez les jeunes bovins atteints de bronchopn...tuxette
 
GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016ExternalEvents
 
Inference and informatics in a 'sequenced' world
Inference and informatics in a 'sequenced' worldInference and informatics in a 'sequenced' world
Inference and informatics in a 'sequenced' worldJoe Parker
 
When Biology Meets Computer Science
When Biology Meets Computer ScienceWhen Biology Meets Computer Science
When Biology Meets Computer ScienceGeeks Anonymes
 
Workflows supporting drug discovery against malaria
Workflows supporting drug discovery against malariaWorkflows supporting drug discovery against malaria
Workflows supporting drug discovery against malariaBarry Hardy
 
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction ChallengeESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction ChallengeFrancisco Zamora-Martinez
 
Data Assimilation for the Lorenz (1963) Model using Ensemble and Extended Kal...
Data Assimilation for the Lorenz (1963) Model using Ensemble and Extended Kal...Data Assimilation for the Lorenz (1963) Model using Ensemble and Extended Kal...
Data Assimilation for the Lorenz (1963) Model using Ensemble and Extended Kal...Claudia Vitolo
 
Open Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysisOpen Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysisAntica Culina
 
Open Data and Ecological and Evolutionary synthesis
Open Data and Ecological and Evolutionary synthesisOpen Data and Ecological and Evolutionary synthesis
Open Data and Ecological and Evolutionary synthesisAntica Culina
 
Computing Bayesian posterior with empirical likelihood in population genetics
Computing Bayesian posterior with empirical likelihood in population geneticsComputing Bayesian posterior with empirical likelihood in population genetics
Computing Bayesian posterior with empirical likelihood in population geneticsPierre Pudlo
 
Multiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasetsMultiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasetstuxette
 
Hybrid bat-ant colony optimization algorithm for rule-based feature selection...
Hybrid bat-ant colony optimization algorithm for rule-based feature selection...Hybrid bat-ant colony optimization algorithm for rule-based feature selection...
Hybrid bat-ant colony optimization algorithm for rule-based feature selection...IJECEIAES
 
Internal examination 3rd semester disaster
Internal examination 3rd semester disasterInternal examination 3rd semester disaster
Internal examination 3rd semester disasterMahendra Poudel
 
American Gut Project presentation at Masaryk University
American Gut Project presentation at Masaryk UniversityAmerican Gut Project presentation at Masaryk University
American Gut Project presentation at Masaryk Universitymcdonadt
 
Integrating Tara Oceans datasets using unsupervised multiple kernel learning
Integrating Tara Oceans datasets using unsupervised multiple kernel learningIntegrating Tara Oceans datasets using unsupervised multiple kernel learning
Integrating Tara Oceans datasets using unsupervised multiple kernel learningtuxette
 
On the Modeling of the Three Types of Non-Spiking Neurons of the Caenorhabdit...
On the Modeling of the Three Types of Non-Spiking Neurons of the Caenorhabdit...On the Modeling of the Three Types of Non-Spiking Neurons of the Caenorhabdit...
On the Modeling of the Three Types of Non-Spiking Neurons of the Caenorhabdit...Juan Luis Jiménez Laredo
 

Similaire à Representation of metabolomic data with wavelets (20)

Metabolomic data: combining wavelet representation with learning approaches
Metabolomic data: combining wavelet representation with learning approachesMetabolomic data: combining wavelet representation with learning approaches
Metabolomic data: combining wavelet representation with learning approaches
 
Étude du pathobiome respiratoire chez les jeunes bovins atteints de bronchopn...
Étude du pathobiome respiratoire chez les jeunes bovins atteints de bronchopn...Étude du pathobiome respiratoire chez les jeunes bovins atteints de bronchopn...
Étude du pathobiome respiratoire chez les jeunes bovins atteints de bronchopn...
 
GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016
 
Inference and informatics in a 'sequenced' world
Inference and informatics in a 'sequenced' worldInference and informatics in a 'sequenced' world
Inference and informatics in a 'sequenced' world
 
When Biology Meets Computer Science
When Biology Meets Computer ScienceWhen Biology Meets Computer Science
When Biology Meets Computer Science
 
Workflows supporting drug discovery against malaria
Workflows supporting drug discovery against malariaWorkflows supporting drug discovery against malaria
Workflows supporting drug discovery against malaria
 
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction ChallengeESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
 
Data Assimilation for the Lorenz (1963) Model using Ensemble and Extended Kal...
Data Assimilation for the Lorenz (1963) Model using Ensemble and Extended Kal...Data Assimilation for the Lorenz (1963) Model using Ensemble and Extended Kal...
Data Assimilation for the Lorenz (1963) Model using Ensemble and Extended Kal...
 
Open Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysisOpen Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysis
 
Open Data and Ecological and Evolutionary synthesis
Open Data and Ecological and Evolutionary synthesisOpen Data and Ecological and Evolutionary synthesis
Open Data and Ecological and Evolutionary synthesis
 
Computing Bayesian posterior with empirical likelihood in population genetics
Computing Bayesian posterior with empirical likelihood in population geneticsComputing Bayesian posterior with empirical likelihood in population genetics
Computing Bayesian posterior with empirical likelihood in population genetics
 
ERVA-NMR
ERVA-NMRERVA-NMR
ERVA-NMR
 
Multiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasetsMultiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasets
 
Hybrid bat-ant colony optimization algorithm for rule-based feature selection...
Hybrid bat-ant colony optimization algorithm for rule-based feature selection...Hybrid bat-ant colony optimization algorithm for rule-based feature selection...
Hybrid bat-ant colony optimization algorithm for rule-based feature selection...
 
Internal examination 3rd semester disaster
Internal examination 3rd semester disasterInternal examination 3rd semester disaster
Internal examination 3rd semester disaster
 
C04821220
C04821220C04821220
C04821220
 
American Gut Project presentation at Masaryk University
American Gut Project presentation at Masaryk UniversityAmerican Gut Project presentation at Masaryk University
American Gut Project presentation at Masaryk University
 
Integrating Tara Oceans datasets using unsupervised multiple kernel learning
Integrating Tara Oceans datasets using unsupervised multiple kernel learningIntegrating Tara Oceans datasets using unsupervised multiple kernel learning
Integrating Tara Oceans datasets using unsupervised multiple kernel learning
 
On the Modeling of the Three Types of Non-Spiking Neurons of the Caenorhabdit...
On the Modeling of the Three Types of Non-Spiking Neurons of the Caenorhabdit...On the Modeling of the Three Types of Non-Spiking Neurons of the Caenorhabdit...
On the Modeling of the Three Types of Non-Spiking Neurons of the Caenorhabdit...
 
LiveSense
LiveSenseLiveSense
LiveSense
 

Plus de tuxette

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathstuxette
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènestuxette
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquestuxette
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-Ctuxette
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?tuxette
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...tuxette
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquestuxette
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeantuxette
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...tuxette
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquestuxette
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...tuxette
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation datatuxette
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?tuxette
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysistuxette
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricestuxette
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Predictiontuxette
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelstuxette
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random foresttuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 

Plus de tuxette (20)

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
 
ASTERICS : une application pour intégrer des données omiques
Representation of metabolomic data with wavelets

  • 1. Representation of metabolomic data with wavelets. Nathalie Villa-Vialaneix (http://www.nathalievilla.org), Toulouse School of Economics. Workgroup BioPuces, INRA de Castanet, June 5th, 2009. BioPuces (05/06/09) Nathalie Villa Metabolomic data 1 / 16
  • 2. Outline: 1. Database presentation; 2. Wavelet representation; 3. Perspective of work.
  • 3. Section 1: Database presentation.
  • 6. Database presentation: Basics about the database. The database was provided by Alain Paris (INRA) and consists of metabolomic spectra (¹H NMR) recorded from the urine of mice. Each spectrum has 950 variables, from 0.505 ppm to 9.995 ppm (a step of 0.01 ppm). The baseline has been removed and the peaks have been aligned.
  • 8. Database presentation: Purpose of the work. Study the effects of ingesting Hypochoeris radicata (HR) on the metabolism: the inflorescences of this plant are known to be responsible for a horse disease, Australian stringhalt. Since it is hard to obtain several dozen horses to sacrifice, the experiments were conducted on 72 mice.
  • 12. Database presentation: Description of the experiment. 72 mice from:
      • 2 sexes: 36 males and 36 females;
      • 3 HR doses: 0 (control), 3% and 9%, with 24 mice each;
      • 3 sacrifice dates: 8th, 15th and 21st day, with 24 mice each;
    ⇒ 18 groups.
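The count of 18 groups follows from crossing the three factors. The sketch below enumerates them, assuming a fully crossed and balanced design (the marginal counts on the slide are consistent with this, but the slides do not state it explicitly); the labels are illustrative.

```python
from itertools import product

# Factor levels as listed on the slide; the labels themselves are illustrative.
sexes = ["male", "female"]
doses = ["0%", "3%", "9%"]        # HR dose, 0% being the control
sacrifice_days = [8, 15, 21]

# Every (sex, dose, sacrifice day) combination is one experimental group.
groups = list(product(sexes, doses, sacrifice_days))

n_mice = 72
# Assumption: mice are spread evenly over the groups.
mice_per_group = n_mice // len(groups)

# Marginal counts implied by the balanced design.
mice_per_sex = mice_per_group * len(doses) * len(sacrifice_days)
mice_per_dose = mice_per_group * len(sexes) * len(sacrifice_days)

print(len(groups), mice_per_group, mice_per_sex, mice_per_dose)
```

Under this assumption each group contains 4 mice, which reproduces the 36 mice per sex and 24 mice per dose given above.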
  • 15. Database presentation: Measurement days. The urine was collected on the following days:
      Day                  0   1   4   8  11  15  18  21
      Nb of observations  68  68  68  66  46  44  19  18
    For each mouse, from 2 to 22 measurements were made. In conclusion, 397 observations for 950 variables.
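As a quick consistency check, the per-day counts can be summed to recover the total number of observations. The variable names are illustrative.

```python
# Collection days and number of spectra observed on each day (from the slide).
days = [0, 1, 4, 8, 11, 15, 18, 21]
n_obs = [68, 68, 68, 66, 46, 44, 19, 18]

obs_per_day = dict(zip(days, n_obs))
total_observations = sum(n_obs)

n_variables = 950
# Shape of the data matrix: one row per spectrum, one column per ppm variable.
data_shape = (total_observations, n_variables)

print(total_observations, data_shape)
```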
  • 16. Section 2: Wavelet representation.
  • 19. Wavelet representation: Basic principle of wavelets. For a given integer J, the spectra can be expressed at level J as:
      $$f(x) = \sum_k \alpha_k \, 2^{-J/2} \, \Psi(2^{-J} x - k) + \sum_{j=1}^{J} \sum_k \beta_{jk} \, 2^{-j/2} \, \Phi(2^{-j} x - k)$$
    The first sum is the trend, based on the father wavelet Ψ; the second sum gathers the details at levels 1, ..., J, based on the mother wavelet Φ.
  • 24. Wavelet representation: Hierarchical decomposition. We add 74 zero values at the end of each spectrum to obtain a dyadic discrete sampling (950 + 74 = 1024 = 2^10).
      Original data: f observed at t_1, ..., t_1024, equally spaced
        ↓ Level 1: trend, details
        ↓ Level 2: trend, details
        ...
        ↓ Level 9: trend, details
    ⇒ At level 9 (maximum level with a discrete sampling of length 1024), we obtain 1025 coefficients.
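The padding and the level-by-level split can be sketched as follows. This is a minimal illustration that assumes the Haar wavelet (the slides do not say which wavelet was used) and a made-up spectrum; the total number of coefficients depends on the filter and on the boundary treatment, so it need not match the count reported on the slide.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: returns (trend, details)."""
    s = math.sqrt(2.0)
    trend = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return trend, details

def haar_decompose(signal, levels):
    """Split the trend repeatedly; returns the last trend and all detail vectors
    (level 1, the finest, comes first)."""
    trend, all_details = list(signal), []
    for _ in range(levels):
        trend, details = haar_step(trend)
        all_details.append(details)
    return trend, all_details

# Hypothetical spectrum with 950 variables (a single Gaussian peak).
spectrum = [math.exp(-((i - 300) / 20.0) ** 2) for i in range(950)]

# Pad with zeros at the end to reach the next power of two, 1024.
padded = spectrum + [0.0] * (1024 - len(spectrum))

trend, details = haar_decompose(padded, levels=9)
detail_sizes = [len(d) for d in details]
print(len(padded) - len(spectrum), len(trend), detail_sizes)
```

Each level halves the length of the trend, which is why a dyadic length is needed before the decomposition starts.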
  • 25. Wavelet representation: Examples (figure: trend and details).
  • 28. Wavelet representation: Denoising. For the coefficients corresponding to details greater than J (with J large enough), a filtering is applied (Donoho and Johnstone):
      $$c^* = \begin{cases} 0 & \text{if } |c| < 2 \log 10 \, \hat{\sigma} \\ c & \text{if } |c| \geq 2 \log 10 \, \hat{\sigma} \end{cases}$$
    Two parameters are to be tuned:
      • Which wavelet should be used?
      • Which J should be used?
    The aim is a trade-off between the quality of the reconstruction (how close are the functions rebuilt from the filtered coefficients to the original ones?) and the number of non-zero coefficients. This is done by minimizing an empirical (self-created) quality criterion:
      $$\frac{1}{n} \sum_i \frac{1}{D} \sum_j \left( f_i(t_j) - \hat{f}_i(t_j) \right)^2 + \frac{\text{Nb of non-zero coefficients}}{\text{Nb of coefficients}}$$
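The filtering rule and the quality criterion can be sketched as below. The function names are illustrative, and `universal_threshold` implements the usual Donoho-Johnstone form, sigma times the square root of 2 log n, which is an assumption and may differ from the exact constant used in the talk. The second term of the criterion is read here as the fraction of coefficients left non-zero after filtering.

```python
import math

def hard_threshold(coeffs, threshold):
    """Set to zero every coefficient whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]

def universal_threshold(sigma_hat, n):
    """Assumed Donoho-Johnstone universal threshold: sigma_hat * sqrt(2 log n)."""
    return sigma_hat * math.sqrt(2.0 * math.log(n))

def quality_criterion(originals, reconstructions, n_nonzero, n_coeffs):
    """Mean squared reconstruction error (averaged over curves and sampling
    points) plus the fraction of coefficients that are kept non-zero."""
    n = len(originals)
    error = 0.0
    for f, f_hat in zip(originals, reconstructions):
        error += sum((a - b) ** 2 for a, b in zip(f, f_hat)) / len(f)
    return error / n + n_nonzero / n_coeffs

# Toy detail coefficients and an arbitrary threshold, for illustration only.
coeffs = [0.05, -1.2, 0.3, 2.0, -0.01]
filtered = hard_threshold(coeffs, threshold=0.5)
n_nonzero = sum(1 for c in filtered if c != 0.0)

# Toy curves: two "spectra" observed at two points each.
score = quality_criterion([[1.0, 2.0], [3.0, 4.0]],
                          [[1.0, 2.0], [3.0, 3.0]],
                          n_nonzero, len(coeffs))
print(filtered, n_nonzero, score)
```

Evaluating such a criterion for each candidate wavelet and each candidate J, then keeping the pair with the smallest value, is one way to carry out the tuning described above.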
  • 29. Wavelet representation: Final reconstruction of the data (figure), with 274 positive coefficients.
  • 30. Wavelet representation: Boxplots of the original coefficients (figure).
  • 31. Wavelet representation: Boxplots of the scaled coefficients, i.e., centered by the mean and divided by the standard deviation (figure).
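Scaling by mean and standard deviation can be sketched as follows, with one row per observation and one column per wavelet coefficient. Using the sample standard deviation (n - 1 denominator) is an assumption; the slides do not say which version was used.

```python
import statistics

def scale_columns(rows):
    """Center each column on its mean and divide it by its standard deviation.
    Constant columns are mapped to zero to avoid dividing by zero."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # sample standard deviation
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]

# Toy coefficient matrix: 3 observations, 2 coefficients on very different scales.
coefficients = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = scale_columns(coefficients)
print(scaled)
```

After scaling, both columns are on the same footing, which is what makes boxplots of coefficients with very different magnitudes comparable.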
  • 32. Section 3: Perspective of work.
  • 35. Perspective of work: Using random forests. The idea is to use random forests to make predictions and also to extract the main coefficients that explain the target variables. Proposed regression: the scaled coefficients will be the explanatory variables. The variable of interest could be:
      • the dose (either as a number, or as a class, which leads to a classification problem);
      • the total dose ingested (i.e., the dose multiplied by the number of days of ingestion);
      • any other interesting idea?
    The idea is then to rebuild the individuals from the main coefficients (setting the others to zero) to see which peaks differ from one group to another.
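The last step, rebuilding a curve from a few selected coefficients, can be sketched as below. The Haar wavelet and the set of "important" indices are assumptions made for illustration: in the proposed workflow the indices would come from the random forest importance ranking, which is not computed here.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """Full orthonormal Haar transform (length must be a power of two).
    Layout: [final trend, coarsest details, ..., finest details]."""
    trend, blocks = list(signal), []
    while len(trend) > 1:
        pairs = [(trend[i], trend[i + 1]) for i in range(0, len(trend), 2)]
        blocks.insert(0, [(a - b) / SQRT2 for a, b in pairs])
        trend = [(a + b) / SQRT2 for a, b in pairs]
    return trend + [c for block in blocks for c in block]

def haar_inverse(coeffs):
    """Inverse of haar_forward, using the same coefficient layout."""
    trend, pos = list(coeffs[:1]), 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(trend)]
        pos += len(trend)
        rebuilt = []
        for t, d in zip(trend, details):
            rebuilt += [(t + d) / SQRT2, (t - d) / SQRT2]
        trend = rebuilt
    return trend

def keep_selected(coeffs, selected):
    """Keep the coefficients whose index is selected, set the others to zero."""
    return [c if i in selected else 0.0 for i, c in enumerate(coeffs)]

# Toy curve of dyadic length 8.
signal = [4.0, 2.0, 5.0, 5.0, 1.0, 1.0, 0.0, 2.0]
coeffs = haar_forward(signal)

# Hypothetical indices of the most important coefficients.
important = {0, 1, 4}
rebuilt = haar_inverse(keep_selected(coeffs, important))

# Keeping only the trend coefficient gives back the mean of the curve.
mean_only = haar_inverse(keep_selected(coeffs, {0}))
print(rebuilt, mean_only)
```

Comparing such rebuilt curves between groups shows where, along the ppm axis, the selected coefficients make the spectra differ.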