A comparison of learning methods to predict N2O
fluxes and N leaching
Nathalie Villa-Vialaneix
http://www.nathalievilla.org
Joint work with Marco Follador, Marco Ratto and Adrian Leip (EC, Ispra, Italy)
April, 27th, 2012 - BIA, INRA Auzeville
SAMM (Université Paris 1) &
IUT de Carcassonne (Université de Perpignan)
Nathalie Villa-Vialaneix (April 27th, 2012) Comparison of metamodels SAMM & UPVD 1 / 27
Contents
1 DNDC-Europe model description
2 Methodology
3 Results
DNDC-Europe model description

General overview
Modern issues in agriculture:
• fight against the food crisis;
• while preserving the environment.
The EC needs simulation tools to:
• link direct aids to compliance with standards ensuring proper management;
• quantify the environmental impact of European policies (“Cross Compliance”).
Cross Compliance Assessment Tool
DNDC is a biogeochemical model.
Zoom on DNDC-EUROPE
Moving from DNDC-Europe to metamodeling
Needs for metamodeling:
• easier integration into CCAT;
• faster execution, enabling responsive scenario analysis.
Data [Villa-Vialaneix et al., 2012]
Data extracted from the biogeochemical simulator DNDC-EUROPE: ∼19,000 HSMU (Homogeneous Soil Mapping Units, nominally 1 km² but with quite variable areas) used for corn cultivation:
• corn corresponds to 4.6% of the UAA;
• HSMU for which at least 10% of the agricultural land was used for corn were selected.
11 input (explanatory) variables (selected by experts and previous simulations):
• N_FR (N input through fertilization; kg/ha·y);
• N_MR (N input through manure spreading; kg/ha·y);
• Nfix (N input from biological fixation; kg/ha·y);
• Nres (N input from root residue; kg/ha·y);
• BD (bulk density; g/cm³);
• SOC (soil organic carbon in topsoil; mass fraction);
• PH (soil pH);
• Clay (ratio of soil clay content);
• Rain (annual precipitation; mm/y);
• Tmean (annual mean temperature; °C);
• Nr (concentration of N in rain; ppm).
2 outputs to be estimated (independently) from the inputs:
• N2O fluxes (a greenhouse gas);
• N leaching (a major cause of water pollution).
Methodology

Purpose: comparison of several metamodeling approaches (accuracy, computational time, ...).
For every data set, every output and every method:
1 the data set was split into a training set and a test set (on an 80%/20% basis);
2 the regression function was learned from the training set (with a full validation process for hyperparameter tuning);
3 the performances were computed on the test set: predictions were made from the test inputs and compared to the true outputs.
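The three-step protocol above can be sketched as follows. This is a minimal illustration with synthetic data and scikit-learn, not the study's actual code; the variable names and the choice of random forest as the example learner are assumptions.

```python
# Sketch of the evaluation protocol: 80/20 split, fit on train, score on test.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 11))                       # 11 explanatory variables
y = X[:, 0] ** 2 + np.log1p(X[:, 1]) + 0.1 * rng.normal(size=2000)

# 1) split the data set into training (80%) and test (20%) sets
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 2) learn the regression function from the training set
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# 3) assess performance on the held-out test set
r2 = r2_score(y_te, model.predict(X_te))
```

In the study, this loop is repeated for every method and both outputs, with hyperparameters tuned inside step 2 only.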
Methods
• 2 linear models:
• one with the 11 explanatory variables;
• one with the 11 explanatory variables plus several nonlinear transformations of these variables (square, log, ...); stepwise AIC was used to train the model;
• MLP (multilayer perceptron);
• SVM (support vector machine);
• RF (random forest);
• 3 spline-based approaches: ACOSSO (ANOVA splines), SDR (an improvement of the former) and DACE (a kriging-based approach).
Regression
Consider the problem where:
• Y ∈ R has to be estimated from X ∈ R^d;
• we are given a learning set, i.e., N i.i.d. observations of (X, Y): (x_1, y_1), ..., (x_N, y_N).
Example: predict N2O fluxes from PH, climate, concentration of N in rain, fertilization, ... for a large number of HSMU.
Multilayer perceptrons (MLP)
A “one-hidden-layer perceptron” takes the form:

Φ_w : x ∈ R^d → Σ_{i=1}^{Q} w_i^(2) G(x^T w_i^(1) + w_i^(0)) + w_0^(2)

where:
• the w are the weights of the MLP, which have to be learned from the learning set;
• G is a given activation function, typically G(z) = (1 − e^(−z)) / (1 + e^(−z));
• Q is the number of neurons on the hidden layer; it controls the flexibility of the MLP. Q is a hyperparameter that is usually tuned during the learning process.
Symbolic representation of MLP
[Diagram: inputs x_1, ..., x_d are connected to Q hidden neurons through first-layer weights w^(1); the neuron outputs are combined with second-layer weights w^(2) and a bias w^(0) to produce Φ_w(x).]
Learning MLP
• Learning the weights: w is learned by a mean squared error minimization scheme, penalized by a weight decay to avoid overfitting (and ensure a better generalization ability):

w* = arg min_w Σ_{i=1}^{N} L(y_i, Φ_w(x_i)) + C‖w‖².

Problem: the MSE is not quadratic in w, so some solutions can be local minima.
• Tuning the hyperparameters C and Q: simple validation was used to tune C and Q.
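A minimal sketch of such an MLP, assuming scikit-learn's MLPRegressor: alpha plays the role of the weight-decay penalty C, hidden_layer_sizes fixes Q, and tanh is in the same family as the G above (G(z) = tanh(z/2)). The data here are synthetic, not the slides' setup.

```python
# One-hidden-layer MLP with weight decay (L2 penalty on the weights).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1]

mlp = MLPRegressor(
    hidden_layer_sizes=(10,),   # Q = 10 hidden neurons
    activation="tanh",          # sigmoid-type activation G
    alpha=1e-3,                 # weight decay strength (role of C)
    max_iter=2000,
    random_state=0,
).fit(X, y)
pred = mlp.predict(X[:5])
```

Because the loss is non-convex in the weights, different random_state values can reach different local minima, which is exactly the caveat stated above.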
SVM
SVM is also an algorithm based on penalized error loss minimization:
1 Basic linear SVM for regression: Φ_(w,b) is of the form x → w^T x + b, with (w, b) a solution of

arg min_(w,b) Σ_{i=1}^{N} L_ε(y_i, Φ_(w,b)(x_i)) + λ‖w‖²

where
• λ is a regularization (hyper)parameter (to be tuned);
• L_ε(y, ŷ) = max{|y − ŷ| − ε, 0} is an ε-insensitive loss function (see the ε-insensitive loss function in the appendix).
2 Nonlinear SVM for regression works the same way, except that a fixed nonlinear transformation of the inputs is applied first: ϕ(x) ∈ H is used instead of x.
Kernel trick: in fact, ϕ is never made explicit but is used through a kernel K : R^d × R^d → R, with K(x_i, x_j) = ϕ(x_i)^T ϕ(x_j).
Common kernel: the Gaussian kernel K_γ(u, v) = e^(−γ‖u−v‖²) is known to have good theoretical properties, both for accuracy and generalization.
Learning SVM
• Learning (w, b): w = Σ_{i=1}^{N} α_i K(x_i, ·) and b are computed by an exact optimization scheme (quadratic programming). The only step that can be time consuming is the computation of the kernel matrix K(x_i, x_j) for i, j = 1, ..., N.
The resulting Φ̂_N is known to be of the form

Φ̂_N(x) = Σ_{i=1}^{N} α_i K(x_i, x) + b

where only a few α_i are nonzero; the corresponding x_i are called support vectors.
• Tuning the hyperparameters C = 1/λ and γ: simple validation was used. To save time, ε was not tuned in our experiments but set to the default value (1), which ensured at most 0.5N support vectors.
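A sketch of kernel SVM regression with a Gaussian (RBF) kernel, assuming scikit-learn's SVR: its C corresponds to 1/λ, gamma to the kernel width γ, and epsilon to the insensitivity threshold ε. The data and parameter values are illustrative.

```python
# Gaussian-kernel SVM regression; only a subset of the training points
# end up as support vectors (the points with nonzero alpha_i).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

svr = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.1).fit(X, y)

n_sv = svr.support_vectors_.shape[0]   # number of support vectors
pred = svr.predict(X[:3])
```

The expensive part for large N is the N×N kernel matrix, which is why the slides single out that computation.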
From regression tree to random forest
Example of a regression tree:
[Figure: a regression tree splitting on SOCt, PH, FR and clay, with leaf predictions ranging from 2.685 to 59.330.]
Each split is made such that the two induced subsets are as homogeneous as possible. The prediction of a terminal node is the mean of the Y values of the observations belonging to that node.
Random forest
Basic principle: combination of a large number of individually weak regression trees (the prediction is the mean of the predictions of all trees).
For each tree, two simplifications of the original method are performed:
1 a given number of observations is randomly chosen from the training set: this subset is called the in-bag sample, while the remaining observations are called out-of-bag and are used to control the error of the tree;
2 for each node of the tree, a given number of variables is randomly chosen among all possible explanatory variables.
The best split is then computed on the basis of these variables and the chosen observations. The chosen observations are the same for a given tree, whereas the candidate variables change for each split.
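The two randomizations above map directly onto scikit-learn parameters, shown here as a sketch with synthetic data: max_samples draws the in-bag sample per tree, max_features limits the variables considered at each split, and oob_score uses the out-of-bag observations to estimate the error.

```python
# Random forest with explicit in-bag sampling and per-split variable sampling.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(1000, 11))
y = 5 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=1000)

rf = RandomForestRegressor(
    n_estimators=300,   # number of trees in the forest
    max_samples=0.8,    # in-bag sample drawn for each tree
    max_features=4,     # variables randomly chosen at each split
    oob_score=True,     # error estimated on out-of-bag observations
    random_state=0,
).fit(X, y)
oob = rf.oob_score_     # out-of-bag R^2
```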
Learning a random forest
Random forests are not very sensitive to the hyperparameters (number of observations for each tree, number of variables for each split): the default values were used.
The number of trees should be large enough for the mean squared error based on out-of-bag observations to stabilize:
[Figure: out-of-bag (training) and test errors versus the number of trees (0–500); both curves stabilize after a few hundred trees.]
Results
Influence of the training sample size
[Figure: test R² versus log training-set size for N2O prediction, for LM1, LM2, Dace, SDR, ACOSSO, MLP, SVM and RF.]
[Figure: test R² versus log training-set size for N leaching prediction, for the same eight methods.]
Computational time

Use         LM1    LM2     Dace    SDR      ACOSSO
Train       <1 s   50 min  80 min  4 hours  65 min
Prediction  <1 s   <1 s    90 s    14 min   4 min

Use         MLP        SVM      RF
Train       2.5 hours  5 hours  15 min
Prediction  1 s        20 s     5 s

Time for DNDC: about 200 hours on a desktop computer and about 2 days using a cluster!
Further comparisons
Evaluation of the different steps (time/difficulty):

         Training  Validation  Test
LM1      ++        +
LM2      +         +
ACOSSO   =         +           -
SDR      =         +           -
DACE     =         -           -
MLP      -         -           +
SVM      =         -           -
RF       +         +           +
Understanding which inputs are important
Importance: a measure of the importance of an input variable can be defined as follows:
• for a given input variable, randomly permute its values and compute predictions from these permuted inputs;
• compare the accuracy of these predictions to the accuracy of the predictions obtained with the true inputs: the increase in mean squared error is called the importance.
This comparison is made on data that were not used to train the model: either the validation set or the out-of-bag observations.
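The permutation-importance measure above can be sketched as follows on held-out data; the synthetic target depends only on the first variable, so permuting that column should dominate the MSE increase. The helper name `importance` is illustrative, not from the study.

```python
# Permutation importance: increase in held-out MSE after permuting one column.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(1500, 5))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=1500)   # only variable 0 matters

X_tr, X_val, y_tr, y_val = X[:1200], X[1200:], y[:1200], y[1200:]
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
base_mse = mean_squared_error(y_val, model.predict(X_val))

def importance(j):
    """Increase in validation MSE after permuting column j."""
    X_perm = X_val.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    return mean_squared_error(y_val, model.predict(X_perm)) - base_mse

imp = [importance(j) for j in range(5)]
```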
Example (N2O, RF):
[Figure: variable importance (mean decrease in MSE) versus rank; SOC and pH stand out from the other variables.]
The variables SOC and PH are the most important for accurate predictions.
Example (N leaching, SVM):
[Figure: variable importance (decrease in MSE) versus rank; N_MR, N_FR, Nres and pH dominate.]
The variables N_MR, N_FR, Nres and pH are the most important for accurate predictions.
Thank you for your attention
Any questions?
Villa-Vialaneix, N., Follador, M., Ratto, M., and Leip, A. (2012). A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops. Environmental Modelling and Software, 34:51–66.
ε-insensitive loss function
[Figure: graph of L_ε(y, ŷ) = max{|y − ŷ| − ε, 0}: zero inside the ε-tube, linear outside.]
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 

Dernier

The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Tamer Koksalan, PhD
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomyDrAnita Sharma
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 

Dernier (20)

The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 

Comparison of learning methods to predict N2O fluxes and N leaching

DNDC-Europe model description
Data [Villa-Vialaneix et al., 2012]
Data were extracted from the biogeochemical simulator DNDC-EUROPE: ∼19,000 HSMU (Homogeneous Soil Mapping Units; nominally 1 km² but of quite variable area) used for corn cultivation:
• corn corresponds to 4.6% of the UAA;
• HSMU for which at least 10% of the agricultural land was used for corn were selected.
DNDC-Europe model description
Data [Villa-Vialaneix et al., 2012]
Data extracted from the biogeochemical simulator DNDC-EUROPE: 11 input (explanatory) variables (selected by experts and previous simulations):
• N_FR (N input through fertilization; kg/ha/y);
• N_MR (N input through manure spreading; kg/ha/y);
• Nfix (N input from biological fixation; kg/ha/y);
• Nres (N input from root residue; kg/ha/y);
• BD (bulk density; g/cm³);
• SOC (soil organic carbon in topsoil; mass fraction);
• PH (soil pH);
• Clay (ratio of soil clay content);
• Rain (annual precipitation; mm/y);
• Tmean (annual mean temperature; °C);
• Nr (concentration of N in rain; ppm).
DNDC-Europe model description
Data [Villa-Vialaneix et al., 2012]
2 outputs to be estimated (independently) from the inputs:
• N2O fluxes (a greenhouse gas);
• N leaching (one major cause of water pollution).
Methodology
Sommaire
1 DNDC-Europe model description
2 Methodology
3 Results
Methodology
Purpose: comparison of several metamodeling approaches (accuracy, computational time, ...).
For every data set, every output and every method:
1 the data set was split into a training set and a test set (on an 80%/20% basis);
2 the regression function was learned from the training set (with a full validation process for hyperparameter tuning);
3 the performances were calculated on the test set: predictions were made from the inputs and compared to the true outputs.
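The three-step protocol above can be sketched as follows. This is a minimal illustration with scikit-learn on synthetic data (the slides do not state the software actually used; the variable names and the random-forest choice here are illustrative only):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic stand-in for the DNDC-EUROPE table: 11 inputs, one output.
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 11))
y = 10 * X[:, 0] + np.sin(3 * X[:, 1]) + rng.normal(scale=0.1, size=1000)

# 1) 80%/20% split; 2) learn on the training set; 3) score on the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
r2_test = r2_score(y_te, model.predict(X_te))
```

The test-set R² is the accuracy criterion reported in the results section.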
Methodology
Methods
• 2 linear models: one with the 11 explanatory variables; one with the 11 explanatory variables plus several nonlinear transformations of them (square, log, ...), trained with stepwise AIC;
• MLP (multilayer perceptrons);
• SVM (support vector machines);
• RF (random forests);
• 3 spline-based approaches: ACOSSO (ANOVA splines), SDR (an improvement of the previous one) and DACE (a kriging-based approach).
Methodology
Regression
Consider the problem where:
• $Y \in \mathbb{R}$ has to be estimated from $X \in \mathbb{R}^d$;
• we are given a learning set, i.e., N i.i.d. observations of (X, Y): $(x_1, y_1), \ldots, (x_N, y_N)$.
Example: predict N2O fluxes from pH, climate, concentration of N in rain, fertilization, ... for a large number of HSMU.
Methodology
Multilayer perceptrons (MLP)
A "one-hidden-layer perceptron" takes the form
$$\Phi_w : x \in \mathbb{R}^d \mapsto \sum_{i=1}^{Q} w_i^{(2)}\, G\!\left(x^T w_i^{(1)} + w_i^{(0)}\right) + w_0^{(2)}$$
where:
• the w are the weights of the MLP, which have to be learned from the learning set;
• G is a given activation function, typically $G(z) = \frac{1 - e^{-z}}{1 + e^{-z}}$;
• Q is the number of neurons on the hidden layer. It controls the flexibility of the MLP and is a hyperparameter, usually tuned during the learning process.
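The forward pass of this one-hidden-layer perceptron is short enough to write out directly; a minimal numpy sketch (weight shapes chosen here for illustration) is:

```python
import numpy as np

def G(z):
    # Activation from the slide: (1 - e^{-z}) / (1 + e^{-z}), i.e. tanh(z/2).
    return (1.0 - np.exp(-z)) / (1.0 + np.exp(-z))

def mlp_forward(x, W1, w0, w2, b2):
    """One-hidden-layer perceptron Phi_w(x) with Q hidden neurons.
    W1: (Q, d) input weights, w0: (Q,) hidden biases,
    w2: (Q,) output weights, b2: scalar output bias w_0^(2)."""
    return w2 @ G(W1 @ x + w0) + b2

rng = np.random.default_rng(0)
d, Q = 11, 5
x = rng.normal(size=d)
W1 = rng.normal(size=(Q, d))
w0 = rng.normal(size=Q)
w2 = rng.normal(size=Q)
out = mlp_forward(x, W1, w0, w2, b2=0.5)
```

Note that G is a rescaled hyperbolic tangent, so its output is bounded in (−1, 1).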
Methodology
Symbolic representation of MLP
[Diagram: inputs x_1, ..., x_d feed Q hidden neurons through the weights w^{(1)}; the hidden outputs are combined through the weights w^{(2)} (plus a bias) to produce Φ_w(x).]
Methodology
Learning MLP
• Learning the weights: w is learned by a mean squared error minimization scheme, penalized by a weight decay to avoid overfitting (and thus ensure a better generalization ability):
$$w^* = \arg\min_w \sum_{i=1}^{N} L(y_i, \Phi_w(x_i)) + C \|w\|^2.$$
Problem: the MSE is not quadratic in w, so some solutions can be local minima.
• Tuning the hyperparameters C and Q: simple validation was used to tune C and Q.
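A minimal sketch of this penalized training with simple-validation tuning, using scikit-learn's `MLPRegressor` (an assumption — the slides do not name the implementation; sklearn's `alpha` plays the role of the weight-decay strength, and the grid values below are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(600, 11))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.05, size=600)

# Simple validation: hold out part of the training data and pick the
# (Q, weight-decay) pair with the best validation score.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=1)
best = None
for Q in (5, 20):                      # number of hidden neurons
    for alpha in (1e-4, 1e-1):         # weight-decay strength
        net = MLPRegressor(hidden_layer_sizes=(Q,), alpha=alpha,
                           activation="tanh", max_iter=2000,
                           random_state=1).fit(X_tr, y_tr)
        score = net.score(X_val, y_val)
        if best is None or score > best[0]:
            best = (score, Q, alpha)
best_score, best_Q, best_alpha = best
```

Because the loss is non-convex in w, restarting from several random initializations (different `random_state` values) is a common guard against poor local minima.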
Methodology
SVM
SVM is also an algorithm based on penalized error loss minimization.
1 Basic linear SVM for regression: $\Phi_{(w,b)}$ is of the form $x \mapsto w^T x + b$, with (w, b) solution of
$$\arg\min_{w,b} \sum_{i=1}^{N} L_\varepsilon(y_i, \Phi_{(w,b)}(x_i)) + \lambda \|w\|^2$$
where:
• λ is a regularization (hyper)parameter (to be tuned);
• $L_\varepsilon(y, \hat{y}) = \max\{|y - \hat{y}| - \varepsilon, 0\}$ is the ε-insensitive loss function (illustrated on the last slide).
Methodology
SVM
2 Nonlinear SVM for regression is the same, except that a fixed nonlinear transformation of the inputs is first applied: φ(x) ∈ H is used instead of x.
Kernel trick: in fact, φ is never made explicit but is used through a kernel $K : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, with $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$.
Common kernel: the Gaussian kernel $K_\gamma(u, v) = e^{-\gamma \|u - v\|^2}$ is known to have good theoretical properties, both for accuracy and generalization.
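The Gaussian kernel can be computed for all pairs of observations at once; the sketch below (pure numpy, sizes illustrative) also checks the properties that make it a valid kernel matrix:

```python
import numpy as np

def gaussian_kernel(U, V, gamma):
    # K_gamma(u, v) = exp(-gamma * ||u - v||^2), for all pairs (u in U, v in V).
    sq_dist = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 11))
K = gaussian_kernel(X, X, gamma=0.1)

# A valid kernel matrix is symmetric, has ones on the diagonal
# (distance zero), and is positive semi-definite.
eigvals = np.linalg.eigvalsh(K)
```

Positive semi-definiteness is exactly what guarantees that some feature map φ with $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$ exists, even though it is never constructed.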
Methodology
Learning SVM
• Learning (w, b): $w = \sum_{i=1}^{N} \alpha_i K(x_i, \cdot)$ and b are calculated by an exact optimization scheme (quadratic programming). The only step that can be time consuming is the computation of the kernel matrix $K(x_i, x_j)$ for $i, j = 1, \ldots, N$.
The resulting $\hat{\Phi}_N$ is known to be of the form $\hat{\Phi}_N(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x) + b$, where only a few $\alpha_i$ are nonzero. The corresponding $x_i$ are called support vectors.
• Tuning the hyperparameters C = 1/λ and γ: simple validation was used. To save time, ε was not tuned in our experiments but set to the default value (1), which ensured at most 0.5N support vectors.
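A minimal Gaussian-kernel SVR sketch showing the sparsity of the solution (scikit-learn's `SVR` is an assumption, not necessarily the implementation used in the study; data and hyperparameter values are illustrative, with ε fixed rather than tuned, as on the slide):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

# Gaussian-kernel SVR; only observations with nonzero alpha_i
# (the support vectors) enter the prediction function.
svr = SVR(kernel="rbf", gamma=0.5, C=10.0, epsilon=0.1).fit(X, y)
n_sv = len(svr.support_)          # number of support vectors
frac_sv = n_sv / len(X)
train_r2 = svr.score(X, y)
```

A larger ε widens the insensitivity tube, leaving more points with zero loss and therefore fewer support vectors, which speeds up prediction.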
Methodology
From regression trees to random forests
Example of a regression tree:
[Tree diagram: successive binary splits on thresholds of SOCt, PH, FR and clay, with leaf values ranging from 2.685 to 59.330.]
Each split is made such that the two induced subsets have the greatest possible homogeneity. The prediction of a final node is the mean Y value of the observations belonging to this node.
Methodology
Random forest
Basic principle: combination of a large number of under-efficient regression trees (the prediction is the mean prediction over all trees).
For each tree, two simplifications of the original method are performed:
1 a given number of observations are randomly chosen from the training set: this subset is called the in-bag sample, while the remaining (out-of-bag) observations are used to control the error of the tree;
2 at each node of the tree, a given number of variables are randomly chosen among all possible explanatory variables; the best split is then calculated on the basis of these variables and the chosen observations.
The chosen observations are the same for a given tree, whereas the candidate variables change at each split.
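Both randomizations map directly onto parameters of scikit-learn's `RandomForestRegressor` (used here as an illustrative stand-in for the implementation in the study): `bootstrap=True` draws the in-bag sample, and `max_features` limits the variables considered at each split. The sketch also checks that the forest prediction is the mean of the per-tree predictions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 11))
y = 5 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Each tree sees a bootstrap (in-bag) sample of the rows; each split
# considers only max_features randomly chosen variables.
rf = RandomForestRegressor(n_estimators=200, max_features=4,
                           bootstrap=True, oob_score=True,
                           random_state=0).fit(X, y)

pred = rf.predict(X[:5])  # forest prediction = mean of the tree predictions
per_tree_mean = np.mean([tree.predict(X[:5]) for tree in rf.estimators_],
                        axis=0)
```

`oob_score_` gives the out-of-bag R², an error estimate that comes for free from the observations left out of each tree's in-bag sample.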
Methodology
Learning a random forest
Random forests are not very sensitive to the hyperparameters (number of observations for each tree, number of variables for each split): the default values have been used.
The number of trees should be large enough for the mean squared error based on out-of-bag observations to stabilize:
[Plot: out-of-bag (training) and test error as a function of the number of trees (0–500).]
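The stabilization check shown in the plot can be reproduced by growing one forest incrementally and recording the out-of-bag score at increasing sizes; a sketch using scikit-learn's `warm_start` mechanism (an illustrative choice, following the pattern of the scikit-learn OOB-error example):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 11))
y = 5 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(scale=0.2, size=500)

# warm_start=True keeps the trees already grown and only adds new ones,
# so the OOB R^2 can be tracked as the forest grows.
rf = RandomForestRegressor(warm_start=True, oob_score=True,
                           bootstrap=True, random_state=1)
oob_r2 = {}
for n_trees in (25, 100, 400):
    rf.set_params(n_estimators=n_trees)
    rf.fit(X, y)
    oob_r2[n_trees] = rf.oob_score_
```

Once the curve flattens, adding trees only costs computation time; the forest size can then be fixed.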
Results
Sommaire
1 DNDC-Europe model description
2 Methodology
3 Results
Results
Influence of the training sample size
[Plot: test R² versus log training-set size for N2O prediction, for LM1, LM2, Dace, SDR, ACOSSO, MLP, SVM and RF (R² between 0.5 and 1.0).]
Results
Influence of the training sample size
[Plot: test R² versus log training-set size for N leaching prediction, for the same eight methods (R² between 0.6 and 1.0).]
Results
Computational time

Use        | LM1  | LM2    | Dace   | SDR     | ACOSSO
Train      | <1 s | 50 min | 80 min | 4 hours | 65 min
Prediction | <1 s | <1 s   | 90 s   | 14 min  | 4 min

Use        | MLP       | SVM     | RF
Train      | 2.5 hours | 5 hours | 15 min
Prediction | 1 s       | 20 s    | 5 s

Time for DNDC: about 200 hours on a desktop computer and about 2 days using a cluster!
Results
Further comparisons
Evaluation of the different steps (time/difficulty):

       | Training | Validation | Test
LM1    | ++       | +          |
LM2    | +        | +          |
ACOSSO | =        | +          | -
SDR    | =        | +          | -
DACE   | =        | -          | -
MLP    | -        | -          | +
SVM    | =        | -          | -
RF     | +        | +          | +
Results
Understanding which inputs are important
Importance: a measure of the importance of an input variable can be defined as follows:
• for a given input variable, randomly permute its values and compute the predictions from these permuted inputs;
• compare the accuracy of these predictions to the accuracy of the predictions obtained with the true inputs: the increase in mean squared error is called the importance.
This comparison is made on data not used to train the model: either the validation set or the out-of-bag observations.
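The permutation-importance procedure just described is short to implement by hand; a sketch on synthetic held-out data (model choice and data are illustrative — only column 0 carries signal, so it should dominate the importances):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(800, 5))
y = 10 * X[:, 0] + rng.normal(scale=0.1, size=800)   # only column 0 matters

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
base_mse = mean_squared_error(y_te, model.predict(X_te))

def importance(j):
    # Permute column j on the held-out data and measure the MSE increase.
    X_perm = X_te.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    return mean_squared_error(y_te, model.predict(X_perm)) - base_mse

imps = [importance(j) for j in range(X.shape[1])]
```

Permuting a variable breaks its link with the output while preserving its marginal distribution, so an informative variable produces a large MSE increase and an irrelevant one an increase near zero.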
Results
Understanding which inputs are important
Example (N2O, RF): [Plot: importance (mean decrease in MSE) versus rank for the 11 inputs.] The variables SOC and PH are the most important for accurate predictions.
Results
Understanding which inputs are important
Example (N leaching, SVM): [Plot: importance (decrease in MSE) versus rank for the 11 inputs.] The variables N_MR, N_FR, Nres and pH are the most important for accurate predictions.
Thank you for your attention. Any questions?
Reference
Villa-Vialaneix, N., Follador, M., Ratto, M., and Leip, A. (2012). A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops. Environmental Modelling and Software, 34:51–66.
Appendix
[Plot: the ε-insensitive loss function $L_\varepsilon(y, \hat{y}) = \max\{|y - \hat{y}| - \varepsilon, 0\}$.]