This paper advances the Domain Segmentation based on Uncertainty in the Surrogate (DSUS) framework, a novel approach to characterizing the uncertainty in surrogates. The leave-one-out cross-validation technique is adopted in the DSUS framework to measure the local errors of a surrogate. A method is proposed in this paper to evaluate the performance of the leave-one-out cross-validation errors as local error measures. This method evaluates local errors by comparing: (i) the leave-one-out cross-validation error with (ii) the actual local error estimated within a local hypercube around each training point. The comparison results show that the leave-one-out cross-validation strategy can capture the local errors of a surrogate. The DSUS framework is then applied to key aspects of wind resource assessment and wind farm cost modeling. The uncertainties in the wind farm cost and the wind power potential are successfully characterized, which gives designers and users more confidence when using these models.
1. Uncertainty Quantification in Surrogate Models Based
on Pattern Classification of Cross-validation Errors
Jie Zhang*, Souma Chowdhury*, Ali Mehmani# and Achille Messac#
* Rensselaer Polytechnic Institute, Department of Mechanical, Aerospace, and Nuclear Engineering
# Syracuse University, Department of Mechanical and Aerospace Engineering
14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference
September 17 – 19, 2012
Indianapolis, Indiana
2. Uncertainty in Surrogate Modeling
• Since a surrogate model is an approximation to an unknown
function, prediction errors are generally present in the
estimated function values.
• The two major sources of uncertainty in surrogate modeling
are:
• Uncertainty in the observations (when they are noisy), and
• Uncertainty due to the finite sample size.
• One of the major challenges in surrogate modeling is to
accurately quantify these uncertainties.
3. Research Question
Domain
Segmentation
based on
Uncertainty in
the Surrogate
Can we segregate the input space of a
surrogate, based on the accuracy of the
surrogate in different regions, and
characterize the uncertainty in each
region?
• By addressing this question, we can:
Quantify the uncertainty in the surrogate, using an approach that is
applicable to a majority of surrogate models.
4. Research Motivation
A surrogate model can be used with more confidence if
we can do two things:
Quantify the uncertainty in the surrogate model; and
Characterize how the levels of errors vary in the variable
space.
Most existing methods to model the uncertainty in
surrogates are model-dependent.
5. Research Questions: Corollary Objectives
The research question calls for a methodology to enhance
user confidence in surrogate applications.
• Develop a methodology to characterize the uncertainty
attributable to surrogate models, which is applicable to
both regression and interpolative surrogate models.
• Evaluate the performance of leave-one-out cross-validation
errors as local error measures.
6. Why Domain Segmentation?
In surrogate-based optimization: optimal solutions in regions with smaller
errors are more reliable than solutions in regions with larger errors. The
domain segmentation technique can quantify the uncertainty in the optimal
solutions based on their locations in the design space.
Wind Farm Layout Optimization
In surrogate-based system analysis: the knowledge of the errors and
uncertainties in the surrogate is helpful for decision making by the
user/engineer.
Wind Farm Cost Model
In surrogate accuracy improvement: the design-domain-based uncertainty
information can be used to implement adaptive sampling strategies.
7. Presentation Outline
Uncertainty in surrogate overview
Domain Segmentation based on Uncertainty in the Surrogate
(DSUS)
Illustrating cross-validation errors as local error measures
Applications to wind resource assessment and wind farm
cost modeling
Concluding remarks
8. Uncertainty in Surrogate Review
Uncertainty in Surrogates
Bayesian approach (Kennedy and O’Hagan, Apley et al., Xiong et al.)
Adding bias using constant or pointwise margins (Picheny)
Reliability Based Design Optimization (RBDO) (Neufeld et al.)
Efficient Global Optimization (EGO) (Jones et al.)
Sequential Kriging Optimization (SKO) (Huang et al.)
Importation of uncertainty estimates from one surrogate to another
(Viana and Haftka)
9. Domain Segmentation based on Uncertainty in the Surrogate (DSUS)
• Based on the current level of
knowledge regarding the problem,
the engineer may know what
levels of errors are acceptable for
particular design purposes.
Wind Farm Power Generation Model
Error Decision
< 5% error Desirable
5-10% error Acceptable
> 10% error Unacceptable
• The engineer can estimate the
confidence of a new design, based on
the region into which the design
point is classified.
• These regions can correspond to
“good”, “acceptable”, and
“unacceptable” levels of accuracy.
10. Development of the DSUS
DSUS Key features:
Segregates the design domain into
regions based on the level of errors
(or level of fidelity).
Classifies any new point/design,
for which the actual functional
response is not known, into an
error class, and quantifies the
uncertainty in its predicted
function response.
Is readily applicable to a majority
of interpolative surrogate models.
The term "prediction uncertainty" denotes the
distribution of errors of the surrogate.
11. Cross-Validation
Two popular strategies: (i) leave-one-out; and (ii) q-fold.
In order to obtain the error at each training point, the leave-one-out
strategy is adopted in the DSUS framework.
The Relative Accuracy Error (RAE) is used to classify the
training points into classes: RAE = |f(x) - f̂(x)| / |f(x)|,
where f(x) is the actual function value and f̂(x) is the value
estimated by the surrogate.
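The leave-one-out loop above can be sketched in a few lines of Python. This is not the paper's implementation; the surrogate here is an illustrative RBF interpolator from SciPy, and the RAE follows the definition on this slide (actual vs. surrogate-estimated value).

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def loo_rae(X, y):
    """Leave-one-out cross-validation: for each training point, fit the
    surrogate on the remaining points and record the Relative Accuracy
    Error RAE_i = |y_i - yhat_i| / |y_i| at the held-out point."""
    n = len(X)
    rae = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        surrogate = RBFInterpolator(X[mask], y[mask])  # illustrative surrogate
        y_hat = surrogate(X[i:i + 1])[0]
        rae[i] = abs(y[i] - y_hat) / abs(y[i])
    return rae

# A 1-variable example with 15 training points (f kept away from zero
# so the relative error is well defined)
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(15, 1))
y = X[:, 0] * np.sin(2 * np.pi * X[:, 0]) + 2.0
rae = loo_rae(X, y)  # one leave-one-out error per training point
```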
12. Classifying the Training Points into Error Classes
• According to the RAE values, we classify the training points into error
classes, and define the lower and upper bounds of each class.
Class Design variable RAE
1 0.573 0.018
2 0.277 0.691
1 0.044 0.045
1 0.371 0.018
2 0.910 0.345
2 0.767 0.116
1 0.720 0.043
1 0.865 0.060
1 0.508 0.073
2 0.637 0.316
2 0.977 1.078
1 0.240 0.013
1 0.184 0.019
1 0.107 0.004
1 0.453 0.049
Class 1
RAE < 10%
Class 2
RAE > 10%
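The binary split in this table is a one-line rule; a minimal sketch using the 10% RAE bound from the slide:

```python
import numpy as np

def classify_by_rae(rae, threshold=0.10):
    """Assign each training point to an error class from its
    leave-one-out RAE: Class 1 if RAE < 10%, Class 2 otherwise
    (the bounds used on this slide)."""
    return np.where(np.asarray(rae) < threshold, 1, 2)

# The RAE column of the table above; the result matches its Class column
rae = [0.018, 0.691, 0.045, 0.018, 0.345, 0.116, 0.043, 0.060,
       0.073, 0.316, 1.078, 0.013, 0.019, 0.004, 0.049]
classes = classify_by_rae(rae)  # → [1 2 1 1 2 2 1 1 1 2 2 1 1 1 1]
```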
13. Pattern Classification
A wide variety of pattern classification methods are available:
Linear discriminant analysis (LDA);
Principal components analysis (PCA);
Kernel estimation and K-nearest-neighbor algorithms;
Perceptrons;
Neural Network; and
Support Vector Machine (SVM) 1,2.
SVM is a competitive approach for multiclass classification problems.
1. Basudhar and Missoum; 2. Sakalkar and Hajela
14. Support Vector Machine (SVM)
Four kernels are popularly used:
Linear: K(x_i, x_j) = x_i^T x_j
Polynomial: K(x_i, x_j) = (γ x_i^T x_j + r)^d, γ > 0
Radial basis function: K(x_i, x_j) = exp(-γ ||x_i - x_j||^2), γ > 0
Sigmoid: K(x_i, x_j) = tanh(γ x_i^T x_j + r)
We have used an efficient SVM package, LIBSVM (A
Library for Support Vector Machines), developed by Chang
and Lin.
LIBSVM handles multiclass problems using the one-against-one
classification strategy.
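As a concrete sketch (not the authors' code), scikit-learn's SVC, which wraps LIBSVM, can reproduce this workflow. The training data below reuse the earlier illustrative table, and the C and γ values are arbitrary placeholders.

```python
import numpy as np
from sklearn.svm import SVC  # SVC wraps LIBSVM

# Illustrative training data: (design variable, error class) pairs
X = np.array([[0.573], [0.277], [0.044], [0.371], [0.910],
              [0.767], [0.720], [0.865], [0.508], [0.637],
              [0.977], [0.240], [0.184], [0.107], [0.453]])
labels = np.array([1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1])

# Radial basis function kernel; multiclass problems are handled by
# LIBSVM's one-against-one voting scheme
clf = SVC(kernel="rbf", C=1.0, gamma=0.8)
clf.fit(X, labels)

# Classify a new design point into an error class
new_class = int(clf.predict([[0.5]])[0])
```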
16. Adaptive Hybrid Functions (AHF)
Determination of a trust region:
numerical bounds of the estimated
parameter (output) as functions of
the independent parameters (input).
Definition of a local measure of
accuracy (using kernel functions) of
the estimated function value, and
representation of the corresponding
distribution parameters as functions
of the input vector.
Weighted summation of different
surrogate models based on the local
measure of accuracy.
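The weighted-summation step above can be illustrated as follows; the component surrogates and the Gaussian "local accuracy" kernels below are toy placeholders, not the actual kernel-based AHF formulation (see Zhang et al., ref. 6).

```python
import numpy as np

def ahf_predict(x, surrogates, weight_fns):
    """Weighted summation of component surrogates, where each weight is
    a local measure of that surrogate's accuracy at x."""
    w = np.array([wf(x) for wf in weight_fns])
    w = w / w.sum()  # normalize local accuracy measures into weights
    return float(np.dot(w, [s(x) for s in surrogates]))

# Toy usage: two component models, each trusted near a different region
s1, s2 = (lambda x: x**2), (lambda x: 2*x - 1)
w1 = lambda x: np.exp(-(x - 0.0)**2)  # s1 trusted near x = 0
w2 = lambda x: np.exp(-(x - 1.0)**2)  # s2 trusted near x = 1
y = ahf_predict(0.5, [s1, s2], [w1, w2])  # equal weights here -> 0.125
```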
17. Illustrating Cross-validation Errors as Local Error Measures
The local errors of the surrogate are evaluated in the neighborhood of each
training point.
A local hypercube is constructed to include one training point.
The length of the hypercube along each dimension is determined by
The jth hypercube can be expressed by
18. Illustrating Cross-validation Errors as Local Error Measures
The RAE at each test point x_i within the jth local hypercube is
RAE_i = |f(x_i) - f̂(x_i)| / |f(x_i)|, where f̂ is the estimate from the
surrogate trained on all training points.
For the jth local hypercube, the average of the RAE values over the
N_te test points is RAE^te_j = (1/N_te) Σ_i RAE_i.
The local errors estimated by the leave-one-out surrogate at each
training point are compared with the RAE^te value estimated within the
local hypercube.
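A sketch of this comparison step, with one stated assumption: since the slide's hypercube-length formula is not reproduced here, the side length is taken as a fixed fraction of the domain range.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def hypercube_rae(X, y, f, frac=0.1, n_test=50, seed=0):
    """For each training point x_j, sample test points inside a local
    hypercube centered on it and average RAE = |f - fhat| / |f| there,
    using a surrogate fit to all training points. The side length
    (`frac` of the domain range per dimension) is an assumption."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    half = 0.5 * frac * (hi - lo)
    surrogate = RBFInterpolator(X, y)
    rae_te = np.empty(len(X))
    for j, xj in enumerate(X):
        pts = rng.uniform(np.maximum(xj - half, lo),
                          np.minimum(xj + half, hi),
                          size=(n_test, X.shape[1]))
        f_true = f(pts)
        rae_te[j] = np.mean(np.abs(f_true - surrogate(pts)) / np.abs(f_true))
    return rae_te

# Booth function (one of the 2-variable test problems)
booth = lambda P: (P[:, 0] + 2*P[:, 1] - 7)**2 + (2*P[:, 0] + P[:, 1] - 5)**2
rng = np.random.default_rng(1)
X = rng.uniform(-10, 10, size=(30, 2))
rae_te = hypercube_rae(X, booth(X), booth)  # one averaged local error per point
```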
19. Analytical Examples
Three analytical examples are tested.
The 1-variable function;
The 2-variable Dixon & Price function; and
The 2-variable Booth function.
The leave-one-out cross-validation errors and the actual local errors
in the local hypercube are each assigned to one of four classes, based
on the percentage error w.r.t. the mean error in the entire domain:
Class 1 <50%
Class 2 50-100 %
Class 3 100-150%
Class 4 >150%
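The four-class binning w.r.t. the mean error can be sketched as (the error values in the usage line are hypothetical):

```python
import numpy as np

def error_classes_wrt_mean(errors):
    """Bin each point's error by its percentage of the mean error over
    the entire domain: Class 1 < 50%, Class 2 = 50-100%, Class 3 =
    100-150%, Class 4 > 150% (the bounds in the table above)."""
    ratio = np.asarray(errors, dtype=float)
    ratio = ratio / ratio.mean()
    return np.digitize(ratio, [0.5, 1.0, 1.5]) + 1  # classes 1..4

# Hypothetical errors with mean 0.145: ratios 0.14, 0.34, 0.76, 2.76
classes = error_classes_wrt_mean([0.02, 0.05, 0.11, 0.40])  # → [1 1 2 4]
```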
20. Analytical Examples
Three analytical examples are tested.
The 1-variable function;
The 2-variable Dixon & Price function; and
The 2-variable Booth function.
[Figures: leave-one-out cross-validation; local hypercube]
21. Results of Local Error Measures
1-variable function
The cross-validation errors and the actual local errors belong to the
same class for 93.33% of the 15 training points.
22. Results of Local Error Measures
Dixon & Price function
[Figures: actual local errors (local hypercube); cross-validation errors]
The cross-validation errors and the actual local errors belong to the
same class for 83.33% of the 30 training points.
23. Results of Local Error Measures
Booth function
[Figures: actual local errors (local hypercube); cross-validation errors]
The cross-validation errors and the actual local errors belong to the
same class for 86.67% of the 30 training points.
24. Wind Energy Case Studies
We apply the DSUS framework to key aspects of wind resource assessment
and wind farm cost modeling.
Onshore wind farm cost model; and
Wind Power Potential (WPP) model.
Response Surface-Based Wind Farm Cost (RS-WFC) model
The inputs for the surrogate model are
The number, and
The rated power of wind turbines.
The output of the surrogate model is
Total annual cost of a wind farm
Surrogate:
25. Wind Power Potential (WPP)
The WPP method predicts the quality of wind resources by considering
the joint distribution of wind speed and wind direction, which can help
decision makers in wind farm siting. The key steps include:
1. Determining the distribution type and parameters (the 5-parameter
bivariate normal distribution is adopted);
2. Sampling the five distribution parameters;
3. Maximizing the net power generation through farm layout optimization
for each sample distribution; and
4. Constructing a surrogate model to represent the computed maximum
capacity factor as a function of the parameters of the bivariate normal
distribution.
The uncertainty in the WPP is characterized for two cases:
Evaluating the WPP for a four-turbine farm; and
Evaluating the WPP for a nine-turbine farm.
26. SVM Kernels and Predefined Classes Bounds
Numerical setup for test problems
The uncertainty scale in each class
27. Representation of Prediction Uncertainty
• A Gaussian distribution is adopted to represent the uncertainty in the
prediction accuracy of the surrogate.
• For any new point (design) candidate, the DSUS framework can classify
that point into one of these error classes.
Wind farm cost model
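One way to obtain per-class (μ, σ) pairs, sketched on the illustrative RAE table from the earlier slide; this assumes the Gaussian is simply fit to the errors of the training points in each class.

```python
import numpy as np

def class_uncertainty(errors, classes):
    """Fit a Gaussian to the errors of each class: the uncertainty
    reported for a new design classified into class k is the (mu, sigma)
    of the errors of the training points in that class."""
    errors = np.asarray(errors)
    classes = np.asarray(classes)
    return {int(k): (float(errors[classes == k].mean()),
                     float(errors[classes == k].std(ddof=1)))
            for k in np.unique(classes)}

# Illustrative RAE values and classes from the earlier table
rae = [0.018, 0.691, 0.045, 0.018, 0.345, 0.116, 0.043, 0.060,
       0.073, 0.316, 1.078, 0.013, 0.019, 0.004, 0.049]
classes = [1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1]
mu_sigma = class_uncertainty(rae, classes)  # {1: (mu1, s1), 2: (mu2, s2)}
```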
28. Uncertainty Prediction Results
Classification accuracy of each problem
Problem Parameters Accuracy
Wind farm cost C=1, γ=0.8 100% (20/20)
WPP (4 turbines) C=1, γ=1 90% (18/20)
WPP (9 turbines) C=1, γ=0.2 95% (19/20)
• The classification accuracy of the DSUS prediction is at least 90% for
all problems.
29. Uncertainty Prediction Results
Uncertainty characterization for new designs (wind farm cost)
New design  No. of turbines  Rated power  Class  Uncertainty (μ, σ)
1 40 1.25 MW 1 0.0018, 0.0011
2 7 1 MW 2 0.0066, 0.0013
3 44 1.5 MW 3 0.0216, 0.0102
The total annual cost per kilowatt installed (for the third wind farm) is
estimated as 122.28 $/kW.
Assuming a lifetime of 20 years, the 2.16% (mean value) error in the cost
estimation amounts to approximately 3.5 million dollars, which is
appreciable for such a medium-scale wind farm.
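The arithmetic behind the 3.5-million-dollar figure, using only numbers from this slide (third design: 44 turbines × 1.5 MW = 66,000 kW installed):

```python
# Worked arithmetic behind the slide's figure
installed_kw = 44 * 1500                # 44 turbines x 1.5 MW = 66,000 kW
annual_cost = 122.28 * installed_kw     # 122.28 $/kW -> dollars per year
lifetime_cost = annual_cost * 20        # 20-year lifetime
error_dollars = 0.0216 * lifetime_cost  # 2.16% mean error (Class 3)
# error_dollars is about 3.49 million dollars
```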
30. Uncertainty in The Wind Power Potential
Uncertainty in the estimated capacity factors
For the Ada station, the capacity factor is estimated as 48.52% for the 9-
turbine farm.
The 53.74% (mean value) error in the capacity factor estimation
corresponds to an error of approximately 3×10⁷ kWh in annual energy
production, which is significant for such a small-scale wind farm.
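The 3×10⁷ kWh figure is consistent with the slide's numbers if the nine turbines are assumed to be rated at 1.5 MW each (the rated power is not stated on this slide):

```python
# Worked arithmetic; the 1.5 MW turbine rating is an assumption
installed_kw = 9 * 1500                       # 13,500 kW
annual_energy = installed_kw * 8760 * 0.4852  # hours/year x capacity factor
error_kwh = annual_energy * 0.5374            # 53.74% mean error
# error_kwh is roughly 3.1e7 kWh per year
```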
31. Uncertainty in The Wind Power Potential
[Figures: WPP with 4 turbines; WPP with 9 turbines]
32. Concluding Remarks
• The Domain Segmentation based on Uncertainty in the Surrogate
(DSUS) framework successfully characterizes the uncertainty
attributable to surrogate models.
• The mean errors in the wind farm cost and wind power potential
(in the case studies) are significant for small/medium scale wind
farms, which should be carefully considered during the decision
making process.
• The results show that the leave-one-out cross-validation error can
capture the local errors of a surrogate with a reasonable accuracy.
• Future research should investigate other error metrics that better
represent the performance over the entire design domain.
33. Acknowledgement
• I would like to acknowledge my research adviser
Prof. Achille Messac, for his immense help and
support in this research.
• I would also like to thank my colleagues Souma
Chowdhury and Ali Mehmani for their valuable
contributions to this paper.
• Support from the NSF Awards is also
acknowledged.
34. Selected References
1. Keane, A. J. and Nair, P. B., Computational Approaches for Aerospace Design: The Pursuit of Excellence, John Wiley and Sons,
2005.
2. Basudhar, A. and Missoum, S., “Adaptive Explicit Decision Functions for Probabilistic Design and Optimization Using Support
Vector Machines,” Computers and Structures, Vol. 86, No. 19-20, 2008, pp. 1904–1917.
3. Forrester, A. and Keane, A., “Recent Advances in Surrogate-based Optimization,” Progress in Aerospace Sciences, Vol. 45, No.
1-3, 2009, pp. 50–79.
4. Wang, G. and Shan, S., “Review of Metamodeling Techniques in Support of Engineering Design Optimization,” Journal of
Mechanical Design, Vol. 129, No. 4, 2007, pp. 370–380.
5. Simpson, T., Toropov, V., Balabanov, V., and Viana, F., “Design and Analysis of Computer Experiments in Multidisciplinary
Design Optimization: A Review of How Far We Have Come or Not,” 12th AIAA/ISSMO Multidisciplinary Analysis and
Optimization Conference, Victoria, Canada, September 10-12 2008.
6. Zhang, J., Chowdhury, S., and Messac, A., “An Adaptive Hybrid Surrogate Model,” Structural and Multidisciplinary
Optimization, 2012, doi: 10.1007/s00158-012-0764-x.
7. Forrester, A., Sobester, A., and Keane, A., Engineering Design via Surrogate Modelling: A Practical Guide, Wiley, 2008.
8. Apley, D. W., Liu, J., and Chen, W., “Understanding the Effects of Model Uncertainty in Robust Design With Computer
Experiments,” ASME Journal of Mechanical Design, Vol. 128, No. 4, 2006, pp. 945–958.
9. Neufeld, D., Chung, J., and Behdinan, K., “Aircraft Wing Box Optimization Considering Uncertainty in Surrogate Models,” Structural
and Multidisciplinary Optimization, Vol. 42, No. 5, 2010, pp. 745–753.
10. Viana, F. A. C. and Haftka, R. T., “Importing Uncertainty Estimates from One Surrogate to Another,” 50th
AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Palm Springs, California, May 4-6
2009.
Editor's notes
In the literature, there are different surrogate models, such as Kriging, RBF and E-RBF. Can we adaptively combine the advantages of different surrogate models into one single hybrid surrogate, which provides more accurate estimation?
Can we segregate the input space of a surrogate based on the accuracy of the surrogate in different regions?
The power that can be generated by a wind farm is much less than the available wind resource at the farm site. How can we assess the maximum wind power generation of a wind farm at a farm site, based on the recorded wind data and the planned wind farm capacity?