This paper advances the Domain Segmentation based on Uncertainty in the Surrogate (DSUS) framework, a novel approach to characterizing the uncertainty in surrogates. The leave-one-out cross-validation technique is adopted in the DSUS framework to measure the local errors of a surrogate. A method is proposed in this paper to evaluate the performance of the leave-one-out cross-validation errors as local error measures. This method evaluates local errors by comparing: (i) the leave-one-out cross-validation error with (ii) the actual local error estimated within a local hypercube around each training point. The comparison results show that the leave-one-out cross-validation strategy can capture the local errors of a surrogate. The DSUS framework is then applied to key aspects of wind resource assessment and wind farm cost modeling. The uncertainties in the wind farm cost and the wind power potential are successfully characterized, which gives designers and users more confidence when using these models.
1. Uncertainty Quantification in Surrogate Models Based
on Pattern Classification of Cross-validation Errors
Jie Zhang*, Souma Chowdhury*, Ali Mehmani# and Achille Messac#
* Rensselaer Polytechnic Institute, Department of Mechanical, Aerospace, and Nuclear Engineering
# Syracuse University, Department of Mechanical and Aerospace Engineering
14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference
September 17 – 19, 2012
Indianapolis, Indiana
2. Uncertainty in Surrogate Modeling
• Since a surrogate model is an approximation to an unknown
function, prediction errors are generally present in the
estimated function values.
• The two major sources of uncertainty in surrogate modeling
are:
• Uncertainty in the observations (when they are noisy), and
• Uncertainty due to the finite sample size.
• One of the major challenges in surrogate modeling is to
accurately quantify these uncertainties.
3. Research Question
Domain
Segmentation
based on
Uncertainty in
the Surrogate
Can we segregate the input space of a
surrogate, based on the accuracy of the
surrogate in different regions, and
characterize the uncertainty in each
region?
• By addressing this question, we can:
Quantify the uncertainty in the surrogate, using an approach that is
applicable to a majority of surrogate models.
4. Research Motivation
A surrogate model can be used with more confidence if
we can do two things:
Quantify the uncertainty in the surrogate model; and
Characterize how the levels of errors vary in the variable
space.
Most existing methods to model the uncertainty in
surrogates are model-dependent.
5. Research Questions: Corollary Objectives
The research question calls for a methodology to enhance
user confidence in surrogate applications.
• Develop a methodology to characterize the uncertainty
attributable to surrogate models, which is applicable to
both regression and interpolative surrogate models.
• Evaluate the performance of leave-one-out cross-validation
errors as local error measures.
6. Why Domain Segmentation?
In surrogate-based optimization: optimal solutions in regions with smaller
errors are more reliable than solutions in regions with larger errors. The
domain segmentation technique can quantify the uncertainty in the optimal
solutions based on their locations in the design space.
Wind Farm Layout Optimization
In surrogate-based system analysis: the knowledge of the errors and
uncertainties in the surrogate is helpful for decision making by the
user/engineer.
Wind Farm Cost Model
In surrogate accuracy improvement: the design-domain-based uncertainty
information can be used to implement adaptive sampling strategies.
7. Presentation Outline
Uncertainty in surrogate overview
Domain Segmentation based on Uncertainty in the Surrogate
(DSUS)
Illustrating cross-validation errors as local error measures
Applications to wind resource assessment and wind farm
cost modeling
Concluding remarks
8. Uncertainty in Surrogate Review
Uncertainty in Surrogates
Bayesian approach (Kennedy and O’Hagan, Apley et al., Xiong et al.)
Adding bias using constant or pointwise margins (Picheny)
Reliability Based Design Optimization (RBDO) (Neufeld et al.)
Efficient Global Optimization (EGO) (Jones et al.)
Sequential Kriging Optimization (SKO) (Huang et al.)
Importation of uncertainty estimates from one surrogate to another
(Viana and Haftka)
9. Domain Segmentation based on Uncertainty in the Surrogate (DSUS)
• Based on the current level of
knowledge regarding the problem,
the engineer may know what
levels of errors are acceptable for
particular design purposes.
Wind Farm Power Generation Model
Error Decision
< 5% error Desirable
5-10% error Acceptable
> 10% error Unacceptable
• The engineer can estimate the
confidence of a new design, based on
the region into which the design
point is classified.
• These regions can correspond to
“good”, “acceptable”, and
“unacceptable” levels of accuracy.
10. Development of the DSUS
DSUS Key features:
Segregates the design domain into
regions based on the level of errors
(or level of fidelity).
Classifies any new point/design,
for which the actual functional
response is not known, into an
error class, and quantifies the
uncertainty in its predicted
function response.
Is readily applicable to a majority
of interpolative surrogate models.
The term "prediction uncertainty" denotes the
distribution of errors of the surrogate.
11. Cross-Validation
Two popular strategies: (i) leave-one-out; and (ii) q-fold.
In order to obtain the error at each training point, the leave-one-out
strategy is adopted in the DSUS framework.
The Relative Accuracy Error (RAE) is used to classify the
training points into classes: RAE = |f(x) - f̂(x)| / |f(x)|,
where f(x) is the actual function value and f̂(x) is the value
estimated by the surrogate.
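The leave-one-out loop above can be sketched in a few lines of Python. This is not the paper's implementation; the surrogate here is an illustrative RBF interpolator from SciPy, and the RAE follows the definition on this slide (actual vs. surrogate-estimated value).

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def loo_rae(X, y):
    """Leave-one-out cross-validation: for each training point, fit the
    surrogate on the remaining points and record the Relative Accuracy
    Error RAE_i = |y_i - yhat_i| / |y_i| at the held-out point."""
    n = len(X)
    rae = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        surrogate = RBFInterpolator(X[mask], y[mask])  # illustrative surrogate
        y_hat = surrogate(X[i:i + 1])[0]
        rae[i] = abs(y[i] - y_hat) / abs(y[i])
    return rae

# A 1-variable example with 15 training points (f kept away from zero
# so the relative error is well defined)
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(15, 1))
y = X[:, 0] * np.sin(2 * np.pi * X[:, 0]) + 2.0
rae = loo_rae(X, y)  # one leave-one-out error per training point
```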
12. Classifying the Training Points into Error Classes
• According to the RAE values, we classify the training points into error
classes, and define the lower and upper bounds of each class.
Class Design variable RAE
1 0.573 0.018
2 0.277 0.691
1 0.044 0.045
1 0.371 0.018
2 0.910 0.345
2 0.767 0.116
1 0.720 0.043
1 0.865 0.060
1 0.508 0.073
2 0.637 0.316
2 0.977 1.078
1 0.240 0.013
1 0.184 0.019
1 0.107 0.004
1 0.453 0.049
Class 1
RAE < 10%
Class 2
RAE > 10%
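The binary split in this table is a one-line rule; a minimal sketch using the 10% RAE bound from the slide:

```python
import numpy as np

def classify_by_rae(rae, threshold=0.10):
    """Assign each training point to an error class from its
    leave-one-out RAE: Class 1 if RAE < 10%, Class 2 otherwise
    (the bounds used on this slide)."""
    return np.where(np.asarray(rae) < threshold, 1, 2)

# The RAE column of the table above; the result matches its Class column
rae = [0.018, 0.691, 0.045, 0.018, 0.345, 0.116, 0.043, 0.060,
       0.073, 0.316, 1.078, 0.013, 0.019, 0.004, 0.049]
classes = classify_by_rae(rae)  # → [1 2 1 1 2 2 1 1 1 2 2 1 1 1 1]
```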
13. Pattern Classification
A wide variety of pattern classification methods are available:
Linear discriminant analysis (LDA);
Principal components analysis (PCA);
Kernel estimation and K-nearest-neighbor algorithms;
Perceptrons;
Neural Network; and
Support Vector Machine (SVM) 1,2.
SVM is a competitive approach for multiclass classification problems.
1. Basudhar and Missoum; 2. Sakalkar and Hajela
14. Support Vector Machine (SVM)
Four kernels are popularly used:
Linear: K(x_i, x_j) = x_i^T x_j
Polynomial: K(x_i, x_j) = (γ x_i^T x_j + r)^d, γ > 0
Radial basis function: K(x_i, x_j) = exp(-γ ||x_i - x_j||^2), γ > 0
Sigmoid: K(x_i, x_j) = tanh(γ x_i^T x_j + r)
We have used an efficient SVM package, LIBSVM (A
Library for Support Vector Machines), developed by Chang
and Lin.
LIBSVM handles multiclass problems using the one-against-one
classification strategy.
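As a concrete sketch (not the authors' code), scikit-learn's SVC, which wraps LIBSVM, can reproduce this workflow. The training data below reuse the earlier illustrative table, and the C and γ values are arbitrary placeholders.

```python
import numpy as np
from sklearn.svm import SVC  # SVC wraps LIBSVM

# Illustrative training data: (design variable, error class) pairs
X = np.array([[0.573], [0.277], [0.044], [0.371], [0.910],
              [0.767], [0.720], [0.865], [0.508], [0.637],
              [0.977], [0.240], [0.184], [0.107], [0.453]])
labels = np.array([1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1])

# Radial basis function kernel; multiclass problems are handled by
# LIBSVM's one-against-one voting scheme
clf = SVC(kernel="rbf", C=1.0, gamma=0.8)
clf.fit(X, labels)

# Classify a new design point into an error class
new_class = int(clf.predict([[0.5]])[0])
```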
16. Adaptive Hybrid Functions (AHF)
Determination of a trust region:
numerical bounds of the estimated
parameter (output) as functions of
the independent parameters (input).
Definition of a local measure of
accuracy (using kernel functions) of
the estimated function value, and
representation of the corresponding
distribution parameters as functions
of the input vector.
Weighted summation of different
surrogate models based on the local
measure of accuracy.
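The weighted-summation step above can be illustrated as follows; the component surrogates and the Gaussian "local accuracy" kernels below are toy placeholders, not the actual kernel-based AHF formulation (see Zhang et al., ref. 6).

```python
import numpy as np

def ahf_predict(x, surrogates, weight_fns):
    """Weighted summation of component surrogates, where each weight is
    a local measure of that surrogate's accuracy at x."""
    w = np.array([wf(x) for wf in weight_fns])
    w = w / w.sum()  # normalize local accuracy measures into weights
    return float(np.dot(w, [s(x) for s in surrogates]))

# Toy usage: two component models, each trusted near a different region
s1, s2 = (lambda x: x**2), (lambda x: 2*x - 1)
w1 = lambda x: np.exp(-(x - 0.0)**2)  # s1 trusted near x = 0
w2 = lambda x: np.exp(-(x - 1.0)**2)  # s2 trusted near x = 1
y = ahf_predict(0.5, [s1, s2], [w1, w2])  # equal weights here -> 0.125
```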
17. Illustrating Cross-validation Errors as Local Error Measures
The local errors of the surrogate are evaluated in the neighborhood of each
training point.
A local hypercube is constructed to include one training point.
The length of the hypercube along each dimension is determined by
The jth hypercube can be expressed by
18. Illustrating Cross-validation Errors as Local Error Measures
The RAE at each test point x_i within the jth local hypercube is
RAE_i = |f(x_i) - f̂(x_i)| / |f(x_i)|, where f̂ is the estimate from the
surrogate trained on all training points.
For the jth local hypercube, the average of the RAE values over the
N_te test points is RAE^te_j = (1/N_te) Σ_i RAE_i.
The local errors estimated by the leave-one-out surrogate at each
training point are compared with the RAE^te value estimated within the
local hypercube.
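A sketch of this comparison step, with one stated assumption: since the slide's hypercube-length formula is not reproduced here, the side length is taken as a fixed fraction of the domain range.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def hypercube_rae(X, y, f, frac=0.1, n_test=50, seed=0):
    """For each training point x_j, sample test points inside a local
    hypercube centered on it and average RAE = |f - fhat| / |f| there,
    using a surrogate fit to all training points. The side length
    (`frac` of the domain range per dimension) is an assumption."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    half = 0.5 * frac * (hi - lo)
    surrogate = RBFInterpolator(X, y)
    rae_te = np.empty(len(X))
    for j, xj in enumerate(X):
        pts = rng.uniform(np.maximum(xj - half, lo),
                          np.minimum(xj + half, hi),
                          size=(n_test, X.shape[1]))
        f_true = f(pts)
        rae_te[j] = np.mean(np.abs(f_true - surrogate(pts)) / np.abs(f_true))
    return rae_te

# Booth function (one of the 2-variable test problems)
booth = lambda P: (P[:, 0] + 2*P[:, 1] - 7)**2 + (2*P[:, 0] + P[:, 1] - 5)**2
rng = np.random.default_rng(1)
X = rng.uniform(-10, 10, size=(30, 2))
rae_te = hypercube_rae(X, booth(X), booth)  # one averaged local error per point
```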
19. Analytical Examples
Three analytical examples are tested.
The 1-variable function;
The 2-variable Dixon & Price function; and
The 2-variable Booth function.
The leave-one-out cross-validation errors and the actual local errors
in the local hypercube are each assigned to one of four classes, based
on the percentage error w.r.t. the mean error in the entire domain:
Class 1 <50%
Class 2 50-100 %
Class 3 100-150%
Class 4 >150%
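The four-class binning w.r.t. the mean error can be sketched as (the error values in the usage line are hypothetical):

```python
import numpy as np

def error_classes_wrt_mean(errors):
    """Bin each point's error by its percentage of the mean error over
    the entire domain: Class 1 < 50%, Class 2 = 50-100%, Class 3 =
    100-150%, Class 4 > 150% (the bounds in the table above)."""
    ratio = np.asarray(errors, dtype=float)
    ratio = ratio / ratio.mean()
    return np.digitize(ratio, [0.5, 1.0, 1.5]) + 1  # classes 1..4

# Hypothetical errors with mean 0.145: ratios 0.14, 0.34, 0.76, 2.76
classes = error_classes_wrt_mean([0.02, 0.05, 0.11, 0.40])  # → [1 1 2 4]
```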
20. Analytical Examples
Three analytical examples are tested.
The 1-variable function;
The 2-variable Dixon & Price function; and
The 2-variable Booth function.
[Figures: leave-one-out cross-validation; local hypercube]
21. Results of Local Error Measures
1-variable function
The cross-validation errors and the actual local errors belong to the
same class for 93.33% of the 15 training points.
22. Results of Local Error Measures
Dixon & Price function
[Figures: actual local errors (local hypercube); cross-validation errors]
The cross-validation errors and the actual local errors belong to the
same class for 83.33% of the 30 training points.
23. Results of Local Error Measures
Booth function
[Figures: actual local errors (local hypercube); cross-validation errors]
The cross-validation errors and the actual local errors belong to the
same class for 86.67% of the 30 training points.
24. Wind Energy Case Studies
We apply the DSUS framework to key aspects of wind resource assessment
and wind farm cost modeling.
Onshore wind farm cost model; and
Wind Power Potential (WPP) model.
Response Surface-Based Wind Farm Cost (RS-WFC) model
The inputs for the surrogate model are
The number, and
The rated power of wind turbines.
The output of the surrogate model is
Total annual cost of a wind farm
Surrogate:
25. Wind Power Potential (WPP)
The WPP method predicts the quality of wind resources by considering
the joint distribution of wind speed and wind direction, which can help
decision makers in wind farm siting. The key steps include:
1. Determining the distribution type and parameters (the 5-parameter
bivariate normal distribution is adopted);
2. Sampling the five distribution parameters;
3. Maximizing the net power generation through farm layout optimization
for each sample distribution; and
4. Constructing a surrogate model to represent the computed maximum
capacity factor as a function of the parameters of the bivariate normal
distribution.
The uncertainty in the WPP is characterized for two cases:
Evaluating the WPP for a four-turbine farm; and
Evaluating the WPP for a nine-turbine farm.
26. SVM Kernels and Predefined Classes Bounds
Numerical setup for test problems
The uncertainty scale in each class
27. Representation of Prediction Uncertainty
• A Gaussian distribution is adopted to represent the uncertainty in the
prediction accuracy of the surrogate.
• For any new point (design) candidate, the DSUS framework can classify
that point into one of these error classes.
Wind farm cost model
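One way to obtain per-class (μ, σ) pairs, sketched on the illustrative RAE table from the earlier slide; this assumes the Gaussian is simply fit to the errors of the training points in each class.

```python
import numpy as np

def class_uncertainty(errors, classes):
    """Fit a Gaussian to the errors of each class: the uncertainty
    reported for a new design classified into class k is the (mu, sigma)
    of the errors of the training points in that class."""
    errors = np.asarray(errors)
    classes = np.asarray(classes)
    return {int(k): (float(errors[classes == k].mean()),
                     float(errors[classes == k].std(ddof=1)))
            for k in np.unique(classes)}

# Illustrative RAE values and classes from the earlier table
rae = [0.018, 0.691, 0.045, 0.018, 0.345, 0.116, 0.043, 0.060,
       0.073, 0.316, 1.078, 0.013, 0.019, 0.004, 0.049]
classes = [1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1]
mu_sigma = class_uncertainty(rae, classes)  # {1: (mu1, s1), 2: (mu2, s2)}
```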
28. Uncertainty Prediction Results
Classification accuracy of each problem
Problem Parameters Accuracy
Wind farm cost C=1, γ=0.8 100% (20/20)
WPP (4 turbines) C=1, γ=1 90% (18/20)
WPP (9 turbines) C=1, γ=0.2 95% (19/20)
• The classification accuracy of the DSUS prediction is at least 90% for
all problems.
29. Uncertainty Prediction Results
Uncertainty characterization for new designs (wind farm cost)
New design  No. of turbines  Rated power  Class  Uncertainty (μ, σ)
1 40 1.25 MW 1 0.0018, 0.0011
2 7 1 MW 2 0.0066, 0.0013
3 44 1.5 MW 3 0.0216, 0.0102
The total annual cost per kilowatt installed (for the third wind farm) is
estimated as 122.28 $/kW.
Assuming a lifetime of 20 years, the 2.16% (mean value) error in the cost
estimation amounts to approximately 3.5 million dollars, which is
appreciable for such a medium-scale wind farm.
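The arithmetic behind the 3.5-million-dollar figure, using only numbers from this slide (third design: 44 turbines × 1.5 MW = 66,000 kW installed):

```python
# Worked arithmetic behind the slide's figure
installed_kw = 44 * 1500                # 44 turbines x 1.5 MW = 66,000 kW
annual_cost = 122.28 * installed_kw     # 122.28 $/kW -> dollars per year
lifetime_cost = annual_cost * 20        # 20-year lifetime
error_dollars = 0.0216 * lifetime_cost  # 2.16% mean error (Class 3)
# error_dollars is about 3.49 million dollars
```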
30. Uncertainty in The Wind Power Potential
Uncertainty in the estimated capacity factors
For the Ada station, the capacity factor is estimated as 48.52% for the 9-
turbine farm.
The 53.74% (mean value) error in the capacity factor estimation
corresponds to an error of approximately 3×10⁷ kWh in annual energy
production, which is significant for such a small-scale wind farm.
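The 3×10⁷ kWh figure is consistent with the slide's numbers if the nine turbines are assumed to be rated at 1.5 MW each (the rated power is not stated on this slide):

```python
# Worked arithmetic; the 1.5 MW turbine rating is an assumption
installed_kw = 9 * 1500                       # 13,500 kW
annual_energy = installed_kw * 8760 * 0.4852  # hours/year x capacity factor
error_kwh = annual_energy * 0.5374            # 53.74% mean error
# error_kwh is roughly 3.1e7 kWh per year
```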
31. Uncertainty in The Wind Power Potential
[Figures: WPP with 4 turbines; WPP with 9 turbines]
32. Concluding Remarks
• The Domain Segmentation based on Uncertainty in the Surrogate
(DSUS) framework successfully characterizes the uncertainty
attributable to surrogate models.
• The mean errors in the wind farm cost and wind power potential
(in the case studies) are significant for small/medium scale wind
farms, which should be carefully considered during the decision
making process.
• The results show that the leave-one-out cross-validation error can
capture the local errors of a surrogate with a reasonable accuracy.
• Future research should investigate other error metrics that better
represent the performance over the entire design domain.
33. Acknowledgement
• I would like to acknowledge my research adviser
Prof. Achille Messac, for his immense help and
support in this research.
• I would also like to thank my colleagues Souma
Chowdhury and Ali Mehmani for their valuable
contributions to this paper.
• Support from the NSF Awards is also
acknowledged.
34. Selected References
1. Keane, A. J. and Nair, P. B., Computational Approaches for Aerospace Design: The Pursuit of Excellence, John Wiley and Sons,
2005.
2. Basudhar, A. and Missoum, S., “Adaptive Explicit Decision Functions for Probabilistic Design and Optimization Using Support
Vector Machines,” Computers and Structures, Vol. 86, No. 19-20, 2008, pp. 1904–1917.
3. Forrester, A. and Keane, A., “Recent Advances in Surrogate-based Optimization,” Progress in Aerospace Sciences, Vol. 45, No.
1-3, 2009, pp. 50–79.
4. Wang, G. and Shan, S., “Review of Metamodeling Techniques in Support of Engineering Design Optimization,” Journal of
Mechanical Design, Vol. 129, No. 4, 2007, pp. 370–380.
5. Simpson, T., Toropov, V., Balabanov, V., and Viana, F., “Design and Analysis of Computer Experiments in Multidisciplinary
Design Optimization: A Review of How Far We Have Come or Not,” 12th AIAA/ISSMO Multidisciplinary Analysis and
Optimization Conference, Victoria, Canada, September 10-12 2008.
6. Zhang, J., Chowdhury, S., and Messac, A., “An Adaptive Hybrid Surrogate Model,” Structural and Multidisciplinary
Optimization, 2012, doi: 10.1007/s00158-012-0764-x.
7. Forrester, A., Sobester, A., and Keane, A., Engineering Design via Surrogate Modelling: A Practical Guide, Wiley, 2008.
8. Apley, D. W., Liu, J., and Chen, W., “Understanding the Effects of Model Uncertainty in Robust Design With Computer
Experiments,” ASME Journal of Mechanical Design, Vol. 128, No. 4, 2006, pp. 945–958.
9. Neufeld, D., Chung, J., and Behdinan, K., “Aircraft Wing Box Optimization Considering Uncertainty in Surrogate Models,” Structural
and Multidisciplinary Optimization, Vol. 42, No. 5, 2010, pp. 745–753.
10. Viana, F. A. C. and Haftka, R. T., “Importing Uncertainty Estimates from One Surrogate to Another,” 50th
AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Palm Springs, California, May 4-6
2009.
Editor's notes
In the literature, there are different surrogate models, such as Kriging, RBF and E-RBF. Can we adaptively combine the advantages of different surrogate models into one single hybrid surrogate, which provides more accurate estimation?
Can we segregate the input space of a surrogate based on the accuracy of the surrogate in different regions?
The power that can be generated by a wind farm is much less than the available wind resource at the farm site. How can we assess the maximum wind power generation of a wind farm at a farm site, based on the recorded wind data and the planned wind farm capacity?