Taking inspiration from approximate ranking, this paper investigates the use of a rank-based Support Vector Machine as a surrogate model within CMA-ES, enforcing the invariance of the approach with respect to monotonic transformations of the fitness function. Whereas the choice of the SVM kernel is known to be a critical issue, the proposed approach uses the covariance matrix adapted by CMA-ES within a Gaussian kernel, ensuring the adaptation of the kernel to the currently explored region of the fitness landscape at almost no computational overhead. The empirical validation of the approach on standard benchmarks, in comparison with CMA-ES and recent surrogate-based variants of CMA-ES, demonstrates the efficiency and scalability of the proposed approach.
Why Comparison-Based Surrogates?
Surrogate Model (Meta-Model) assisted optimization
Construct an approximation model M(x) of f(x).
Optimize the model M(x) in lieu of f(x) to reduce the number of costly evaluations of the function f(x).
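To make these two steps concrete, here is a minimal Python sketch, assuming a cheap least-squares quadratic model as a stand-in for a generic meta-model M(x); the names and the random candidate search are illustrative, not the method of this paper:

    import numpy as np

    def fit_quadratic_model(X, y):
        # Illustrative meta-model: least-squares fit of
        # M(x) = sum_i a_i x_i^2 + sum_i b_i x_i + c
        features = np.hstack([X**2, X, np.ones((len(X), 1))])
        coeffs, *_ = np.linalg.lstsq(features, y, rcond=None)
        return lambda x: np.hstack([x**2, x, 1.0]) @ coeffs

    f = lambda x: x[0]**2 + (x[0] + x[1])**2         # the "costly" f (Schwefel, 2-D)
    X = np.random.uniform(-5, 5, size=(30, 2))       # 30 true evaluations to train M
    M = fit_quadratic_model(X, np.apply_along_axis(f, 1, X))
    cand = np.random.uniform(-5, 5, size=(1000, 2))  # search M(x) instead of f(x)
    best = cand[np.argmin([M(c) for c in cand])]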
Example
f(x) = x_1^2 + (x_1 + x_2)^2
An efficient Evolutionary Algorithm (EA) with surrogate models may be 4.3 times faster on f(x). But the same EA is only 2.4 times faster on the transformed function f(x)^{1/4}!*
* CMA-ES with quadratic meta-model (lmm-CMA-ES) on f_Schwefel in 2-D
Ordinal Regression in Evolutionary Computation
Goal: find the function F(x) which preserves the ordering of the training points x_i (x_i has rank i):
x_i ≻ x_j ⇔ F(x_i) > F(x_j)
F(x) is invariant to any rank-preserving transformation.
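This invariance is easy to check numerically; a quick sketch (scipy's rankdata is used here only for convenience and is not part of the original slides):

    import numpy as np
    from scipy.stats import rankdata

    f = lambda x: x[0]**2 + (x[0] + x[1])**2
    y = np.apply_along_axis(f, 1, np.random.uniform(-5, 5, size=(10, 2)))
    # g(f) = f^(1/4) is rank-preserving, so the ordering information
    # seen by a comparison-based surrogate is exactly the same.
    assert (rankdata(y) == rankdata(y**0.25)).all()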
CMA-ES with Rank Support Vector Machine on Rosenbrock:¹
[Figure: mean fitness vs. number of function evaluations for n = 2, 5 and 10, comparing (I,λ)-CMA-ES against rank-SVM surrogates with polynomial (d = 2, d = 4) and RBF (γ = 1) kernels.]
1 T. Runarsson (2006). "Ordinal Regression in Evolutionary Computation"
Exploit the local topography of the function
CMA-ES adapts the covariance matrix C, which describes the local structure of the function.
Mahalanobis (fully weighted Euclidean) distance:
d(x_i, x_j) = (x_i − x_j)^T C^{-1} (x_i − x_j)
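A sketch of this distance in plain numpy (the Cholesky solve avoids forming C^{-1} explicitly; it assumes C is symmetric positive definite):

    import numpy as np

    def mahalanobis_sq(xi, xj, C):
        # (xi - xj)^T C^{-1} (xi - xj), via C = L L^T and z = L^{-1}(xi - xj)
        z = np.linalg.solve(np.linalg.cholesky(C), xi - xj)
        return z @ z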
Results of CMA-ES with quadratic meta-models (lmm-CMA-ES):²
Speed-up: a factor of 2-4 for n ≥ 4
Complexity: from O(n^4) to O(n^6); becomes intractable for n > 16
Rank-preserving invariance: NO
[Figure: O(FLOP) per saved function evaluation vs. dimension n = 2, 4, 8, 16 for lmm-CMA-ES, log-log scale.]
2 S. Kern et al. (2006). "Local Meta-Models for Optimization Using Evolution Strategies"
Tractable or Efficient?
Answer: Tractable, Efficient, and Invariant.
Ingredients: CMA-ES (Adaptive Encoding) and Rank SVM.
Covariance Matrix Adaptation Evolution Strategy
Decompose to understand
While CMA-ES is by definition CMA plus ES, the algorithmic decomposition has only recently been presented.³
Algorithm 1 CMA-ES = Adaptive Encoding + ES
1: x_i ← m + σ N_i(0, I), for i = 1…λ
2: f_i ← f(B x_i), for i = 1…λ
3: if Evolution Strategy (ES) then
4:   σ ← σ exp∝(success rate / expected success rate − 1)
5: if Cumulative Step-Size Adaptation ES (CSA-ES) then
6:   σ ← σ exp∝(evolution path / expected evolution path − 1)
7: B ← AECMA-Update(B x_1, …, B x_μ)
3 N. Hansen (2008). "Adaptive Encoding: How to Render Search Coordinate System Invariant"
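A minimal Python sketch of the ES branch of Algorithm 1 (lines 1, 2 and 4), using a 1/5th-success-rule-style stand-in for the ∝ update; AECMA-Update is left abstract, since the slide only names it:

    import numpy as np

    def es_step(f, m, sigma, B, lam, target_rate=0.2):
        X = m + sigma * np.random.randn(lam, len(m))       # 1: x_i <- m + sigma N_i(0, I)
        fx = np.array([f(B @ x) for x in X])               # 2: f_i <- f(B x_i)
        rate = float(np.mean(fx < f(B @ m)))               # observed success rate
        sigma *= np.exp(0.5 * (rate / target_rate - 1.0))  # 4: step-size update, damped
        elite = X[np.argsort(fx)[: lam // 2]]              # mu best, for line 7:
        return elite.mean(axis=0), sigma                   # B <- AECMA-Update(Bx_1..Bx_mu)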
Adaptive Encoding
Inspired by Principal Component Analysis (PCA)
[Figure: Principal Component Analysis (left) vs. Adaptive Encoding Update (right).]
Support Vector Machine for Classification
Linear Classifier
Main Idea
Training data: D = {(x_i, y_i) | x_i ∈ R^p, y_i ∈ {−1, +1}}, i = 1…n
⟨w, x_i⟩ ≥ b + ε ⇒ y_i = +1
⟨w, x_i⟩ ≤ b − ε ⇒ y_i = −1
Dividing by ε > 0:
⟨w, x_i⟩ − b ≥ +1 ⇒ y_i = +1
⟨w, x_i⟩ − b ≤ −1 ⇒ y_i = −1
Optimization Problem: Primal Form
Minimize_{w,ξ}  (1/2)||w||^2 + C Σ_{i=1}^n ξ_i
subject to: y_i(⟨w, x_i⟩ − b) ≥ 1 − ξ_i,  ξ_i ≥ 0
[Figure: linear classifier with separating hyperplane ⟨w, x⟩ = b, margin hyperplanes at b ± 1 and margin width 2/||w||; support vectors lie on the margin.]
Support Vector Machine for Classification
Linear Classifier
Optimization Problem: Dual Form
By the Lagrange theorem, instead of minimizing F:
Minimize_{α,G}  F − Σ_i α_i G_i
subject to: α_i ≥ 0, G_i ≥ 0
Omitting the details, the dual form reads:
Maximize_{α}  Σ_{i=1}^n α_i − (1/2) Σ_{i,j=1}^n α_i α_j y_i y_j ⟨x_i, x_j⟩
subject to: 0 ≤ α_i ≤ C,  Σ_{i=1}^n α_i y_i = 0
Properties
Decision function: F(x) = sign(Σ_{i=1}^n α_i y_i ⟨x_i, x⟩ − b)
The dual form may be solved using a standard quadratic programming solver.
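In practice the dual QP is handed to an off-the-shelf solver; a minimal sketch with scikit-learn's SVC (one such solver, not the one used in this work; data and labels are synthetic):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # labels from a linear rule

    clf = SVC(kernel="linear", C=1.0).fit(X, y)  # solves the dual QP internally
    print(clf.support_vectors_)                  # the x_i with alpha_i > 0
    print(clf.predict([[1.0, 1.0]]))             # sign(sum_i alpha_i y_i <x_i, x> - b)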
Support Vector Machine for Classification
Non-Linear Classifier
[Figure: (a) data in input space, (b) mapping Φ to a feature space, (c) linear separation in feature space with margin hyperplanes ⟨w, Φ(x)⟩ − b = ±1, margin 2/||w||, and support vectors.]
Non-linear classification with the "kernel trick":
Maximize_{α}  Σ_{i=1}^n α_i − (1/2) Σ_{i,j=1}^n α_i α_j y_i y_j K(x_i, x_j)
subject to: α_i ≥ 0,  Σ_{i=1}^n α_i y_i = 0,
where K(x, x′) ≝ ⟨Φ(x), Φ(x′)⟩ is the kernel function.
Decision function: F(x) = sign(Σ_{i=1}^n α_i y_i K(x_i, x) − b)
Support Vector Machine for Classification
Non-Linear Classifier: Kernels
Polynomial: k(x_i, x_j) = (⟨x_i, x_j⟩ + 1)^d
Gaussian or Radial Basis Function (RBF): k(x_i, x_j) = exp(−||x_i − x_j||^2 / (2σ^2))
Hyperbolic tangent: k(x_i, x_j) = tanh(k⟨x_i, x_j⟩ + c)
[Figure: example decision boundaries for the polynomial (left) and Gaussian (right) kernels.]
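Each kernel is a one-liner in numpy (a sketch; parameter names follow the formulas above):

    import numpy as np

    poly = lambda xi, xj, d=2: (xi @ xj + 1.0) ** d
    rbf = lambda xi, xj, sigma=1.0: np.exp(-((xi - xj) @ (xi - xj)) / (2 * sigma**2))
    tanh_kernel = lambda xi, xj, k=1.0, c=0.0: np.tanh(k * (xi @ xj) + c)

Following the abstract, ACM-ES replaces the Euclidean norm inside the Gaussian kernel with the Mahalanobis distance induced by the CMA-ES covariance matrix.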
Ranking Support Vector Machine
Find the function F(x) which preserves the ordering of the training points.
[Figure: points x projected onto the direction w; level sets L(r1), L(r2) of the surrogate separate points of consecutive ranks.]
Ranking Support Vector Machine
Primal problem
Minimize_{w,ξ}  (1/2)||w||^2 + Σ_{i=1}^{N−1} C_i ξ_i
subject to: ⟨w, Φ(x_i) − Φ(x_{i+1})⟩ ≥ 1 − ξ_i  (i = 1…N−1)
            ξ_i ≥ 0  (i = 1…N−1)
Dual problem
Maximize_{α}  Σ_{i=1}^{N−1} α_i − (1/2) Σ_{i,j=1}^{N−1} α_i α_j K(x_i − x_{i+1}, x_j − x_{j+1})
subject to: 0 ≤ α_i ≤ C_i  (i = 1…N−1)
Rank surrogate function (in the case where one rank corresponds to one point):
F(x) = Σ_{i=1}^{N−1} α_i (K(x_i, x) − K(x_{i+1}, x))
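Once the α_i are trained (e.g. by the QP above; training is omitted here), evaluating the surrogate is a kernel sum over consecutive rank pairs; a minimal sketch:

    def rank_surrogate(x, X_sorted, alpha, kernel):
        # F(x) = sum_i alpha_i (K(x_i, x) - K(x_{i+1}, x)),
        # with X_sorted ordered from rank 1 to rank N and len(alpha) == N - 1
        return sum(a * (kernel(xi, x) - kernel(xj, x))
                   for a, xi, xj in zip(alpha, X_sorted, X_sorted[1:]))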
Optimization or filtering?
Don't be too greedy
Optimization: significant potential speed-up if the surrogate model is global and accurate enough.
Filtering: "guaranteed" speed-up with a local surrogate model.
[Figure: filtering scheme: λ′ pre-children are prescreened by the surrogate and a subset is retained for true evaluation; the probability density of retaining an offspring decreases with its surrogate rank (shown over ranks 0-500).]
ACM-ES Optimization Loop
The loop couples four panels:
A. Select training points:
1. Select the best k training points.
2. Apply the change of coordinates, defined from the current covariance matrix C and the current mean value m, which reads [4]: x → C^{-1/2}(x − m).
B. Build a surrogate model:
3. Build a surrogate model using Rank SVM.
C. Generate pre-children:
4. Generate pre-children and rank them according to the surrogate fitness function.
D. Select the most promising children:
5. Prescreen: retain pre-children according to their surrogate rank.
6. Evaluate the retained children with the true fitness function.
7. Add the new λ′ training points and update the parameters of CMA-ES.
[Figure: panels A-D of the loop in the coordinates X1, X2, together with the rank-based retention density of the previous slide.]
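A condensed, runnable sketch of one such generation; the surrogate is trained here via a pairwise-classification reduction with scikit-learn's linear SVC as a stand-in for the Rank SVM solver, and prescreening is simplified to a deterministic top-λ cut (the slides retain offspring with a rank-based probability instead):

    import numpy as np
    from sklearn.svm import SVC

    f = lambda x: x @ x                              # cheap stand-in for the costly fitness
    mean, sigma, C = np.ones(2), 0.5, np.eye(2)
    W = np.linalg.inv(np.linalg.cholesky(C))         # whitening transform, plays C^{-1/2}

    # steps 1-2: best k archived points, mapped to the encoded space
    archive = sorted((np.random.randn(2) for _ in range(30)), key=f)[:20]
    Z = np.array([W @ (x - mean) for x in archive])  # sorted from best to worst

    # step 3: Rank SVM stand-in: classify consecutive differences z_i - z_{i+1} as +1
    D = Z[:-1] - Z[1:]
    clf = SVC(kernel="linear", C=1e6).fit(
        np.vstack([D, -D]), np.hstack([np.ones(len(D)), -np.ones(len(D))]))
    surrogate = lambda x: -clf.coef_[0] @ (W @ (x - mean))   # smaller = better rank

    # steps 4-6: lambda' pre-children, keep the lambda surrogate-best, evaluate those
    lam_prime, lam = 30, 10
    pre = [mean + sigma * np.random.randn(2) for _ in range(lam_prime)]
    evaluated = [(x, f(x)) for x in sorted(pre, key=surrogate)[:lam]]
    # step 7: 'evaluated' is added to the archive and fed to the CMA-ES update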
Results
Learning Time
The cost of model learning/testing increases quasi-linearly with the dimension d on the Sphere function:
[Figure: learning time (ms) vs. problem dimension, log-log scale; the fitted slope is 1.13.]
Summary
ACM-ES
ACM-ES is from 2 to 4 times faster on uni-modal problems.
Invariant to rank-preserving transformations: Yes
The computational complexity (the cost of the speed-up) is O(n), compared to O(n^6) for lmm-CMA-ES.
The source code is available online:
http://www.lri.fr/~ilya/publications/ACMESppsn2010.zip
Open Questions
Extension to multi-modal optimization
Adaptation of selection pressure and surrogate model complexity
Thank you for your attention!
Questions?
Parameters
SVM Learning:
Number of training points: N_training = 30√d for all problems, except Rosenbrock and Rastrigin, where N_training = 70√d
Number of iterations: N_iter = 50000√d
Kernel function: RBF with σ equal to the average distance between the training points
Cost of constraint violation: C_i = 10^6 (N_training − i)^{2.0}
Offspring Selection Procedure:
Number of test points: N_test = 500
Number of evaluated offspring: λ = λ′/3
Offspring selection pressure parameters: σ^2_sel0 = 2σ^2_sel1 = 0.8
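Gathered into one illustrative config (a sketch; the function and key names are mine, and the values are the ones listed above, with λ = λ′/3 and σ^2_sel1 = 0.4 being reconstructions):

    import math

    def acmes_params(d, rosenbrock_or_rastrigin=False):
        # Hyperparameters of the slide, keyed by problem dimension d.
        n_train = int((70 if rosenbrock_or_rastrigin else 30) * math.sqrt(d))
        return {
            "n_training": n_train,
            "n_iter": int(50000 * math.sqrt(d)),
            "kernel": "RBF, sigma = avg. distance between training points",
            "violation_cost": [1e6 * (n_train - i) ** 2.0
                               for i in range(1, n_train)],   # C_i, i = 1..N-1
            "n_test": 500,
            # lambda evaluated offspring out of lambda' = 3*lambda pre-children
            "sel_pressure": {"sigma2_sel0": 0.8, "sigma2_sel1": 0.4},
        }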