Taking inspiration from approximate ranking, this paper investigates the use of a rank-based Support Vector Machine as a surrogate model within CMA-ES, enforcing the invariance of the approach with respect to monotonic transformations of the fitness function. Whereas the choice of the SVM kernel is known to be a critical issue, the proposed approach uses the covariance matrix adapted by CMA-ES within a Gaussian kernel, ensuring the adaptation of the kernel to the currently explored region of the fitness landscape at almost no computational overhead. The empirical validation of the approach on standard benchmarks, in comparison with CMA-ES and recent surrogate-based variants of CMA-ES, demonstrates the efficiency and scalability of the proposed approach.
Why Comparison-Based Surrogates?
Surrogate Model (Meta-Model) assisted optimization
Construct an approximation model M(x) of f(x).
Optimize the model M(x) in lieu of f(x) to reduce the number of costly evaluations of the function f(x).
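To make these two steps concrete, here is a minimal Python sketch, assuming a cheap least-squares quadratic model as a stand-in for a generic meta-model M(x); the names and the random candidate search are illustrative, not the method of this paper:

    import numpy as np

    def fit_quadratic_model(X, y):
        # Illustrative meta-model: least-squares fit of
        # M(x) = sum_i a_i x_i^2 + sum_i b_i x_i + c
        features = np.hstack([X**2, X, np.ones((len(X), 1))])
        coeffs, *_ = np.linalg.lstsq(features, y, rcond=None)
        return lambda x: np.hstack([x**2, x, 1.0]) @ coeffs

    f = lambda x: x[0]**2 + (x[0] + x[1])**2         # the "costly" f (Schwefel, 2-D)
    X = np.random.uniform(-5, 5, size=(30, 2))       # 30 true evaluations to train M
    M = fit_quadratic_model(X, np.apply_along_axis(f, 1, X))
    cand = np.random.uniform(-5, 5, size=(1000, 2))  # search M(x) instead of f(x)
    best = cand[np.argmin([M(c) for c in cand])]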
Example
f(x) = x_1^2 + (x_1 + x_2)^2
An efficient Evolutionary Algorithm (EA) with surrogate models may be 4.3 times faster on f(x). But the same EA is only 2.4 times faster on the transformed function f(x)^{1/4}!*
* CMA-ES with quadratic meta-model (lmm-CMA-ES) on f_Schwefel in 2-D
Ordinal Regression in Evolutionary Computation
Goal: find the function F(x) which preserves the ordering of the training points x_i (x_i has rank i):
x_i ≻ x_j ⇔ F(x_i) > F(x_j)
F(x) is invariant to any rank-preserving transformation.
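This invariance is easy to check numerically; a quick sketch (scipy's rankdata is used here only for convenience and is not part of the original slides):

    import numpy as np
    from scipy.stats import rankdata

    f = lambda x: x[0]**2 + (x[0] + x[1])**2
    y = np.apply_along_axis(f, 1, np.random.uniform(-5, 5, size=(10, 2)))
    # g(f) = f^(1/4) is rank-preserving, so the ordering information
    # seen by a comparison-based surrogate is exactly the same.
    assert (rankdata(y) == rankdata(y**0.25)).all()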
CMA-ES with Rank Support Vector Machine on Rosenbrock:¹
[Figure: mean fitness vs. number of function evaluations for n = 2, 5 and 10, comparing (I,λ)-CMA-ES against rank-SVM surrogates with polynomial (d = 2, d = 4) and RBF (γ = 1) kernels.]
1 T. Runarsson (2006). "Ordinal Regression in Evolutionary Computation"
Exploit the local topography of the function
CMA-ES adapts the covariance matrix C, which describes the local structure of the function.
Mahalanobis (fully weighted Euclidean) distance:
d(x_i, x_j) = (x_i − x_j)^T C^{-1} (x_i − x_j)
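A sketch of this distance in plain numpy (the Cholesky solve avoids forming C^{-1} explicitly; it assumes C is symmetric positive definite):

    import numpy as np

    def mahalanobis_sq(xi, xj, C):
        # (xi - xj)^T C^{-1} (xi - xj), via C = L L^T and z = L^{-1}(xi - xj)
        z = np.linalg.solve(np.linalg.cholesky(C), xi - xj)
        return z @ z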
Results of CMA-ES with quadratic meta-models (lmm-CMA-ES):²
Speed-up: a factor of 2-4 for n ≥ 4
Complexity: from O(n^4) to O(n^6); becomes intractable for n > 16
Rank-preserving invariance: NO
[Figure: O(FLOP) per saved function evaluation vs. dimension n = 2, 4, 8, 16 for lmm-CMA-ES, log-log scale.]
2 S. Kern et al. (2006). "Local Meta-Models for Optimization Using Evolution Strategies"
Tractable or Efficient?
Answer: Tractable, Efficient, and Invariant.
Ingredients: CMA-ES (Adaptive Encoding) and Rank SVM.
Covariance Matrix Adaptation Evolution Strategy
Decompose to understand
While CMA-ES is by definition CMA plus ES, the algorithmic decomposition has only recently been presented.³
Algorithm 1 CMA-ES = Adaptive Encoding + ES
1: x_i ← m + σ N_i(0, I), for i = 1…λ
2: f_i ← f(B x_i), for i = 1…λ
3: if Evolution Strategy (ES) then
4:   σ ← σ exp∝(success rate / expected success rate − 1)
5: if Cumulative Step-Size Adaptation ES (CSA-ES) then
6:   σ ← σ exp∝(evolution path / expected evolution path − 1)
7: B ← AECMA-Update(B x_1, …, B x_μ)
3 N. Hansen (2008). "Adaptive Encoding: How to Render Search Coordinate System Invariant"
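A minimal Python sketch of the ES branch of Algorithm 1 (lines 1, 2 and 4), using a 1/5th-success-rule-style stand-in for the ∝ update; AECMA-Update is left abstract, since the slide only names it:

    import numpy as np

    def es_step(f, m, sigma, B, lam, target_rate=0.2):
        X = m + sigma * np.random.randn(lam, len(m))       # 1: x_i <- m + sigma N_i(0, I)
        fx = np.array([f(B @ x) for x in X])               # 2: f_i <- f(B x_i)
        rate = float(np.mean(fx < f(B @ m)))               # observed success rate
        sigma *= np.exp(0.5 * (rate / target_rate - 1.0))  # 4: step-size update, damped
        elite = X[np.argsort(fx)[: lam // 2]]              # mu best, for line 7:
        return elite.mean(axis=0), sigma                   # B <- AECMA-Update(Bx_1..Bx_mu)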
Adaptive Encoding
Inspired by Principal Component Analysis (PCA)
[Figure: Principal Component Analysis (left) vs. Adaptive Encoding Update (right).]
Support Vector Machine for Classification
Linear Classifier
Main Idea
Training data: D = {(x_i, y_i) | x_i ∈ R^p, y_i ∈ {−1, +1}}, i = 1…n
⟨w, x_i⟩ ≥ b + ε ⇒ y_i = +1
⟨w, x_i⟩ ≤ b − ε ⇒ y_i = −1
Dividing by ε > 0:
⟨w, x_i⟩ − b ≥ +1 ⇒ y_i = +1
⟨w, x_i⟩ − b ≤ −1 ⇒ y_i = −1
Optimization Problem: Primal Form
Minimize_{w,ξ}  (1/2)||w||^2 + C Σ_{i=1}^n ξ_i
subject to: y_i(⟨w, x_i⟩ − b) ≥ 1 − ξ_i,  ξ_i ≥ 0
[Figure: linear classifier with separating hyperplane ⟨w, x⟩ = b, margin hyperplanes at b ± 1 and margin width 2/||w||; support vectors lie on the margin.]
Support Vector Machine for Classification
Linear Classifier
Optimization Problem: Dual Form
By the Lagrange theorem, instead of minimizing F:
Minimize_{α,G}  F − Σ_i α_i G_i
subject to: α_i ≥ 0, G_i ≥ 0
Omitting the details, the dual form reads:
Maximize_{α}  Σ_{i=1}^n α_i − (1/2) Σ_{i,j=1}^n α_i α_j y_i y_j ⟨x_i, x_j⟩
subject to: 0 ≤ α_i ≤ C,  Σ_{i=1}^n α_i y_i = 0
Properties
Decision function: F(x) = sign(Σ_{i=1}^n α_i y_i ⟨x_i, x⟩ − b)
The dual form may be solved using a standard quadratic programming solver.
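In practice the dual QP is handed to an off-the-shelf solver; a minimal sketch with scikit-learn's SVC (one such solver, not the one used in this work; data and labels are synthetic):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # labels from a linear rule

    clf = SVC(kernel="linear", C=1.0).fit(X, y)  # solves the dual QP internally
    print(clf.support_vectors_)                  # the x_i with alpha_i > 0
    print(clf.predict([[1.0, 1.0]]))             # sign(sum_i alpha_i y_i <x_i, x> - b)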
Support Vector Machine for Classification
Non-Linear Classifier
[Figure: (a) data in input space, (b) mapping Φ to a feature space, (c) linear separation in feature space with margin hyperplanes ⟨w, Φ(x)⟩ − b = ±1, margin 2/||w||, and support vectors.]
Non-linear classification with the "kernel trick":
Maximize_{α}  Σ_{i=1}^n α_i − (1/2) Σ_{i,j=1}^n α_i α_j y_i y_j K(x_i, x_j)
subject to: α_i ≥ 0,  Σ_{i=1}^n α_i y_i = 0,
where K(x, x′) ≝ ⟨Φ(x), Φ(x′)⟩ is the kernel function.
Decision function: F(x) = sign(Σ_{i=1}^n α_i y_i K(x_i, x) − b)
Support Vector Machine for Classification
Non-Linear Classifier: Kernels
Polynomial: k(x_i, x_j) = (⟨x_i, x_j⟩ + 1)^d
Gaussian or Radial Basis Function (RBF): k(x_i, x_j) = exp(−||x_i − x_j||^2 / (2σ^2))
Hyperbolic tangent: k(x_i, x_j) = tanh(k⟨x_i, x_j⟩ + c)
[Figure: example decision boundaries for the polynomial (left) and Gaussian (right) kernels.]
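Each kernel is a one-liner in numpy (a sketch; parameter names follow the formulas above):

    import numpy as np

    poly = lambda xi, xj, d=2: (xi @ xj + 1.0) ** d
    rbf = lambda xi, xj, sigma=1.0: np.exp(-((xi - xj) @ (xi - xj)) / (2 * sigma**2))
    tanh_kernel = lambda xi, xj, k=1.0, c=0.0: np.tanh(k * (xi @ xj) + c)

Following the abstract, ACM-ES replaces the Euclidean norm inside the Gaussian kernel with the Mahalanobis distance induced by the CMA-ES covariance matrix.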
Ranking Support Vector Machine
Find the function F(x) which preserves the ordering of the training points.
[Figure: points x projected onto the direction w; level sets L(r1), L(r2) of the surrogate separate points of consecutive ranks.]
Ranking Support Vector Machine
Primal problem
Minimize_{w,ξ}  (1/2)||w||^2 + Σ_{i=1}^{N−1} C_i ξ_i
subject to: ⟨w, Φ(x_i) − Φ(x_{i+1})⟩ ≥ 1 − ξ_i  (i = 1…N−1)
            ξ_i ≥ 0  (i = 1…N−1)
Dual problem
Maximize_{α}  Σ_{i=1}^{N−1} α_i − (1/2) Σ_{i,j=1}^{N−1} α_i α_j K(x_i − x_{i+1}, x_j − x_{j+1})
subject to: 0 ≤ α_i ≤ C_i  (i = 1…N−1)
Rank surrogate function (in the case where one rank corresponds to one point):
F(x) = Σ_{i=1}^{N−1} α_i (K(x_i, x) − K(x_{i+1}, x))
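Once the α_i are trained (e.g. by the QP above; training is omitted here), evaluating the surrogate is a kernel sum over consecutive rank pairs; a minimal sketch:

    def rank_surrogate(x, X_sorted, alpha, kernel):
        # F(x) = sum_i alpha_i (K(x_i, x) - K(x_{i+1}, x)),
        # with X_sorted ordered from rank 1 to rank N and len(alpha) == N - 1
        return sum(a * (kernel(xi, x) - kernel(xj, x))
                   for a, xi, xj in zip(alpha, X_sorted, X_sorted[1:]))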
Optimization or filtering?
Don't be too greedy
Optimization: significant potential speed-up if the surrogate model is global and accurate enough.
Filtering: "guaranteed" speed-up with a local surrogate model.
[Figure: filtering scheme: λ′ pre-children are prescreened by the surrogate and a subset is retained for true evaluation; the probability density of retaining an offspring decreases with its surrogate rank (shown over ranks 0-500).]
ACM-ES Optimization Loop
The loop couples four panels:
A. Select training points:
1. Select the best k training points.
2. Apply the change of coordinates, defined from the current covariance matrix C and the current mean value m, which reads [4]: x → C^{-1/2}(x − m).
B. Build a surrogate model:
3. Build a surrogate model using Rank SVM.
C. Generate pre-children:
4. Generate pre-children and rank them according to the surrogate fitness function.
D. Select the most promising children:
5. Prescreen: retain pre-children according to their surrogate rank.
6. Evaluate the retained children with the true fitness function.
7. Add the new λ′ training points and update the parameters of CMA-ES.
[Figure: panels A-D of the loop in the coordinates X1, X2, together with the rank-based retention density of the previous slide.]
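A condensed, runnable sketch of one such generation; the surrogate is trained here via a pairwise-classification reduction with scikit-learn's linear SVC as a stand-in for the Rank SVM solver, and prescreening is simplified to a deterministic top-λ cut (the slides retain offspring with a rank-based probability instead):

    import numpy as np
    from sklearn.svm import SVC

    f = lambda x: x @ x                              # cheap stand-in for the costly fitness
    mean, sigma, C = np.ones(2), 0.5, np.eye(2)
    W = np.linalg.inv(np.linalg.cholesky(C))         # whitening transform, plays C^{-1/2}

    # steps 1-2: best k archived points, mapped to the encoded space
    archive = sorted((np.random.randn(2) for _ in range(30)), key=f)[:20]
    Z = np.array([W @ (x - mean) for x in archive])  # sorted from best to worst

    # step 3: Rank SVM stand-in: classify consecutive differences z_i - z_{i+1} as +1
    D = Z[:-1] - Z[1:]
    clf = SVC(kernel="linear", C=1e6).fit(
        np.vstack([D, -D]), np.hstack([np.ones(len(D)), -np.ones(len(D))]))
    surrogate = lambda x: -clf.coef_[0] @ (W @ (x - mean))   # smaller = better rank

    # steps 4-6: lambda' pre-children, keep the lambda surrogate-best, evaluate those
    lam_prime, lam = 30, 10
    pre = [mean + sigma * np.random.randn(2) for _ in range(lam_prime)]
    evaluated = [(x, f(x)) for x in sorted(pre, key=surrogate)[:lam]]
    # step 7: 'evaluated' is added to the archive and fed to the CMA-ES update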
Results
Learning Time
The cost of model learning/testing increases quasi-linearly with the dimension d on the Sphere function:
[Figure: learning time (ms) vs. problem dimension, log-log scale; the fitted slope is 1.13.]
Summary
ACM-ES
ACM-ES is from 2 to 4 times faster on uni-modal problems.
Invariant to rank-preserving transformations: Yes
The computational complexity (the cost of the speed-up) is O(n), compared to O(n^6) for lmm-CMA-ES.
The source code is available online:
http://www.lri.fr/~ilya/publications/ACMESppsn2010.zip
Open Questions
Extension to multi-modal optimization
Adaptation of selection pressure and surrogate model complexity
Thank you for your attention!
Questions?
Parameters
SVM Learning:
Number of training points: N_training = 30√d for all problems, except Rosenbrock and Rastrigin, where N_training = 70√d
Number of iterations: N_iter = 50000√d
Kernel function: RBF with σ equal to the average distance between the training points
Cost of constraint violation: C_i = 10^6 (N_training − i)^{2.0}
Offspring Selection Procedure:
Number of test points: N_test = 500
Number of evaluated offspring: λ = λ′/3
Offspring selection pressure parameters: σ^2_sel0 = 2σ^2_sel1 = 0.8
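Gathered into one illustrative config (a sketch; the function and key names are mine, and the values are the ones listed above, with λ = λ′/3 and σ^2_sel1 = 0.4 being reconstructions):

    import math

    def acmes_params(d, rosenbrock_or_rastrigin=False):
        # Hyperparameters of the slide, keyed by problem dimension d.
        n_train = int((70 if rosenbrock_or_rastrigin else 30) * math.sqrt(d))
        return {
            "n_training": n_train,
            "n_iter": int(50000 * math.sqrt(d)),
            "kernel": "RBF, sigma = avg. distance between training points",
            "violation_cost": [1e6 * (n_train - i) ** 2.0
                               for i in range(1, n_train)],   # C_i, i = 1..N-1
            "n_test": 500,
            # lambda evaluated offspring out of lambda' = 3*lambda pre-children
            "sel_pressure": {"sigma2_sel0": 0.8, "sigma2_sel1": 0.4},
        }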