Problem Understanding through Landscape Theory
1 / 26, July 2013 Workshop on Problem Understanding & RWO @ GECCO 2013
Landscape Theory | Relevant Results | Software Tools | Conclusions & Future Work
Problem Understanding through Landscape Theory
Francisco Chicano, Gabriel Luque and Enrique Alba
2 / 26
Landscape Definition
• A landscape is a triple (X, N, f) where
  - X is the solution space
  - N is the neighbourhood operator
  - f is the objective function
• The pair (X, N) is called the configuration space
[Figure: an example configuration space, solutions s0-s9 connected by the neighbourhood relation, with the objective value of each solution]
• The neighbourhood operator is a function N: X → P(X)
• Solution y is a neighbour of x if y ∈ N(x)
• A neighbourhood is regular when d = |N(x)| ∀ x ∈ X, and symmetric when y ∈ N(x) ⇔ x ∈ N(y)
• The objective function is a function f: X → R (or N, Z, Q)
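The triple above can be made concrete in a few lines. This is a minimal sketch, not from the talk: it uses Onemax and the bit-flip neighbourhood on Bⁿ as example choices, and checks regularity and symmetry by brute-force enumeration on a small n.

```python
from itertools import product

n = 4  # small enough to enumerate the whole search space

# Solution space X: all binary strings of length n
X = [tuple(bits) for bits in product([0, 1], repeat=n)]

# Neighbourhood operator N: bit-flip (Hamming distance 1)
def N(x):
    return [tuple(b ^ (i == j) for j, b in enumerate(x)) for i in range(n)]

# Objective function f: Onemax, chosen here only as a concrete example
def f(x):
    return sum(x)

# Regularity: every solution has the same number of neighbours d = n
assert {len(N(x)) for x in X} == {n}

# Symmetry: y is a neighbour of x iff x is a neighbour of y
assert all((y in N(x)) == (x in N(y)) for x in X for y in X)
```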
3 / 26
Elementary Landscapes: Formal Definition
• Graph Laplacian of the configuration space: Δ = A − D, where A is the adjacency matrix of the neighbourhood graph and D is the degree matrix; Δ depends only on the configuration space
• An elementary function is an eigenvector of Δ (plus a constant): Δ(f − f̄) = λ(f − f̄), where λ is the eigenvalue
• An elementary landscape is a landscape whose objective function is elementary
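The eigenvector definition can be verified numerically. A sketch under the slide's definitions, with Onemax on the bit-flip hypercube as an assumed example: build A and D, form Δ = A − D, and check that Onemax minus its mean is an eigenvector (eigenvalue −2, i.e. k = 2).

```python
import numpy as np
from itertools import product

n = 3
X = [tuple(b) for b in product([0, 1], repeat=n)]
idx = {x: i for i, x in enumerate(X)}

# Adjacency matrix A of the bit-flip neighbourhood graph
A = np.zeros((2**n, 2**n))
for x in X:
    for i in range(n):
        y = tuple(b ^ (i == j) for j, b in enumerate(x))
        A[idx[x], idx[y]] = 1

D = np.diag(A.sum(axis=1))   # degree matrix (d = n everywhere)
L = A - D                    # graph Laplacian Δ = A − D

f = np.array([sum(x) for x in X], dtype=float)  # Onemax
g = f - f.mean()             # subtract the constant component

# g is an eigenvector of Δ: Δg = −2·g, so Onemax is elementary
assert np.allclose(L @ g, -2 * g)
```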
4 / 26
Elementary Landscapes: Characterization
• An elementary landscape is a landscape for which Grover's wave equation holds:
  avg_{y ∈ N(x)} f(y) = f(x) + (k/d) (f̄ − f(x))
• This is a linear relationship between the average fitness in the neighbourhood of x and f(x); k is determined by the eigenvalue (k = −λ), while f̄, the average of f over the whole search space, depends on the problem/instance
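Grover's wave equation can be checked solution by solution. A minimal sketch, assuming Onemax under bit-flip (which is elementary with d = n and k = 2; the instance choice is mine, not the slide's):

```python
from itertools import product

n = 5
X = [tuple(b) for b in product([0, 1], repeat=n)]

def N(x):  # bit-flip neighbourhood, d = n
    return [tuple(b ^ (i == j) for j, b in enumerate(x)) for i in range(n)]

f = lambda x: sum(x)                  # Onemax
f_bar = sum(f(x) for x in X) / len(X)
d, k = n, 2                           # Onemax: order-1 elementary, k = 2

# The wave equation must hold at every solution, not just on average
for x in X:
    avg_neigh = sum(f(y) for y in N(x)) / d
    assert abs(avg_neigh - (f(x) + (k / d) * (f_bar - f(x)))) < 1e-12
```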
5 / 26
Elementary Landscapes: Examples

Problem              | Neighbourhood      | d        | k
Symmetric TSP        | 2-opt              | n(n-3)/2 | n-1
Symmetric TSP        | swap two cities    | n(n-1)/2 | 2(n-1)
Antisymmetric TSP    | inversions         | n(n-1)/2 | n(n+1)/2
Antisymmetric TSP    | swap two cities    | n(n-1)/2 | 2n
Graph α-Coloring     | recolor 1 vertex   | (α-1)n   | 2α
Graph Matching       | swap two elements  | n(n-1)/2 | 2(n-1)
Graph Bipartitioning | Johnson graph      | n²/4     | 2(n-1)
NAES                 | bit-flip           | n        | 4
Max Cut              | bit-flip           | n        | 4
Weight Partition     | bit-flip           | n        | 4
6 / 26
Landscape Decomposition
• What if the landscape is not elementary?
• Any landscape can be written as the sum of elementary landscapes
• There exists a set of elementary functions that form a basis of the function space (Fourier basis)
[Figure: a non-elementary function f is projected onto elementary functions e1, e2 from the Fourier basis; the projections ⟨e1, f⟩ e1 and ⟨e2, f⟩ e2 are the elementary components of f]
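On the binary hypercube the Fourier basis is the Walsh basis, so the decomposition can be computed directly: group the Walsh terms of f by order j to obtain the elementary components f[j]. A sketch with a random pseudo-Boolean function as the assumed example:

```python
import numpy as np
from itertools import product

n = 3
X = list(product([0, 1], repeat=n))
rng = np.random.default_rng(0)
f = rng.standard_normal(2**n)        # an arbitrary (non-elementary) function

def psi(w, x):
    """Walsh function: (-1)^(w·x)."""
    return (-1) ** sum(wi * xi for wi, xi in zip(w, x))

W = list(product([0, 1], repeat=n))  # index set of Walsh coefficients
a = {w: sum(f[i] * psi(w, x) for i, x in enumerate(X)) / 2**n for w in W}

# Elementary component of order j: the Walsh terms with |w| = j
def component(j, x):
    return sum(a[w] * psi(w, x) for w in W if sum(w) == j)

# f[0] is the constant component; the sum of all components is f exactly
for i, x in enumerate(X):
    assert abs(sum(component(j, x) for j in range(n + 1)) - f[i]) < 1e-9
```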
7 / 26
Landscape Decomposition: Examples

Problem              | Neighbourhood      | d        | Components
General TSP          | inversions         | n(n-1)/2 | 2
General TSP          | swap two cities    | n(n-1)/2 | 2
Subset Sum Problem   | bit-flip           | n        | 2
MAX k-SAT            | bit-flip           | n        | k
NK-landscapes        | bit-flip           | n        | k+1
Radio Network Design | bit-flip           | n        | max. nb. of reachable antennae
Frequency Assignment | change 1 frequency | (α-1)n   | 2
QAP                  | swap two elements  | n(n-1)/2 | 3
8 / 26
Autocorrelation
• Let {x0, x1, ...} be a simple random walk on the configuration space, where xi+1 ∈ N(xi)
• The random walk induces a time series {f(x0), f(x1), ...} on a landscape
• The autocorrelation function is defined as
  r(s) = (E[f(xt) · f(xt+s)] − E[f(xt)] · E[f(xt+s)]) / Var[f(xt)]
• The autocorrelation length ℓ and the autocorrelation coefficient ξ:
  ℓ = Σ_{s=0}^{∞} r(s),  ξ = 1 / (1 − r(1))
• Autocorrelation length conjecture: the number of local optima M in the search space is roughly
  M ≈ |X| / |X(x0, ℓ)|
  where X(x0, ℓ) is the set of solutions reached from x0 after ℓ moves
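The autocorrelation function can be estimated empirically from a random walk. A sketch, assuming Onemax under bit-flip (for an elementary landscape r(s) = (1 − k/d)^s, which for Onemax is (1 − 2/n)^s, so the estimate can be checked against theory):

```python
import random

n, steps = 20, 200_000
random.seed(1)

# Simple random walk under bit-flip on B^n, recording Onemax values
x = [random.randint(0, 1) for _ in range(n)]  # uniform (stationary) start
series = []
for _ in range(steps):
    series.append(sum(x))
    x[random.randrange(n)] ^= 1

mean = sum(series) / len(series)
var = sum((v - mean) ** 2 for v in series) / len(series)

def r(s):
    """Empirical autocorrelation of the induced time series at lag s."""
    prods = [(series[t] - mean) * (series[t + s] - mean)
             for t in range(len(series) - s)]
    return sum(prods) / (len(prods) * var)

# Onemax is elementary with k/d = 2/n, hence r(s) = (1 - 2/n)^s
for s in range(1, 4):
    assert abs(r(s) - (1 - 2 / n) ** s) < 0.05
```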
9 / 26
Autocorrelation Length “Conjecture”
• The higher the values of ℓ and ξ, the smaller the number of local optima
• ℓ (length) and ξ (coefficient) are measures of ruggedness
• Experimental support, simulated annealing results from Angel, Zissimopoulos, Theoretical Computer Science 263:159-172 (2001):

Ruggedness   | SA (configuration 1)        | SA (configuration 2)
             | % rel. error | nb. of steps | % rel. error | nb. of steps
10 ≤ ζ < 20  | 0.2          | 50,500       | 0.1          | 101,395
20 ≤ ζ < 30  | 0.3          | 53,300       | 0.2          | 106,890
30 ≤ ζ < 40  | 0.3          | 58,700       | 0.2          | 118,760
40 ≤ ζ < 50  | 0.5          | 62,700       | 0.3          | 126,395
50 ≤ ζ < 60  | 0.7          | 66,100       | 0.4          | 133,055
60 ≤ ζ < 70  | 1.0          | 75,300       | 0.6          | 151,870
70 ≤ ζ < 80  | 1.3          | 76,800       | 1.0          | 155,230
80 ≤ ζ < 90  | 1.9          | 79,700       | 1.4          | 159,840
90 ≤ ζ < 100 | 2.0          | 82,400       | 1.8          | 165,610
10 / 26
Fitness-Distance Correlation: Definition
• Given a function f: Bⁿ → R, the fitness-distance correlation (FDC) of f is defined as
  r = Cov_fd / (σ_f σ_d)
  where Cov_fd is the covariance of the fitness values and the distances of the solutions to their nearest global optimum, σ_f is the standard deviation of the fitness values in the search space, and σ_d is the standard deviation of the distances to the nearest global optimum in the search space:
  Cov_fd = (1/2ⁿ) Σ_x (f(x) − f̄)(d(x) − d̄)
• A problem is predicted to be difficult when |r| < 0.15 (Jones & Forrest)
[Figure: scatter plot of fitness value against distance to the optimum]
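For small n the FDC can be computed exactly by enumeration. A sketch using Onemax (my example choice): fitness determines the distance to the unique optimum exactly, so the FDC is −1 under maximization.

```python
from itertools import product
from statistics import mean, pstdev

n = 6
X = list(product([0, 1], repeat=n))
f = [sum(x) for x in X]                 # Onemax (maximization)
x_star = tuple([1] * n)                 # unique global optimum
d = [sum(a != b for a, b in zip(x, x_star)) for x in X]

f_bar, d_bar = mean(f), mean(d)
cov_fd = mean((fi - f_bar) * (di - d_bar) for fi, di in zip(f, d))
r = cov_fd / (pstdev(f) * pstdev(d))

# For Onemax, d(x) = n - f(x): perfect negative correlation
assert abs(r - (-1.0)) < 1e-12
```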
11 / 26
Fitness-Distance Correlation Formulas
• For the bit-flip neighbourhood on Bⁿ with one global optimum, the standard deviation of the distance is
  σ_d = √(n(n+1)/4 − n²/4) = √(n/4) = √n / 2
• Using the previous facts we get, for an elementary landscape of order j (assuming maximization):
  - if j = 1: r = −f[1](x*) / (σ_f √n)
  - if j > 1: r = 0
• In general, for an arbitrary function f(x) = f[0](x) + f[1](x) + f[2](x) + ... + f[n](x), where f[0] is the constant function f[0](x) = f̄ and each f[p] with p > 0 is an order-p elementary function with zero offset, the FDC is
  r = −f[1](x*) / (σ_f √n)
• The only component contributing to r is f[1](x): rugged components are not considered by FDC
[EvoCOP 2012, LNCS 7245: 111-123]
12 / 26
FDC: Implications for Linear Functions
• Fitness-Distance Correlation for order-1 (linear) elementary landscapes (assume maximization): if we are maximizing, the FDC is negative, while if we are minimizing, f(x*) < f̄ and the FDC is positive
• Order-1 elementary landscapes can always be written as linear functions, and can be optimized in polynomial time:
  f(x) = Σ_{i=1}^{n} a_i x_i + b
• Proposition: let f be an order-1 elementary function with all a_i ≠ 0. Then it has one only global optimum, and the FDC (for maximization) is
  r = −Σ_{i=1}^{n} |a_i| / (√n √(Σ_{i=1}^{n} a_i²))
  which is always in the interval −1 ≤ r < 0
• When all the values of a_i are the same, the FDC is −1; this happens in particular for the Onemax problem. But if there exist different values for a_i, we can reach any arbitrary value in [−1, 0) for r
• Theorem: let ρ be an arbitrary real value in the interval [−1, 0). Then any linear function f(x) as above with n > 1/ρ², a_2 = a_3 = ... = a_n = 1 and
  a_1 = ((n − 1) + n|ρ| √((1 − ρ²)(n − 1))) / (nρ² − 1)
  has exactly FDC r = ρ (the expression for a_1 is well-defined since nρ² > 1)
• We can therefore define a linear elementary landscape with any desired FDC ρ < 0
• “Difficult” problems (|r| < 0.15) can be obtained starting at n = 45: for ρ = −0.15, a_1 = 7061.43 and a_2 = a_3 = ... = a_45 = 1
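The theorem above can be checked by brute force for a small instance. A sketch, with ρ = −0.5 and n = 5 as assumed values (n > 1/ρ² holds): compute a_1 from the theorem, then verify that both the enumerated FDC and the closed form from the proposition give exactly ρ.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

rho, n = -0.5, 5                      # any rho in [-1, 0) with n > 1/rho^2

# a1 from the theorem; the remaining coefficients are set to 1
a1 = ((n - 1) + n * abs(rho) * sqrt((1 - rho**2) * (n - 1))) / (n * rho**2 - 1)
a = [a1] + [1.0] * (n - 1)

# Brute-force FDC (maximization; unique optimum at the all-ones string)
X = list(product([0, 1], repeat=n))
f = [sum(ai * xi for ai, xi in zip(a, x)) for x in X]
d = [n - sum(x) for x in X]           # Hamming distance to the optimum
f_bar, d_bar = mean(f), mean(d)
cov = mean((fi - f_bar) * (di - d_bar) for fi, di in zip(f, d))
r = cov / (pstdev(f) * pstdev(d))
assert abs(r - rho) < 1e-9

# Closed form from the proposition agrees with the enumeration
r_prop = -sum(abs(ai) for ai in a) / (sqrt(n) * sqrt(sum(ai**2 for ai in a)))
assert abs(r_prop - rho) < 1e-9
```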
13 / 26
Expectation for Bit-flip Mutation
• Let Mp(x) be the solution reached after applying the bit-flip mutation operator with probability p to solution x. Writing f as the sum of its elementary components, f = Σ_{j=0}^{n} f[j], the expected value is
  E{f(Mp(x))} = Σ_{j=0}^{n} (1 − 2p)^j f[j](x)
  where f[j] is the elementary component of f with order j
• The same expression extends to higher-order moments: the m-th moment of the random variable f(Mp(x)) is
  μ_m{f(Mp(x))} = Σ_{j=0}^{n} (1 − 2p)^j (f^m)[j](x)
• Example (j = 0, 1, 2, 3): f = f[0] + f[1] + f[2] + f[3]
  - p = 1/2 → start from scratch (the expectation is f̄)
  - p ≈ 0.23 → maximum expectation
  - the traditional p = 1/n could be around here
[Figure: E{f(Mp(x))} as a function of p for the example]
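The expectation formula can be validated exactly on a small instance: sum over all mutation outcomes weighted by their probabilities, and compare with the (1 − 2p)^j formula computed from the Walsh decomposition. A sketch with a random function and arbitrary p, x (my choices):

```python
import numpy as np
from itertools import product

n, p = 4, 0.23
X = list(product([0, 1], repeat=n))
rng = np.random.default_rng(3)
f = rng.standard_normal(2**n)        # arbitrary pseudo-Boolean function

def psi(w, x):
    return (-1) ** sum(wi * xi for wi, xi in zip(w, x))

W = X                                 # index set for Walsh coefficients
a = {w: sum(f[i] * psi(w, x) for i, x in enumerate(X)) / 2**n for w in W}

x = X[5]                              # an arbitrary current solution

# Exact expectation: sum over outcomes, weighted by mutation probability
def ham(u, v):
    return sum(ui != vi for ui, vi in zip(u, v))
exact = sum(f[i] * p**ham(x, y) * (1 - p)**(n - ham(x, y))
            for i, y in enumerate(X))

# Landscape-theory formula: E = sum_j (1-2p)^j f[j](x), term by Walsh order
formula = sum((1 - 2*p)**sum(w) * a[w] * psi(w, x) for w in W)

assert abs(exact - formula) < 1e-9
```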
14 / 26
Expectation for the Uniform Crossover
• Theorem: let f be a pseudo-Boolean function defined over Bⁿ and a_w, with w ∈ Bⁿ, its Walsh coefficients. The following identity holds for E{f(U_ρ(x, y))}, the expected fitness after uniform crossover with bias ρ:
  E{f(U_ρ(x, y))} = Σ_{r=0}^{n} A^(r)_{x,y} (1 − 2ρ)^r
  where the coefficients A^(r)_{x,y} are defined in the following way:
  A^(r)_{x,y} = Σ_{w ∈ Bⁿ: |(x ⊕ y) ∧ w| = r} a_w ψ_w(y)
• Proof sketch: for any crossover operator C,
  E{f(C(x, y))} = Σ_{z ∈ Bⁿ} f(z) Pr{C(x, y) = z} = Σ_{w ∈ Bⁿ} a_w b_w(x, y)
  where b_w(x, y) = Σ_{z ∈ Bⁿ} ψ_w(z) Pr{C(x, y) = z}; for the uniform crossover,
  b_{w,ρ}(x, y) = ψ_w(y) (1 − 2ρ)^{|(x ⊕ y) ∧ w|}
• For ρ = 1/2 (unbiased uniform crossover):
  E{f(U_{1/2}(x, y))} = A^(0)_{x,y} = Σ_{w: |(x ⊕ y) ∧ w| = 0} a_w ψ_w(y)
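This identity can also be checked by exhaustive enumeration. A sketch with assumed parents and bias, and with the convention (my assumption) that at each position where the parents differ the offspring takes x's bit with probability ρ and y's bit otherwise:

```python
import numpy as np
from itertools import product

n, rho = 4, 0.3
X = list(product([0, 1], repeat=n))
rng = np.random.default_rng(7)
f = {x: v for x, v in zip(X, rng.standard_normal(2**n))}

def psi(w, x):
    return (-1) ** sum(wi * xi for wi, xi in zip(w, x))

a = {w: sum(f[x] * psi(w, x) for x in X) / 2**n for w in X}

x, y = (0, 1, 1, 0), (1, 1, 0, 0)     # arbitrary parents

# Exact expectation: enumerate choices at the positions where parents differ
diff = [i for i in range(n) if x[i] != y[i]]
exact = 0.0
for choice in product([0, 1], repeat=len(diff)):  # 1 = take the bit from x
    z, pr = list(y), 1.0
    for c, i in zip(choice, diff):
        if c:
            z[i] = x[i]
        pr *= rho if c else (1 - rho)
    exact += pr * f[tuple(z)]

# Walsh-based formula: E = sum_w a_w psi_w(y) (1-2rho)^{|(x XOR y) AND w|}
def overlap(w):
    return sum(wi & (xi ^ yi) for wi, xi, yi in zip(w, x, y))
formula = sum(a[w] * psi(w, y) * (1 - 2*rho)**overlap(w) for w in X)

assert abs(exact - formula) < 1e-9
```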
15 / 26
Only Expected Values?
• Can we compute the probability distribution?
[Figure: fitness distributions of mutated solutions around the current solution; different distributions can share the same expectation]
• With the help of the moments vector μ{f(Mp(x))} we can compute the probability distribution of the values of f in a mutated solution, proceeding in the same way as Sutton et al. (2011)
• Let ξ_0 < ξ_1 < ... < ξ_{q−1} be the q possible values that the objective function takes in the search space; since the search space is finite, q is finite (perhaps very large). Define the vector of probabilities π_i(f(Mp(x))) = Pr{f(Mp(x)) = ξ_i}
• Theorem: the vector of probabilities can be computed as
  π(f(Mp(x))) = (V^T)^{−1} F(x) Λ(p)
  where the matrix function F(x) is problem-dependent and limited to the first q rows, and Λ(p) is a vector of polynomials in p
• As a consequence, an efficient evaluation of the matrix function F(x) cannot be ensured in general; the complexity of F(x) depends on the problem
16 / 26
Runtime of (1+1) EA: Expected Hitting Time
• The expected hitting time is a fraction of polynomials in p
• For the (1+1) EA on Onemax, the expected number of iterations can be exactly computed as a function of p, the probability of flipping a bit. For illustration, the expressions for n ≤ 3 are:
  E{τ} = 1 / (2p)  for n = 1
  E{τ} = (7 − 5p) / (4(p − 2)(p − 1)p)  for n = 2
  E{τ} = (26p⁴ − 115p³ + 202p² − 163p + 56) / (8(p − 1)² p (p² − 3p + 3)(2p² − 3p + 2))  for n = 3
• Having the exact expressions, we can compute the optimal mutation probability for each n using classical optimization methods in one variable. For n = 1 the optimal value is p = 1; for n = 2 we have to solve a cubic polynomial to obtain the exact expression:
  p*₂ = (1/5) (6 − ∛(2 / (23 − 5√21)) − ∛((23 − 5√21) / 2)) ≈ 0.561215
  which is slightly higher than the recommended value p = 1/n. As n increases, analytical responses for the optimal probability are not possible and we have to apply numerical methods
• The advantage is an exact answer for particular instances; the disadvantage is that the expression is quite complex to analyze and we need numerical methods, so it is not easy to generalize the answers obtained
[Figure 1: expected runtime of the (1+1) EA for Onemax as a function of the probability of flipping a bit, for n = 1 to 7]
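The closed-form expressions can be cross-checked against the Markov chain of the (1+1) EA computed exactly over all 2ⁿ states. A sketch, assuming a uniform random initial solution and elitist acceptance (offspring replaces parent when its fitness is greater or equal):

```python
from itertools import product

def expected_hitting_time(n, p):
    """Exact E{tau} of the (1+1) EA on Onemax via value iteration."""
    X = list(product([0, 1], repeat=n))
    opt = tuple([1] * n)

    def mut_prob(x, y):  # probability that bit-flip mutation turns x into y
        h = sum(a != b for a, b in zip(x, y))
        return p**h * (1 - p) ** (n - h)

    # Solve E[x] = 1 + sum_y Pr{mutation gives y} * E[accepted state]
    E = {x: 0.0 for x in X}
    for _ in range(100_000):
        newE = {}
        for x in X:
            if x == opt:
                newE[x] = 0.0
                continue
            total = 1.0
            for y in X:
                target = y if sum(y) >= sum(x) else x  # elitist acceptance
                total += mut_prob(x, y) * E[target]
            newE[x] = total
        delta = max(abs(newE[x] - E[x]) for x in X)
        E = newE
        if delta < 1e-12:
            break
    return sum(E.values()) / len(X)   # uniform random initial solution

p = 0.3
closed_form = (7 - 5*p) / (4 * (p - 2) * (p - 1) * p)  # slide's n = 2 formula
assert abs(expected_hitting_time(2, p) - closed_form) < 1e-6
```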
17 / 26
Runtime of (1+1) EA: Curves
• The (1+1) EA can alternate between two solutions if p = 1. However, when n = 1 the probability p = 1 is valid, and furthermore optimal: if the global solution is not present at the beginning, we can reach it by flipping the only bit we have
• Figure 1 shows the expected runtime as a function of the probability of flipping a bit for n = 1 to 7. We can observe how the optimal probability (the one obtaining the minimum expected runtime) decreases as n increases
[Figure 1: expected runtime of the (1+1) EA for Onemax as a function of the probability of flipping a bit; each line corresponds to a different value of n from 1 to 7]
19 / 26
Landscape Explorer
• Main goals:
  - Easy to extend
  - Multiplatform
• RCP (Rich Client Platform) architecture
20 / 26
Rich Client Platform
• Mechanisms for plugin interaction: extensions, extension points and package exports
21 / 26
Architecture
• Main plugins of the application
[Diagram: neo.landscapes.theory.kernel, neo.landscapes.theory.tool, neo.landscapes.theory.tool.procedures and neo.landscapes.theory.tool.selectors, plus the problem plugins (TSP, QAP, SS, UQO, DFA, WFFAP) contributing to the .landscapes, .selectors and .procedures extension points]
22 / 26
Included Procedures and Landscapes
• Procedures
• Elementary Landscape Check
• Mathematica program to get the Elementary Landscape Decomposition
• Computation of the reduced adjacency matrix
• Theoretical Autocorrelation Measures
• Experimental Autocorrelation
• Estimation of the number of elementary components
• Landscapes
• QAP (Quadratic Assignment Problem)
• UQO (Unconstrained Quadratic Optimization)
• TSP (Traveling Salesman Problem)
• Walsh Functions (linear combinations)
• Subset Sum Problem
23 / 26
Available on the Internet
http://neo.lcc.uma.es/software/landexplorer
24 / 26
On-line Computation of Autocorrelation for QAP
http://neo.lcc.uma.es/software/qap.php
25 / 26
Conclusions & Future Work
Conclusions
• Landscape Theory is very good for providing statistical information at a low cost
• FDC, expected fitness value after mutation and uniform crossover
• Runtime?
• Software tools have been developed to help non-experts use this knowledge
Future Work
• Decompose performance measures into a problem part and an algorithm part: R(A, f) = (problem factor of f) ⊗ Λ(A) (algorithm factor)
26 / 26
Thanks for your attention!!!
Problem Understanding through Landscape Theory