Problem Understanding through Landscape Theory
1 / 26, July 2013 Workshop on Problem Understanding & RWO @ GECCO 2013
Landscape Theory | Relevant Results | Software Tools | Conclusions & Future Work
Problem Understanding through Landscape Theory
Francisco Chicano, Gabriel Luque and Enrique Alba
2 / 26
Landscape Definition
• A landscape is a triple (X, N, f) where
  - X is the solution space
  - N is the neighbourhood operator
  - f is the objective function
• The pair (X, N) is called the configuration space
[Figure: an example configuration space, solutions s0-s9 connected by the neighbourhood relation, with the objective value of each solution]
• The neighbourhood operator is a function N: X → P(X)
• Solution y is a neighbour of x if y ∈ N(x)
• A neighbourhood is regular when d = |N(x)| ∀ x ∈ X, and symmetric when y ∈ N(x) ⇔ x ∈ N(y)
• The objective function is a function f: X → R (or N, Z, Q)
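The triple above can be made concrete in a few lines. This is a minimal sketch, not from the talk: it uses Onemax and the bit-flip neighbourhood on Bⁿ as example choices, and checks regularity and symmetry by brute-force enumeration on a small n.

```python
from itertools import product

n = 4  # small enough to enumerate the whole search space

# Solution space X: all binary strings of length n
X = [tuple(bits) for bits in product([0, 1], repeat=n)]

# Neighbourhood operator N: bit-flip (Hamming distance 1)
def N(x):
    return [tuple(b ^ (i == j) for j, b in enumerate(x)) for i in range(n)]

# Objective function f: Onemax, chosen here only as a concrete example
def f(x):
    return sum(x)

# Regularity: every solution has the same number of neighbours d = n
assert {len(N(x)) for x in X} == {n}

# Symmetry: y is a neighbour of x iff x is a neighbour of y
assert all((y in N(x)) == (x in N(y)) for x in X for y in X)
```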
3 / 26
Elementary Landscapes: Formal Definition
• Graph Laplacian of the configuration space: Δ = A − D, where A is the adjacency matrix of the neighbourhood graph and D is the degree matrix; Δ depends only on the configuration space
• An elementary function is an eigenvector of Δ (plus a constant): Δ(f − f̄) = λ(f − f̄), where λ is the eigenvalue
• An elementary landscape is a landscape whose objective function is elementary
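The eigenvector definition can be verified numerically. A sketch under the slide's definitions, with Onemax on the bit-flip hypercube as an assumed example: build A and D, form Δ = A − D, and check that Onemax minus its mean is an eigenvector (eigenvalue −2, i.e. k = 2).

```python
import numpy as np
from itertools import product

n = 3
X = [tuple(b) for b in product([0, 1], repeat=n)]
idx = {x: i for i, x in enumerate(X)}

# Adjacency matrix A of the bit-flip neighbourhood graph
A = np.zeros((2**n, 2**n))
for x in X:
    for i in range(n):
        y = tuple(b ^ (i == j) for j, b in enumerate(x))
        A[idx[x], idx[y]] = 1

D = np.diag(A.sum(axis=1))   # degree matrix (d = n everywhere)
L = A - D                    # graph Laplacian Δ = A − D

f = np.array([sum(x) for x in X], dtype=float)  # Onemax
g = f - f.mean()             # subtract the constant component

# g is an eigenvector of Δ: Δg = −2·g, so Onemax is elementary
assert np.allclose(L @ g, -2 * g)
```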
4 / 26
Elementary Landscapes: Characterization
• An elementary landscape is a landscape for which Grover's wave equation holds:
  avg_{y ∈ N(x)} f(y) = f(x) + (k/d) (f̄ − f(x))
• This is a linear relationship between the average fitness in the neighbourhood of x and f(x); k is determined by the eigenvalue (k = −λ), while f̄, the average of f over the whole search space, depends on the problem/instance
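Grover's wave equation can be checked solution by solution. A minimal sketch, assuming Onemax under bit-flip (which is elementary with d = n and k = 2; the instance choice is mine, not the slide's):

```python
from itertools import product

n = 5
X = [tuple(b) for b in product([0, 1], repeat=n)]

def N(x):  # bit-flip neighbourhood, d = n
    return [tuple(b ^ (i == j) for j, b in enumerate(x)) for i in range(n)]

f = lambda x: sum(x)                  # Onemax
f_bar = sum(f(x) for x in X) / len(X)
d, k = n, 2                           # Onemax: order-1 elementary, k = 2

# The wave equation must hold at every solution, not just on average
for x in X:
    avg_neigh = sum(f(y) for y in N(x)) / d
    assert abs(avg_neigh - (f(x) + (k / d) * (f_bar - f(x)))) < 1e-12
```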
5 / 26
Elementary Landscapes: Examples

Problem              | Neighbourhood      | d        | k
Symmetric TSP        | 2-opt              | n(n-3)/2 | n-1
Symmetric TSP        | swap two cities    | n(n-1)/2 | 2(n-1)
Antisymmetric TSP    | inversions         | n(n-1)/2 | n(n+1)/2
Antisymmetric TSP    | swap two cities    | n(n-1)/2 | 2n
Graph α-Coloring     | recolor 1 vertex   | (α-1)n   | 2α
Graph Matching       | swap two elements  | n(n-1)/2 | 2(n-1)
Graph Bipartitioning | Johnson graph      | n²/4     | 2(n-1)
NAES                 | bit-flip           | n        | 4
Max Cut              | bit-flip           | n        | 4
Weight Partition     | bit-flip           | n        | 4
6 / 26
Landscape Decomposition
• What if the landscape is not elementary?
• Any landscape can be written as the sum of elementary landscapes
• There exists a set of elementary functions that form a basis of the function space (Fourier basis)
[Figure: a non-elementary function f is projected onto elementary functions e1, e2 from the Fourier basis; the projections ⟨e1, f⟩ e1 and ⟨e2, f⟩ e2 are the elementary components of f]
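On the binary hypercube the Fourier basis is the Walsh basis, so the decomposition can be computed directly: group the Walsh terms of f by order j to obtain the elementary components f[j]. A sketch with a random pseudo-Boolean function as the assumed example:

```python
import numpy as np
from itertools import product

n = 3
X = list(product([0, 1], repeat=n))
rng = np.random.default_rng(0)
f = rng.standard_normal(2**n)        # an arbitrary (non-elementary) function

def psi(w, x):
    """Walsh function: (-1)^(w·x)."""
    return (-1) ** sum(wi * xi for wi, xi in zip(w, x))

W = list(product([0, 1], repeat=n))  # index set of Walsh coefficients
a = {w: sum(f[i] * psi(w, x) for i, x in enumerate(X)) / 2**n for w in W}

# Elementary component of order j: the Walsh terms with |w| = j
def component(j, x):
    return sum(a[w] * psi(w, x) for w in W if sum(w) == j)

# f[0] is the constant component; the sum of all components is f exactly
for i, x in enumerate(X):
    assert abs(sum(component(j, x) for j in range(n + 1)) - f[i]) < 1e-9
```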
7 / 26
Landscape Decomposition: Examples

Problem              | Neighbourhood      | d        | Components
General TSP          | inversions         | n(n-1)/2 | 2
General TSP          | swap two cities    | n(n-1)/2 | 2
Subset Sum Problem   | bit-flip           | n        | 2
MAX k-SAT            | bit-flip           | n        | k
NK-landscapes        | bit-flip           | n        | k+1
Radio Network Design | bit-flip           | n        | max. nb. of reachable antennae
Frequency Assignment | change 1 frequency | (α-1)n   | 2
QAP                  | swap two elements  | n(n-1)/2 | 3
8 / 26
Autocorrelation
• Let {x0, x1, ...} be a simple random walk on the configuration space, where xi+1 ∈ N(xi)
• The random walk induces a time series {f(x0), f(x1), ...} on a landscape
• The autocorrelation function is defined as
  r(s) = (E[f(xt) · f(xt+s)] − E[f(xt)] · E[f(xt+s)]) / Var[f(xt)]
• The autocorrelation length ℓ and the autocorrelation coefficient ξ:
  ℓ = Σ_{s=0}^{∞} r(s),  ξ = 1 / (1 − r(1))
• Autocorrelation length conjecture: the number of local optima M in the search space is roughly
  M ≈ |X| / |X(x0, ℓ)|
  where X(x0, ℓ) is the set of solutions reached from x0 after ℓ moves
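The autocorrelation function can be estimated empirically from a random walk. A sketch, assuming Onemax under bit-flip (for an elementary landscape r(s) = (1 − k/d)^s, which for Onemax is (1 − 2/n)^s, so the estimate can be checked against theory):

```python
import random

n, steps = 20, 200_000
random.seed(1)

# Simple random walk under bit-flip on B^n, recording Onemax values
x = [random.randint(0, 1) for _ in range(n)]  # uniform (stationary) start
series = []
for _ in range(steps):
    series.append(sum(x))
    x[random.randrange(n)] ^= 1

mean = sum(series) / len(series)
var = sum((v - mean) ** 2 for v in series) / len(series)

def r(s):
    """Empirical autocorrelation of the induced time series at lag s."""
    prods = [(series[t] - mean) * (series[t + s] - mean)
             for t in range(len(series) - s)]
    return sum(prods) / (len(prods) * var)

# Onemax is elementary with k/d = 2/n, hence r(s) = (1 - 2/n)^s
for s in range(1, 4):
    assert abs(r(s) - (1 - 2 / n) ** s) < 0.05
```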
9 / 26
Autocorrelation Length “Conjecture”
• The higher the values of ℓ and ξ, the smaller the number of local optima
• ℓ (length) and ξ (coefficient) are measures of ruggedness
• Experimental support, simulated annealing results from Angel, Zissimopoulos, Theoretical Computer Science 263:159-172 (2001):

Ruggedness   | SA (configuration 1)        | SA (configuration 2)
             | % rel. error | nb. of steps | % rel. error | nb. of steps
10 ≤ ζ < 20  | 0.2          | 50,500       | 0.1          | 101,395
20 ≤ ζ < 30  | 0.3          | 53,300       | 0.2          | 106,890
30 ≤ ζ < 40  | 0.3          | 58,700       | 0.2          | 118,760
40 ≤ ζ < 50  | 0.5          | 62,700       | 0.3          | 126,395
50 ≤ ζ < 60  | 0.7          | 66,100       | 0.4          | 133,055
60 ≤ ζ < 70  | 1.0          | 75,300       | 0.6          | 151,870
70 ≤ ζ < 80  | 1.3          | 76,800       | 1.0          | 155,230
80 ≤ ζ < 90  | 1.9          | 79,700       | 1.4          | 159,840
90 ≤ ζ < 100 | 2.0          | 82,400       | 1.8          | 165,610
10 / 26
Fitness-Distance Correlation: Definition
• Given a function f: Bⁿ → R, the fitness-distance correlation (FDC) of f is defined as
  r = Cov_fd / (σ_f σ_d)
  where Cov_fd is the covariance of the fitness values and the distances of the solutions to their nearest global optimum, σ_f is the standard deviation of the fitness values in the search space, and σ_d is the standard deviation of the distances to the nearest global optimum in the search space:
  Cov_fd = (1/2ⁿ) Σ_x (f(x) − f̄)(d(x) − d̄)
• A problem is predicted to be difficult when |r| < 0.15 (Jones & Forrest)
[Figure: scatter plot of fitness value against distance to the optimum]
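For small n the FDC can be computed exactly by enumeration. A sketch using Onemax (my example choice): fitness determines the distance to the unique optimum exactly, so the FDC is −1 under maximization.

```python
from itertools import product
from statistics import mean, pstdev

n = 6
X = list(product([0, 1], repeat=n))
f = [sum(x) for x in X]                 # Onemax (maximization)
x_star = tuple([1] * n)                 # unique global optimum
d = [sum(a != b for a, b in zip(x, x_star)) for x in X]

f_bar, d_bar = mean(f), mean(d)
cov_fd = mean((fi - f_bar) * (di - d_bar) for fi, di in zip(f, d))
r = cov_fd / (pstdev(f) * pstdev(d))

# For Onemax, d(x) = n - f(x): perfect negative correlation
assert abs(r - (-1.0)) < 1e-12
```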
11 / 26
Fitness-Distance Correlation Formulas
• For the bit-flip neighbourhood on Bⁿ with one global optimum, the standard deviation of the distance is
  σ_d = √(n(n+1)/4 − n²/4) = √(n/4) = √n / 2
• Using the previous facts we get, for an elementary landscape of order j (assuming maximization):
  - if j = 1: r = −f[1](x*) / (σ_f √n)
  - if j > 1: r = 0
• In general, for an arbitrary function f(x) = f[0](x) + f[1](x) + f[2](x) + ... + f[n](x), where f[0] is the constant function f[0](x) = f̄ and each f[p] with p > 0 is an order-p elementary function with zero offset, the FDC is
  r = −f[1](x*) / (σ_f √n)
• The only component contributing to r is f[1](x): rugged components are not considered by FDC
[EvoCOP 2012, LNCS 7245: 111-123]
12 / 26
FDC: Implications for Linear Functions
• Fitness-Distance Correlation for order-1 (linear) elementary landscapes (assume maximization): if we are maximizing, the FDC is negative, while if we are minimizing, f(x*) < f̄ and the FDC is positive
• Order-1 elementary landscapes can always be written as linear functions, and can be optimized in polynomial time:
  f(x) = Σ_{i=1}^{n} a_i x_i + b
• Proposition: let f be an order-1 elementary function with all a_i ≠ 0. Then it has one only global optimum, and the FDC (for maximization) is
  r = −Σ_{i=1}^{n} |a_i| / (√n √(Σ_{i=1}^{n} a_i²))
  which is always in the interval −1 ≤ r < 0
• When all the values of a_i are the same, the FDC is −1; this happens in particular for the Onemax problem. But if there exist different values for a_i, we can reach any arbitrary value in [−1, 0) for r
• Theorem: let ρ be an arbitrary real value in the interval [−1, 0). Then any linear function f(x) as above with n > 1/ρ², a_2 = a_3 = ... = a_n = 1 and
  a_1 = ((n − 1) + n|ρ| √((1 − ρ²)(n − 1))) / (nρ² − 1)
  has exactly FDC r = ρ (the expression for a_1 is well-defined since nρ² > 1)
• We can therefore define a linear elementary landscape with any desired FDC ρ < 0
• “Difficult” problems (|r| < 0.15) can be obtained starting at n = 45: for ρ = −0.15, a_1 = 7061.43 and a_2 = a_3 = ... = a_45 = 1
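The theorem above can be checked by brute force for a small instance. A sketch, with ρ = −0.5 and n = 5 as assumed values (n > 1/ρ² holds): compute a_1 from the theorem, then verify that both the enumerated FDC and the closed form from the proposition give exactly ρ.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

rho, n = -0.5, 5                      # any rho in [-1, 0) with n > 1/rho^2

# a1 from the theorem; the remaining coefficients are set to 1
a1 = ((n - 1) + n * abs(rho) * sqrt((1 - rho**2) * (n - 1))) / (n * rho**2 - 1)
a = [a1] + [1.0] * (n - 1)

# Brute-force FDC (maximization; unique optimum at the all-ones string)
X = list(product([0, 1], repeat=n))
f = [sum(ai * xi for ai, xi in zip(a, x)) for x in X]
d = [n - sum(x) for x in X]           # Hamming distance to the optimum
f_bar, d_bar = mean(f), mean(d)
cov = mean((fi - f_bar) * (di - d_bar) for fi, di in zip(f, d))
r = cov / (pstdev(f) * pstdev(d))
assert abs(r - rho) < 1e-9

# Closed form from the proposition agrees with the enumeration
r_prop = -sum(abs(ai) for ai in a) / (sqrt(n) * sqrt(sum(ai**2 for ai in a)))
assert abs(r_prop - rho) < 1e-9
```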
13 / 26
Expectation for Bit-flip Mutation
• Let Mp(x) be the solution reached after applying the bit-flip mutation operator with probability p to solution x. Writing f as the sum of its elementary components, f = Σ_{j=0}^{n} f[j], the expected value is
  E{f(Mp(x))} = Σ_{j=0}^{n} (1 − 2p)^j f[j](x)
  where f[j] is the elementary component of f with order j
• The same expression extends to higher-order moments: the m-th moment of the random variable f(Mp(x)) is
  μ_m{f(Mp(x))} = Σ_{j=0}^{n} (1 − 2p)^j (f^m)[j](x)
• Example (j = 0, 1, 2, 3): f = f[0] + f[1] + f[2] + f[3]
  - p = 1/2 → start from scratch (the expectation is f̄)
  - p ≈ 0.23 → maximum expectation
  - the traditional p = 1/n could be around here
[Figure: E{f(Mp(x))} as a function of p for the example]
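The expectation formula can be validated exactly on a small instance: sum over all mutation outcomes weighted by their probabilities, and compare with the (1 − 2p)^j formula computed from the Walsh decomposition. A sketch with a random function and arbitrary p, x (my choices):

```python
import numpy as np
from itertools import product

n, p = 4, 0.23
X = list(product([0, 1], repeat=n))
rng = np.random.default_rng(3)
f = rng.standard_normal(2**n)        # arbitrary pseudo-Boolean function

def psi(w, x):
    return (-1) ** sum(wi * xi for wi, xi in zip(w, x))

W = X                                 # index set for Walsh coefficients
a = {w: sum(f[i] * psi(w, x) for i, x in enumerate(X)) / 2**n for w in W}

x = X[5]                              # an arbitrary current solution

# Exact expectation: sum over outcomes, weighted by mutation probability
def ham(u, v):
    return sum(ui != vi for ui, vi in zip(u, v))
exact = sum(f[i] * p**ham(x, y) * (1 - p)**(n - ham(x, y))
            for i, y in enumerate(X))

# Landscape-theory formula: E = sum_j (1-2p)^j f[j](x), term by Walsh order
formula = sum((1 - 2*p)**sum(w) * a[w] * psi(w, x) for w in W)

assert abs(exact - formula) < 1e-9
```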
14 / 26
Expectation for the Uniform Crossover
• Theorem: let f be a pseudo-Boolean function defined over Bⁿ and a_w, with w ∈ Bⁿ, its Walsh coefficients. The following identity holds for E{f(U_ρ(x, y))}, the expected fitness after uniform crossover with bias ρ:
  E{f(U_ρ(x, y))} = Σ_{r=0}^{n} A^(r)_{x,y} (1 − 2ρ)^r
  where the coefficients A^(r)_{x,y} are defined in the following way:
  A^(r)_{x,y} = Σ_{w ∈ Bⁿ: |(x ⊕ y) ∧ w| = r} a_w ψ_w(y)
• Proof sketch: for any crossover operator C,
  E{f(C(x, y))} = Σ_{z ∈ Bⁿ} f(z) Pr{C(x, y) = z} = Σ_{w ∈ Bⁿ} a_w b_w(x, y)
  where b_w(x, y) = Σ_{z ∈ Bⁿ} ψ_w(z) Pr{C(x, y) = z}; for the uniform crossover,
  b_{w,ρ}(x, y) = ψ_w(y) (1 − 2ρ)^{|(x ⊕ y) ∧ w|}
• For ρ = 1/2 (unbiased uniform crossover):
  E{f(U_{1/2}(x, y))} = A^(0)_{x,y} = Σ_{w: |(x ⊕ y) ∧ w| = 0} a_w ψ_w(y)
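This identity can also be checked by exhaustive enumeration. A sketch with assumed parents and bias, and with the convention (my assumption) that at each position where the parents differ the offspring takes x's bit with probability ρ and y's bit otherwise:

```python
import numpy as np
from itertools import product

n, rho = 4, 0.3
X = list(product([0, 1], repeat=n))
rng = np.random.default_rng(7)
f = {x: v for x, v in zip(X, rng.standard_normal(2**n))}

def psi(w, x):
    return (-1) ** sum(wi * xi for wi, xi in zip(w, x))

a = {w: sum(f[x] * psi(w, x) for x in X) / 2**n for w in X}

x, y = (0, 1, 1, 0), (1, 1, 0, 0)     # arbitrary parents

# Exact expectation: enumerate choices at the positions where parents differ
diff = [i for i in range(n) if x[i] != y[i]]
exact = 0.0
for choice in product([0, 1], repeat=len(diff)):  # 1 = take the bit from x
    z, pr = list(y), 1.0
    for c, i in zip(choice, diff):
        if c:
            z[i] = x[i]
        pr *= rho if c else (1 - rho)
    exact += pr * f[tuple(z)]

# Walsh-based formula: E = sum_w a_w psi_w(y) (1-2rho)^{|(x XOR y) AND w|}
def overlap(w):
    return sum(wi & (xi ^ yi) for wi, xi, yi in zip(w, x, y))
formula = sum(a[w] * psi(w, y) * (1 - 2*rho)**overlap(w) for w in X)

assert abs(exact - formula) < 1e-9
```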
15 / 26
Only Expected Values?
• Can we compute the probability distribution?
[Figure: fitness distributions of mutated solutions around the current solution; different distributions can share the same expectation]
• With the help of the moments vector μ{f(Mp(x))} we can compute the probability distribution of the values of f in a mutated solution, proceeding in the same way as Sutton et al. (2011)
• Let ξ_0 < ξ_1 < ... < ξ_{q−1} be the q possible values that the objective function takes in the search space; since the search space is finite, q is finite (perhaps very large). Define the vector of probabilities π_i(f(Mp(x))) = Pr{f(Mp(x)) = ξ_i}
• Theorem: the vector of probabilities can be computed as
  π(f(Mp(x))) = (V^T)^{−1} F(x) Λ(p)
  where the matrix function F(x) is problem-dependent and limited to the first q rows, and Λ(p) is a vector of polynomials in p
• As a consequence, an efficient evaluation of the matrix function F(x) cannot be ensured in general; the complexity of F(x) depends on the problem
16 / 26
Runtime of (1+1) EA: Expected Hitting Time
• The expected hitting time is a fraction of polynomials in p
• For the (1+1) EA on Onemax, the expected number of iterations can be exactly computed as a function of p, the probability of flipping a bit. For illustration, the expressions for n ≤ 3 are:
  E{τ} = 1 / (2p)  for n = 1
  E{τ} = (7 − 5p) / (4(p − 2)(p − 1)p)  for n = 2
  E{τ} = (26p⁴ − 115p³ + 202p² − 163p + 56) / (8(p − 1)² p (p² − 3p + 3)(2p² − 3p + 2))  for n = 3
• Having the exact expressions, we can compute the optimal mutation probability for each n using classical optimization methods in one variable. For n = 1 the optimal value is p = 1; for n = 2 we have to solve a cubic polynomial to obtain the exact expression:
  p*₂ = (1/5) (6 − ∛(2 / (23 − 5√21)) − ∛((23 − 5√21) / 2)) ≈ 0.561215
  which is slightly higher than the recommended value p = 1/n. As n increases, analytical responses for the optimal probability are not possible and we have to apply numerical methods
• The advantage is an exact answer for particular instances; the disadvantage is that the expression is quite complex to analyze and we need numerical methods, so it is not easy to generalize the answers obtained
[Figure 1: expected runtime of the (1+1) EA for Onemax as a function of the probability of flipping a bit, for n = 1 to 7]
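The closed-form expressions can be cross-checked against the Markov chain of the (1+1) EA computed exactly over all 2ⁿ states. A sketch, assuming a uniform random initial solution and elitist acceptance (offspring replaces parent when its fitness is greater or equal):

```python
from itertools import product

def expected_hitting_time(n, p):
    """Exact E{tau} of the (1+1) EA on Onemax via value iteration."""
    X = list(product([0, 1], repeat=n))
    opt = tuple([1] * n)

    def mut_prob(x, y):  # probability that bit-flip mutation turns x into y
        h = sum(a != b for a, b in zip(x, y))
        return p**h * (1 - p) ** (n - h)

    # Solve E[x] = 1 + sum_y Pr{mutation gives y} * E[accepted state]
    E = {x: 0.0 for x in X}
    for _ in range(100_000):
        newE = {}
        for x in X:
            if x == opt:
                newE[x] = 0.0
                continue
            total = 1.0
            for y in X:
                target = y if sum(y) >= sum(x) else x  # elitist acceptance
                total += mut_prob(x, y) * E[target]
            newE[x] = total
        delta = max(abs(newE[x] - E[x]) for x in X)
        E = newE
        if delta < 1e-12:
            break
    return sum(E.values()) / len(X)   # uniform random initial solution

p = 0.3
closed_form = (7 - 5*p) / (4 * (p - 2) * (p - 1) * p)  # slide's n = 2 formula
assert abs(expected_hitting_time(2, p) - closed_form) < 1e-6
```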
17 / 26
Runtime of (1+1) EA: Curves
• The (1+1) EA can alternate between two solutions if p = 1. However, when n = 1 the probability p = 1 is valid, and furthermore optimal: if the global solution is not present at the beginning, we can reach it by flipping the only bit we have
• Figure 1 shows the expected runtime as a function of the probability of flipping a bit for n = 1 to 7. We can observe how the optimal probability (the one obtaining the minimum expected runtime) decreases as n increases
[Figure 1: expected runtime of the (1+1) EA for Onemax as a function of the probability of flipping a bit; each line corresponds to a different value of n from 1 to 7]
19 / 26
Landscape Explorer
• Main goals:
  - Easy to extend
  - Multiplatform
• RCP (Rich Client Platform) architecture
20 / 26
Rich Client Platform
• Mechanisms for plugin interaction: extensions, extension points and package exports
21 / 26
Architecture
• Main plugins of the application
[Diagram: neo.landscapes.theory.kernel, neo.landscapes.theory.tool, neo.landscapes.theory.tool.procedures and neo.landscapes.theory.tool.selectors, plus the problem plugins (TSP, QAP, SS, UQO, DFA, WFFAP) contributing to the .landscapes, .selectors and .procedures extension points]
22 / 26
Included Procedures and Landscapes
• Procedures
• Elementary Landscape Check
• Mathematica program to get the Elementary Landscape Decomposition
• Computation of the reduced adjacency matrix
• Theoretical Autocorrelation Measures
• Experimental Autocorrelation
• Estimation of the number of elementary components
• Landscapes
• QAP (Quadratic Assignment Problem)
• UQO (Unconstrained Quadratic Optimization)
• TSP (Traveling Salesman Problem)
• Walsh Functions (linear combinations)
• Subset Sum Problem
23 / 26
Available on the Internet
http://neo.lcc.uma.es/software/landexplorer
24 / 26
On-line Computation of Autocorrelation for QAP
http://neo.lcc.uma.es/software/qap.php
25 / 26
Conclusions & Future Work
Conclusions
• Landscape Theory is very good for providing statistical information at a low cost
• FDC, expected fitness value after mutation and uniform crossover
• Runtime?
• Software tools have been developed to help non-experts use this knowledge
Future Work
• Decompose performance measures into a problem part and an algorithm part: R(A, f) = (problem factor of f) ⊗ Λ(A) (algorithm factor)
26 / 26
Thanks for your attention!!!
Problem Understanding through Landscape Theory