4. The problem
• You need to tune something
• a database
• a search engine
• a machine learning solution
• …
• Getting good results is important
• but there are lots of values to tune
• the effects of tuning them are hard to predict
11. How to solve
1. Formulate the problem clearly
2. Measure results properly
3. Then
• try to understand the problem in depth, and/or
• let your computer find a good solution
15. Our kind of problem
• If
• all your knobs are numeric and
• you can measure how good a given set of settings is
• then
• basically you’re trying to find the highest point in a many-dimensional space
• one dimension per knob, plus one dimension for the evaluation function
19. A warning!
• Be very, very careful about the evaluation function
• Your algorithm will produce a good value for the evaluation function
• If the function matches poorly with what you actually need, you’re going to work hard to produce something bad …
25. Genetic algorithm
• The “original” nature-inspired algorithm (1960s)
• make n random solutions
• evaluate them, throw away the worst, duplicate the best (exploit)
• add random newcomers (explore)
• make random changes, repeat
• Weakness: can’t exploit structure of numeric problems
• no sense of hyperspace
• Strength: can solve non-numeric problems
• can even write code
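The loop above fits in a few lines of Python. This is a minimal sketch, not any particular published GA: the population size, mutation width, and the toy fitness function are all illustrative choices.

```python
import random

def fitness(x):
    # toy fitness function with its peak at x = 0.7
    return -(x - 0.7) ** 2

def evolve(pop_size=20, generations=50):
    # make n random solutions
    population = [random.uniform(0, 1) for _ in range(pop_size)]
    for _ in range(generations):
        # exploit: evaluate, throw away the worst, duplicate the best
        population.sort(key=fitness, reverse=True)
        best = population[:pop_size // 2]
        population = best + best[:pop_size // 4]
        # explore: add random newcomers ...
        newcomers = pop_size - len(population)
        population += [random.uniform(0, 1) for _ in range(newcomers)]
        # ... and make random changes (mutation), then repeat
        population = [x + random.gauss(0, 0.05) for x in population]
    return max(population, key=fitness)
```

Note that nothing here uses the numeric structure of the search space; mutation is just blind jiggling, which is exactly the weakness mentioned above.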
26. Particle-swarm optimization (1995)
• A swarm of particles explores the search space together
• move around semi-randomly
• communicate about what they’ve seen
• particles are attracted toward high spots in the landscape
27. PSO, initialization
• Use 10 + int(2 * math.sqrt(dimensions)) particles
• Position each particle randomly
• Give each particle a random velocity
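A sketch of this initialization. The swarm-size rule is the one given above; the velocity range (at most one dimension-extent) is an assumption, since the slides don’t specify it.

```python
import math
import random

def swarm_size(dimensions):
    # the rule of thumb above: 10 + int(2 * sqrt(dimensions)) particles
    return 10 + int(2 * math.sqrt(dimensions))

def init_swarm(bounds):
    # bounds is a list of (low, high) pairs, one per dimension
    particles = []
    for _ in range(swarm_size(len(bounds))):
        pos = [random.uniform(lo, hi) for (lo, hi) in bounds]
        # random initial velocity, at most one dimension-extent per step
        # (the slides don't specify the range, so this is an assumption)
        vel = [random.uniform(-(hi - lo), hi - lo) for (lo, hi) in bounds]
        particles.append((pos, vel))
    return particles
```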
35. PSO, iteration
• For each particle, in each dimension, update the velocity by adding
• old velocity * decay factor
• random factor * (best position - current position)
• random factor * (best neighbour position - current position)
• Velocity tends to decrease as
• current position goes toward best position and
• best and best neighbour position converge
36. Code
import math
import numpy as np
import pso

def f1(x):
    return 1 + math.sin(2 * np.pi * x)

def f2a(x):
    return x ** 3 - 2 * x ** 2 + 1 * x - 1

def f(x):
    return f1(x) + f2a(x)

swarm = pso.Swarm(
    dimensions = [(0.0, 2.0)],
    fitness = lambda x: f(x[0]),
    particles = 5
)
for ix in range(20):
    swarm.iterate()
print(swarm.get_best_ever())
37. Implementation
class Particle:
    def __init__(self, dimensions, fitness):
        self._dimensions = dimensions
        self._fitness = fitness
        self._pos = [random.uniform(min, max) for (min, max) in dimensions]
        self._vel = [pick_velocity(min, max) for (min, max) in dimensions]

    def iterate(self):
        for ix in range(len(self._dimensions)):
            # w is the decay (inertia) factor, c the attraction constant
            self._vel[ix] = (
                self._vel[ix] * w +
                random.uniform(0, c) * (self._prev_best_pos[ix] - self._pos[ix]) +
                random.uniform(0, c) * (self.neighbourhood_best(ix) - self._pos[ix])
            )
            self._pos[ix] += self._vel[ix]
            self._constrain(ix)
        self._update()
63. Dedup with PSO
• A 27-dimensional problem
• Really difficult to solve optimally
• Takes a long time to evaluate solutions
• No idea what the best possible solution actually is
73. Firefly algorithm
• Fireflies are positioned randomly
• On each iteration, each firefly jumps toward every other firefly based on how bright it looks
• that is, the brighter the other firefly appears, the further toward it our firefly jumps
• it only jumps toward fireflies that are brighter than itself
• Each firefly shines brighter the better its fitness is
• but attractiveness falls off with the square of the distance
• Add random jiggling
Xin-She Yang, 2010
74. How it works
• The best firefly always stands still, pulling the others toward the best result
• Bad fireflies get pulled in all directions, but good fireflies get pulled much less
• this is exploit vs explore
• Pull diminishes with distance, so the fireflies don’t necessarily all converge on the same best position
• this lets them explore more local maxima
75. Firefly code
def iterate(self):
    for firefly in self._swarm.get_particles():
        # only jump toward fireflies that are brighter than this one
        if self._val < firefly._val:
            dist = self.distance(firefly)
            # attractiveness falls off with the square of the distance
            attract = firefly._val / (1 + gamma * (dist ** 2))
            for ix in range(len(self._dimensions)):
                jiggle = alpha * (random.uniform(0, 1) - 0.5)
                diff = firefly._pos[ix] - self._pos[ix]
                self._pos[ix] = self._pos[ix] + jiggle + (attract * diff)
                self._constrain(ix)
78. Cuckoo search
• Very similar to genetic algorithm
• take a candidate, modify it
• if better than the existing candidate, replace it
• Every generation, discard some proportion of candidates
• fill up with random new ones
• The difference is in how new results are produced
• using Lévy flights
Yang et al., 2010
80. How it works
• Balance explore and exploit with Lévy flights
• usually jump short, sometimes jump long
• Never replace good candidates with bad ones
• but always throw away the n worst
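A toy one-dimensional sketch of these two rules. The heavy-tailed step sampler is a simple Pareto-style stand-in for a proper Lévy distribution (real implementations often use Mantegna’s algorithm), and the step scale and discard fraction are illustrative choices.

```python
import random

def levy_step(exponent=1.5):
    # inverse-transform sample from a Pareto-style heavy tail:
    # most steps are close to 1, but occasionally u is tiny and the step huge
    u = 1.0 - random.random()   # in (0, 1], avoids dividing by zero
    return u ** (-1.0 / exponent)

def cuckoo_generation(population, fitness, bounds, discard_fraction=0.25):
    lo, hi = bounds
    # take each candidate, modify it with a Lévy-sized jump,
    # and keep whichever of the two is better
    for ix, x in enumerate(population):
        step = levy_step() * 0.01 * (hi - lo) * random.choice([-1, 1])
        candidate = min(max(x + step, lo), hi)
        if fitness(candidate) > fitness(x):
            population[ix] = candidate
    # never replace good candidates with bad ones, but always
    # throw away the worst and refill with random newcomers
    population.sort(key=fitness, reverse=True)
    keep = int(len(population) * (1 - discard_fraction))
    population[keep:] = [random.uniform(lo, hi)
                         for _ in range(len(population) - keep)]
    return population
```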
101. Moral
• These are stochastic algorithms
• one evaluation doesn’t tell you much about the algorithm
• even 10 evaluations isn’t enough
• Be careful here, or you can fool yourself!
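The remedy is to treat each full run as one noisy sample and look at the spread. A sketch, with a stand-in “optimizer” whose run-to-run noise (the 0.9 and 0.05 figures) is invented for illustration:

```python
import random
import statistics

def noisy_optimizer():
    # stand-in for one full run of a stochastic optimizer:
    # the "result" varies from run to run
    return 0.9 + random.gauss(0, 0.05)

def summarize(runs=40):
    # many runs give a mean and a spread, not just a single number
    results = [noisy_optimizer() for _ in range(runs)]
    return statistics.mean(results), statistics.stdev(results)
```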
102. It’s not that simple
• Which PSO?
• SPSO 2006, 2007, or 2011?
• What values to use for decay factor and randomization?
• What neighbourhood topology to use?
• Firefly has parameters alpha, beta, and gamma
• Cuckoo has alpha and scale
104. So … in order to tune our algorithm we need to tune the algorithm that tunes our algorithm
105. Problems
• Doing one run of Cuckoo search takes ~30 minutes
• Need to do that ~40 times to get a decent estimate
• My laptop was already getting uncomfortably hot
• My wife was complaining about the fan noise
• What to do?
107. Master algorithm
• Run PSO on Cuckoo alpha & scale
• Fitness function
• sets the task handed out by /get-task
• hangs until 20 evaluations have come in via /answer
• returns average of evaluations
112. More algorithms
• Flower-pollination algorithm [Xin-She Yang, 2013]
• Bat-inspired algorithm [Xin-She Yang, 2010]
• Ant colony algorithm [Marco Dorigo, 1992]
• Bee algorithm [Pham et al, 2005]
• Fish school search [Filho et al, 2007]
• Artificial bee colony algorithm [Karaboga et al, 2005]
• …
• Slide annotations: some looked interesting but the algorithm was hard to find; some papers were too vague to implement; some were too complicated to finish implementing
113.
PROPERTY   COMPARATOR               LOW    HIGH
Name       LongestCommonSubstring   0.35   0.88
Address1   WeightedLevenshtein      0.25   0.65
Address2   Levenshtein              0.5    0.6
Email      Exact                    0.49   0.51
Phone      Exact                    0.45   0.65
Geopos     Geoposition              0.25   0.6
Region     Exact                    0.0    0.5
Threshold: 0.74
But what about these?
114. The machine learning way
PROPERTY   COMPARATOR   LOW    HIGH
Name       c1           0.35   0.88
Name       c2           0.25   0.65
Name       c3           0.5    0.6
Name       c4           0.49   0.51
…          …            0.45   0.65
Address1   c1           0.25   0.6
Address1   c2           0.0    0.5
Address1   c3           …      …
Address1   c4           …      …
…          …            …      …
115. Opens a door
• Means we can drop the probabilities, and just use the numeric values coming out of the comparators
• Feed into one of
• random forests
• support vector machines (SVM)
• logistic regression
• neural networks
• …
• Except that’s no longer optimization, but attacking the problem directly, so let’s stick with our algorithms
116. General trick
• Quite common to “cheat” this way to use numeric-only machine learning algorithms
• Turn a boolean into a [0, 1] parameter
• Turn an enumeration into one boolean per value
• Looks odd, but in general it does work
• of course, now there isn’t much spatial structure any more
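The two encodings can be sketched like this; the function names are mine, not from the talk. `decode_enum` shows the usual way to read an enumeration back out of a continuous point: pick the value with the largest weight.

```python
# encode(value, kind): kind is either the string "bool" or the list of
# legal enumeration values
def encode(value, kind):
    if kind == "bool":
        # a boolean becomes a single 0/1 dimension
        return [1.0 if value else 0.0]
    # an enumeration becomes one 0/1 dimension per value ("one-hot")
    return [1.0 if value == v else 0.0 for v in kind]

def decode_enum(xs, values):
    # reading a continuous point back: pick the value with the largest weight
    best_ix = max(range(len(values)), key=lambda ix: xs[ix])
    return values[best_ix]
```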
119. Tricky, tricky
• Our problem now has 267 dimensions
• should allow us to tune for really detailed signals
• The curse of dimensionality
• everywhere is pretty much equally far away from everywhere else
• hyperspace consists of all corners and no middle
• many of the dimensions contain no signal
127. What to choose?
• PSO
• generally performs best
• dead easy to implement
• parameters available in the literature
• no need to scale to the coordinate system used
• SPSO 2007
• values for w and c are given
• ring topology: each particle knows p-1, p, p+1
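The ring topology is a one-liner; `best_neighbour` (a name I made up) shows how a particle would pick the best previous value among its neighbours.

```python
def ring_neighbours(p, n_particles):
    # particle p's neighbourhood is {p-1, p, p+1}, wrapping at the ends
    return [(p - 1) % n_particles, p, (p + 1) % n_particles]

def best_neighbour(p, best_values):
    # index of the neighbour (including p itself) whose best-so-far
    # value is highest; best_values holds one value per particle
    return max(ring_neighbours(p, len(best_values)),
               key=lambda q: best_values[q])
```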
128. Simulated annealing
• Inspired by the behaviour of cooling metals
• guaranteed to eventually find the maximum
• no guarantee that the time taken will be reasonable
• To use it, you must supply the following parameters
• candidate neighbour generation procedure
• acceptance probability function
• annealing schedule
• initial temperature
• May work better than PSO and friends, but also requires a lot more effort to set up
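A minimal sketch wiring those four ingredients together. The Gaussian neighbour moves, the classic Metropolis-style acceptance rule, the geometric cooling schedule, and all constants are illustrative choices, not prescriptions.

```python
import math
import random

def anneal(fitness, x0, temp0=1.0, cooling=0.95, steps=2000):
    x = best = x0
    temp = temp0
    for _ in range(steps):
        # candidate neighbour generation: a small Gaussian move
        candidate = x + random.gauss(0, 0.1)
        delta = fitness(candidate) - fitness(x)
        # acceptance probability: always take improvements; take worse
        # moves with a probability that shrinks as the temperature falls
        if delta > 0 or random.random() < math.exp(delta / temp):
            x = candidate
        if fitness(x) > fitness(best):
            best = x
        # annealing schedule: geometric cooling from the initial temperature
        temp *= cooling
    return best
```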
129. Choices, choices
• There are always more advanced methods
• sometimes they work much better, sometimes not
• PSO gives you a dead easy place to start
• whip it up in a few lines of code
• see how it works
• vastly better than random trial and error
• adapts nicely to all kinds of problems without tuning
130. Research papers
• Publishing standards are probably too lax
• Algorithm descriptions are weak
• far too little information about tuning parameters
• no code available
• Evaluation sections are weak
• only one evaluation metric
• no information about how PSO/GA were tuned
• no information about how proposed algorithm was tuned
• no cross-comparison with other algorithms
131. See for yourself
• https://github.com/larsga/py-snippets/tree/master/machine-learning/pso
• links to all the papers
• code for crap, genetic, pso, firefly, cuckoo
• bonus: cuckoo2, server
• also has the test functions
• Total number of experiments: 24,008
• means evaluating fitness 2,400,800 times