GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
1. Bounding XCS’s Parameters for Unbalanced Datasets
Albert Orriols-Puig
Ester Bernadó-Mansilla
Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle
Ramon Llull University
Barcelona, Spain
2. Framework
[Diagram of the framework: a Dataset consisting of examples and counter-examples feeds a Learner, which extracts knowledge based on experience into a Model; given a new instance, the Model produces the predicted output.]
In real-world domains, typically:
Higher cost to obtain examples of the concept to be learnt
So, distribution of examples in the training dataset is usually unbalanced
Applications:
Fraud Detection
Rare medical diagnosis
Detection of oil spills in satellite images
Enginyeria i Arquitectura la Salle Slide 2
GRSI
3. Framework
Do learners suffer from class imbalances?
[Diagram: the Training Set feeds a Learner, which minimizes the global error:]

    error = (num. errors_c1 + num. errors_c2) / (number of examples)

Minimizing the global error biases the learner towards the majority class: it maximizes the majority-class accuracy to the detriment of the minority class.
4. Aim
Analyze the performance of XCS when
learning from imbalanced datasets
Analyze the contribution of the
different components
Propose approaches that facilitate learning minority-class regions
5. Outline
1. Description of XCS
2. Description of the Domain
3. Experimentation
4. XCS and Class Imbalances
5. Guidelines for Parameter Tuning
6. Online Adaptation
7. Conclusions
6. 1. Description of XCS
In single-step tasks:

[Diagram of XCS’s interaction loop: the Environment supplies a problem instance. The matching classifiers in the population [P] form the match set [M] (each classifier stores a condition C, action A, prediction P, error ε, fitness F, numerosity num, action-set size estimate as, time stamp ts, and experience exp). A prediction array over the actions c1 … cn selects an action (or a random action during exploration); the classifiers in [M] advocating the selected action form the action set [A]. The environment returns a REWARD, which updates the classifier parameters of [A]; a genetic algorithm (selection, reproduction, mutation) is applied to [A], and deletion keeps the population size bounded.]
7. 1. Description of XCS
[Diagram: XCS interacts with the environment (the learning domain): it makes predictions with its set of rules and receives a reward; reinforcement learning updates the rule parameters and a GA discovers new rules.]

Ratio between classes 525:75, i.e., 1 minority-class example for every 7 majority-class examples.
8. 2. Description of the Domain
(11-bit) Multiplexer:
– 3 selection bits and 8 position bits
– Example: 000 10010100:1
– Complexity related to the number of selection bits
– Completely balanced

Imbalanced Multiplexer:
– We under-sampled class 1
– ir: proportion between majority- and minority-class instances
– i: imbalance level (i = log2 ir)

XCS should evolve:
000 0#######:0   000 0#######:1   000 1#######:0   000 1#######:1
001 #0######:0   001 #0######:1   001 #1######:0   001 #1######:1
010 ##0#####:0   010 ##0#####:1   010 ##1#####:0   010 ##1#####:1
011 ###0####:0   011 ###0####:1   011 ###1####:0   011 ###1####:1
100 ####0###:0   100 ####0###:1   100 ####1###:0   100 ####1###:1
101 #####0##:0   101 #####0##:1   101 #####1##:0   101 #####1##:1
110 ######0#:0   110 ######0#:1   110 ######1#:0   110 ######1#:1
111 #######0:0   111 #######0:1   111 #######1:0   111 #######1:1
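The multiplexer and its under-sampled variant can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the rejection-style under-sampling of class 1 (keeping one class-1 instance per ir draws) are assumptions consistent with the slide's description.

```python
import random

def multiplexer(bits, k=3):
    """11-bit multiplexer (k=3): the first k selection bits address
    one of the 2**k position bits; the output is that bit's value."""
    address = int("".join(map(str, bits[:k])), 2)
    return bits[k + address]

def sample_imbalanced(i, k=3, rng=random):
    """Draw one instance of the imbalanced multiplexer, under-sampling
    class 1 so that ir = 2**i (illustrative rejection sampling)."""
    ir = 2 ** i
    while True:
        bits = [rng.randint(0, 1) for _ in range(k + 2 ** k)]
        # keep every class-0 instance, but a class-1 one only 1 time in ir
        if multiplexer(bits, k) == 0 or rng.random() < 1.0 / ir:
            return bits, multiplexer(bits, k)
```

For instance, `multiplexer([0,0,0, 1,0,0,1,0,1,0,0])` reproduces the slide's example `000 10010100:1` and returns 1.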
9. 3. Experimentation
We ran XCS with the following standard configuration from i=0 (ir=1:1) to i=9 (ir=512:1):

N=800, α=0.1, ν=5, Rmax=1000, ε0=1, θGA=25, β=0.2, χ=0.8, μ=0.4, θdel=20, δ=0.1, θsub=200, P#=0.6, selection=rws, mutation=niched, GAsub=true, [A]sub=false
10. 3. Experimentation
[Plots: True Negative rate and True Positive rate for ir = 16:1, 32:1, and 64:1.]
11. 3. Experimentation
Most numerous rules, ir = 128:1:

Condition:Action   P           Error   F      Num
###########:0      1000        0.120   0.98   385
###########:1      1.2·10⁻⁴    0.074   0.98   366

Estimated parameters are too high. Theoretically:
    P(:0) = 992.24, P(:1) = 15.38, ε(:0) = ε(:1) = 7.75

Overgeneral classifiers overtake the population (they represent 94% of the population).
12. 4. XCS and Class Imbalances
We analyze the following factors:
– Classifiers’ Error
– Stability of Prediction and Error Estimates
– Occurrence-based Reproduction
13. 4.1. Classifiers’ Error
How does the imbalance ratio influence the classifier’s error?
XCS considers that a classifier is accurate if: ε_cl < ε0

XCS receives a reward of Rmax (correct prediction) or 0 (incorrect prediction).

XCS computes the classifiers’ error (ε) and prediction (p) as window averages:
– Prediction: p_{t+1} = p_t + β (R − p_t)
– Error: ε_{t+1} = ε_t + β (|R − p_t| − ε_t)
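The two update rules can be exercised directly. The sketch below is illustrative; following the equations above, the error is updated using the pre-update prediction.

```python
def update_estimates(p, eps, R, beta=0.2):
    """Widrow-Hoff update of a classifier's prediction p and error eps
    after reward R; the error tracks the absolute prediction error,
    eps_{t+1} = eps_t + beta * (|R - p_t| - eps_t)."""
    eps = eps + beta * (abs(R - p) - eps)   # uses the old prediction p_t
    p = p + beta * (R - p)
    return p, eps

# A run of Rmax rewards drives the prediction toward Rmax:
p, eps = 500.0, 0.0
for _ in range(10):
    p, eps = update_estimates(p, eps, R=1000.0)
```

With β = 0.2, roughly 1 − 0.8¹⁰ ≈ 89% of the gap to Rmax is closed after ten rewarded occurrences, which is what "window average" means here.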
14. 4.1. Classifiers’ Error
Until which class imbalance will XCS detect overgeneral classifiers?
– Bound for an inaccurate classifier: ε ≥ ε0
– Given the estimated prediction and error of an overgeneral classifier:
    P = Pc(cl) · Rmax + (1 − Pc(cl)) · Rmin
    ε = |P − Rmax| · Pc(cl) + |P − Rmin| · (1 − Pc(cl))
– Imposing ε ≥ ε0, and writing p = Pc(cl), we derive:
    −ε0 p² + 2p(Rmax − ε0) − ε0 ≥ 0
– For Rmax = 1000 and ε0 = 1, the inequality holds for p ≥ 1/1998; overgeneral classifiers below this threshold were not detected.
– We get the maximum imbalance ratio: irmax = 1998, imax = 10
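The bound can be checked numerically. The snippet below is an illustrative verification (reading ir ≈ 1/p at the detection threshold): it solves the slide's quadratic for its smaller root.

```python
import math

def detection_threshold(r_max=1000.0, eps0=1.0):
    """Smaller root of -eps0*p^2 + 2*p*(r_max - eps0) - eps0 = 0,
    i.e. the lowest correct-prediction probability p at which an
    overgeneral classifier's error still reaches eps0."""
    a, b, c = -eps0, 2.0 * (r_max - eps0), -eps0
    disc = math.sqrt(b * b - 4.0 * a * c)
    return min((-b + disc) / (2 * a), (-b - disc) / (2 * a))

p_min = detection_threshold()   # ~ 1/1998
ir_max = 1.0 / p_min            # ~ 1998, hence i_max = 10 (2**10 <= 1998)
```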
15. 4.1. Classifiers’ Error
XCS computes the classifiers’ error (ε) and prediction (p) as window averages; β determines the size of the window:
– Prediction: p_{t+1} = p_t + β (R − p_t)
– Error: ε_{t+1} = ε_t + β (|R − p_t| − ε_t)

[Plot: influence of the reward received at time t on the estimates at times t+1 … t+8, for β = 0.2, 0.1, and 0.05. The larger β, the faster the effect of previous rewards is forgotten.]
16. 4.2. Stability of Prediction and Error Estimates
Stability of Prediction and Error for ir = 128:1

[Density plots of the prediction (theoretical value 992.24) and error (theoretical value 7.75) estimates of the overgeneral classifier, for β = 0.2 (top) and β = 0.002 (bottom): with β = 0.2 the estimates are widely spread; with β = 0.002 they concentrate around the theoretical values.]

As ir increases, β should be decreased to stabilize the prediction and error estimates.
17. 4.3. Occurrence-based Reproduction
To receive a GA event, a classifier has to belong to [A].

Frequency of occurrences (11-Mux, ir = 128:1):

Classifier        pocc                             Value
000 0#######:0    1/2^(sel+1) · ir/(1 + ir)        0.062
000 1#######:1    1/2^(sel+1) · 1/(1 + ir)         0.000484
### ########:0    1/2                              0.5
### ########:1    1/2                              0.5

[Plot: pocc as a function of ir for 000 0#######:0, 000 1#######:1, and ### ########:0/1.]

Classifiers that occur more frequently:
– Have better estimates
– Tend to have more genetic opportunities… depending on θGA
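The occurrence probabilities in the table can be reproduced with a small helper. This is illustrative: `sel` is the number of selection bits the rule fixes, and the class term assumes the sampling frequencies ir/(1+ir) and 1/(1+ir) after under-sampling.

```python
def p_occ(sel, ir, minority=False):
    """Probability that a maximally general multiplexer rule occurs
    in the action set: 1/2**(sel+1) for its niche being matched and
    its action chosen, times the sampling frequency of its class."""
    class_freq = 1.0 / (1 + ir) if minority else ir / (1.0 + ir)
    return class_freq / 2 ** (sel + 1)
```

With sel = 3 and ir = 128, `p_occ(3, 128)` gives ≈ 0.062 and `p_occ(3, 128, minority=True)` gives ≈ 0.000484, matching the table; the fully general rules `### ########:0/1` occur with probability 1/2 each.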
18. 4.3. Occurrence-based Reproduction
Genetic opportunities
– A classifier goes through a genetic event when:
  • It occurs in [A]
  • The average time since the last GA application > θGA

[Timeline of TGA(###########:0/1): for this frequent niche, with occurrence period Tocc, GA events fire for θGA = 25, 50, 75, and 100.]

[Timeline of TGA(000 1#######:1): this infrequent niche has a much larger Tocc and so receives far fewer genetic opportunities.]

To balance the genetic opportunities that the different niches receive:
set θGA = Tocc of the most infrequent niche.
19. 5. Guidelines for Parameter Tuning
From the analysis we can extract the following guidelines:

Rmax and ε0 determine the threshold between negligible noise and imbalance ratio.

β represents the reward forgetfulness ratio. We want this ratio to consider under-sampled instances:
    β = k1 · fmin / fmaj

θGA is the GA rate when Tocc < θGA. If we want all niches to receive the same number of genetic opportunities:
    θGA = k2 · 1 / fmin
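The two guidelines translate directly into code. A minimal sketch follows; the constants k1 and k2 are left open by the slide, so the defaults below are placeholders, not the authors' values.

```python
def tune(f_min, f_maj, k1=0.2, k2=25.0):
    """Guideline-based parameter setting: beta shrinks with the
    minority/majority frequency ratio, and theta_GA grows with the
    rarity of the least frequent niche."""
    beta = k1 * f_min / f_maj
    theta_ga = k2 / f_min
    return beta, theta_ga

# In the multiplexer f_min/f_maj = 1/ir, so doubling ir halves beta
# and roughly doubles theta_GA.
```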
20. 5. Guidelines for Parameter Tuning
We set β = {0.04, 0.02, 0.01, 0.005} and θGA = {200, 400, 800, 1600}.

[Plots: Standard Configuration vs. configuration following the guidelines, for ir = 16:1, 32:1, 64:1, 128:1, and 256:1.]
21. 6. Online Adaptation
Problem: How can we estimate the niche frequency?
– In the multiplexer: fmin = fmaj / ir
– In a real-world problem… niche frequencies may not be related to the imbalance ratio (small disjuncts)

[Figures: two example domains, with ir = 5 in both; in one of them the minority class is split into small disjuncts.]
22. 6. Online Adaptation
Our approach: Let XCS discover small disjuncts.
– We search for regions that promote overgeneral classifiers
– We estimate ircl based on those regions
– We use ircl to adapt β and θGA

[Figure: an overgeneral classifier covering a region with ircl = 14:1.]
23. 6. Online Adaptation
The Algorithm
– Check whether the prediction oscillates
– Estimate the imbalance ratio ircl
– Require a minimum of experience and numerosity before adapting the parameters
– Adapt the parameters following the guidelines and the estimate of ircl
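The steps above can be sketched in Python. Everything here is a hedged reconstruction of the listed control flow, not the paper's pseudocode: the field names, the oscillation test via win/loss counts, and all thresholds and constants are illustrative assumptions.

```python
class Classifier:
    """Minimal stand-in for an XCS classifier's bookkeeping fields."""
    def __init__(self):
        self.exp = 0      # experience: times it appeared in [A]
        self.num = 1      # numerosity
        self.wins = 0     # occurrences rewarded with Rmax
        self.losses = 0   # occurrences rewarded with 0

def maybe_adapt(cl, k1=0.2, k2=25.0, min_exp=100, min_num=5):
    """If an experienced, numerous classifier's prediction oscillates,
    estimate the imbalance ratio of its region and re-tune beta and
    theta_GA following the guidelines (constants are placeholders)."""
    if cl.exp < min_exp or cl.num < min_num:
        return None                      # not enough evidence yet
    if cl.wins == 0 or cl.losses == 0:
        return None                      # prediction does not oscillate
    ir_cl = max(cl.wins, cl.losses) / min(cl.wins, cl.losses)
    beta = k1 / ir_cl                    # forget more slowly as ir grows
    theta_ga = k2 * ir_cl                # give rare niches time to occur
    return ir_cl, beta, theta_ga
```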
24. 6. Online Adaptation
[Plots comparing the Standard Configuration, the configuration following the guidelines, and Online Adaptation, for ir = 16:1, 32:1, 64:1, 128:1, and 256:1.]
25. 7. Conclusions
We studied the behavior of XCS when the training set is unbalanced.

XCS with the standard configuration can only solve the multiplexer for an imbalance ratio up to ir = 16.

The theoretical analysis shows that XCS is highly robust to class imbalances if:
– Classifier estimates are accurate
– The number of genetic opportunities of the niches is balanced

We defined guidelines to adapt XCS’s parameters:
– XCS could solve the multiplexer up to an imbalance ratio of ir = 256
26. 7. Conclusions
As an advantage over other learners, XCS can automatically discover small disjuncts: self-adaptation of parameters.
27. Further Work
What about the convergence time?
– An increase of θGA implies a decrease in the rate of search for promising rules

Cluster-based resampling methods…
… unfortunately, there is no direct relation between clusters and niches.

What about niche-based resampling?

[Figure: a niche with irniche = 14:1; its instances are resampled with probability 1/irniche.]
28. Bounding XCS’s Parameters for Unbalanced Datasets

Albert Orriols-Puig
Ester Bernadó-Mansilla
Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle
Ramon Llull University
Barcelona, Spain