GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
1. Bounding XCS’s Parameters for Unbalanced Datasets
Albert Orriols-Puig
Ester Bernadó-Mansilla
Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle
Ramon Llull University
Barcelona, Spain
2. Framework
[Diagram of the framework: a Dataset consisting of examples and counter-examples feeds a Learner, which extracts knowledge based on experience into a Model; given a new instance, the Model produces the predicted output.]
In real-world domains, typically:
Higher cost to obtain examples of the concept to be learnt
So, distribution of examples in the training dataset is usually unbalanced
Applications:
Fraud Detection
Rare medical diagnosis
Detection of oil spills in satellite images
Enginyeria i Arquitectura la Salle Slide 2
GRSI
3. Framework
Do learners suffer from class imbalances?
[Diagram: the Training Set feeds a Learner, which minimizes the global error:]

    error = (num. errors_c1 + num. errors_c2) / (number of examples)

Minimizing the global error biases the learner towards the majority class: it maximizes the majority-class accuracy to the detriment of the minority class.
4. Aim
Analyze the performance of XCS when
learning from imbalanced datasets
Analyze the contribution of the
different components
Propose approaches that facilitate learning minority-class regions
5. Outline
1. Description of XCS
2. Description of the Domain
3. Experimentation
4. XCS and Class Imbalances
5. Guidelines for Parameter Tuning
6. Online Adaptation
7. Conclusions
6. 1. Description of XCS
In single-step tasks:

[Diagram of XCS’s interaction loop: the Environment supplies a problem instance. The matching classifiers in the population [P] form the match set [M] (each classifier stores a condition C, action A, prediction P, error ε, fitness F, numerosity num, action-set size estimate as, time stamp ts, and experience exp). A prediction array over the actions c1 … cn selects an action (or a random action during exploration); the classifiers in [M] advocating the selected action form the action set [A]. The environment returns a REWARD, which updates the classifier parameters of [A]; a genetic algorithm (selection, reproduction, mutation) is applied to [A], and deletion keeps the population size bounded.]
7. 1. Description of XCS
[Diagram: XCS interacts with the environment (the learning domain): it makes predictions with its set of rules and receives a reward; reinforcement learning updates the rule parameters and a GA discovers new rules.]

Ratio between classes 525:75, i.e., 1 minority-class example for every 7 majority-class examples.
8. 2. Description of the Domain
(11-bit) Multiplexer:
– 3 selection bits and 8 position bits
– Example: 000 10010100:1
– Complexity related to the number of selection bits
– Completely balanced

Imbalanced Multiplexer:
– We under-sampled class 1
– ir: proportion between majority- and minority-class instances
– i: imbalance level (i = log2 ir)

XCS should evolve:
000 0#######:0   000 0#######:1   000 1#######:0   000 1#######:1
001 #0######:0   001 #0######:1   001 #1######:0   001 #1######:1
010 ##0#####:0   010 ##0#####:1   010 ##1#####:0   010 ##1#####:1
011 ###0####:0   011 ###0####:1   011 ###1####:0   011 ###1####:1
100 ####0###:0   100 ####0###:1   100 ####1###:0   100 ####1###:1
101 #####0##:0   101 #####0##:1   101 #####1##:0   101 #####1##:1
110 ######0#:0   110 ######0#:1   110 ######1#:0   110 ######1#:1
111 #######0:0   111 #######0:1   111 #######1:0   111 #######1:1
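The multiplexer and its under-sampled variant can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the rejection-style under-sampling of class 1 (keeping one class-1 instance per ir draws) are assumptions consistent with the slide's description.

```python
import random

def multiplexer(bits, k=3):
    """11-bit multiplexer (k=3): the first k selection bits address
    one of the 2**k position bits; the output is that bit's value."""
    address = int("".join(map(str, bits[:k])), 2)
    return bits[k + address]

def sample_imbalanced(i, k=3, rng=random):
    """Draw one instance of the imbalanced multiplexer, under-sampling
    class 1 so that ir = 2**i (illustrative rejection sampling)."""
    ir = 2 ** i
    while True:
        bits = [rng.randint(0, 1) for _ in range(k + 2 ** k)]
        # keep every class-0 instance, but a class-1 one only 1 time in ir
        if multiplexer(bits, k) == 0 or rng.random() < 1.0 / ir:
            return bits, multiplexer(bits, k)
```

For instance, `multiplexer([0,0,0, 1,0,0,1,0,1,0,0])` reproduces the slide's example `000 10010100:1` and returns 1.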
9. 3. Experimentation
We ran XCS with the following standard configuration from i=0 (ir=1:1) to i=9 (ir=512:1):

N=800, α=0.1, ν=5, Rmax=1000, ε0=1, θGA=25, β=0.2, χ=0.8, μ=0.4, θdel=20, δ=0.1, θsub=200, P#=0.6, selection=rws, mutation=niched, GAsub=true, [A]sub=false
10. 3. Experimentation
[Plots: True Negative rate and True Positive rate for ir = 16:1, 32:1, and 64:1.]
11. 3. Experimentation
Most numerous rules, ir = 128:1:

Condition:Action   P           Error   F      Num
###########:0      1000        0.120   0.98   385
###########:1      1.2·10⁻⁴    0.074   0.98   366

Estimated parameters are too high. Theoretically:
    P(:0) = 992.24, P(:1) = 15.38, ε(:0) = ε(:1) = 7.75

Overgeneral classifiers overtake the population (they represent 94% of the population).
12. 4. XCS and Class Imbalances
We analyze the following factors:
– Classifiers’ Error
– Stability of Prediction and Error Estimates
– Occurrence-based Reproduction
13. 4.1. Classifiers’ Error
How does the imbalance ratio influence the classifier’s error?
XCS considers that a classifier is accurate if: ε_cl < ε0

XCS receives a reward of Rmax (correct prediction) or 0 (incorrect prediction).

XCS computes the classifiers’ error (ε) and prediction (p) as window averages:
– Prediction: p_{t+1} = p_t + β (R − p_t)
– Error: ε_{t+1} = ε_t + β (|R − p_t| − ε_t)
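The two update rules can be exercised directly. The sketch below is illustrative; following the equations above, the error is updated using the pre-update prediction.

```python
def update_estimates(p, eps, R, beta=0.2):
    """Widrow-Hoff update of a classifier's prediction p and error eps
    after reward R; the error tracks the absolute prediction error,
    eps_{t+1} = eps_t + beta * (|R - p_t| - eps_t)."""
    eps = eps + beta * (abs(R - p) - eps)   # uses the old prediction p_t
    p = p + beta * (R - p)
    return p, eps

# A run of Rmax rewards drives the prediction toward Rmax:
p, eps = 500.0, 0.0
for _ in range(10):
    p, eps = update_estimates(p, eps, R=1000.0)
```

With β = 0.2, roughly 1 − 0.8¹⁰ ≈ 89% of the gap to Rmax is closed after ten rewarded occurrences, which is what "window average" means here.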
14. 4.1. Classifiers’ Error
Until which class imbalance will XCS detect overgeneral classifiers?
– Bound for an inaccurate classifier: ε ≥ ε0
– Given the estimated prediction and error of an overgeneral classifier:
    P = Pc(cl) · Rmax + (1 − Pc(cl)) · Rmin
    ε = |P − Rmax| · Pc(cl) + |P − Rmin| · (1 − Pc(cl))
– Imposing ε ≥ ε0, and writing p = Pc(cl), we derive:
    −ε0 p² + 2p(Rmax − ε0) − ε0 ≥ 0
– For Rmax = 1000 and ε0 = 1, the inequality holds for p ≥ 1/1998; overgeneral classifiers below this threshold were not detected.
– We get the maximum imbalance ratio: irmax = 1998, imax = 10
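The bound can be checked numerically. The snippet below is an illustrative verification (reading ir ≈ 1/p at the detection threshold): it solves the slide's quadratic for its smaller root.

```python
import math

def detection_threshold(r_max=1000.0, eps0=1.0):
    """Smaller root of -eps0*p^2 + 2*p*(r_max - eps0) - eps0 = 0,
    i.e. the lowest correct-prediction probability p at which an
    overgeneral classifier's error still reaches eps0."""
    a, b, c = -eps0, 2.0 * (r_max - eps0), -eps0
    disc = math.sqrt(b * b - 4.0 * a * c)
    return min((-b + disc) / (2 * a), (-b - disc) / (2 * a))

p_min = detection_threshold()   # ~ 1/1998
ir_max = 1.0 / p_min            # ~ 1998, hence i_max = 10 (2**10 <= 1998)
```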
15. 4.1. Classifiers’ Error
XCS computes the classifiers’ error (ε) and prediction (p) as window averages; β determines the size of the window:
– Prediction: p_{t+1} = p_t + β (R − p_t)
– Error: ε_{t+1} = ε_t + β (|R − p_t| − ε_t)

[Plot: influence of the reward received at time t on the estimates at times t+1 … t+8, for β = 0.2, 0.1, and 0.05. The larger β, the faster the effect of previous rewards is forgotten.]
16. 4.2. Stability of Prediction and Error Estimates
Stability of Prediction and Error for ir = 128:1

[Density plots of the prediction (theoretical value 992.24) and error (theoretical value 7.75) estimates of the overgeneral classifier, for β = 0.2 (top) and β = 0.002 (bottom): with β = 0.2 the estimates are widely spread; with β = 0.002 they concentrate around the theoretical values.]

As ir increases, β should be decreased to stabilize the prediction and error estimates.
17. 4.3. Occurrence-based Reproduction
To receive a GA event, a classifier has to belong to [A].

Frequency of occurrences (11-Mux, ir = 128:1):

Classifier        pocc                             Value
000 0#######:0    1/2^(sel+1) · ir/(1 + ir)        0.062
000 1#######:1    1/2^(sel+1) · 1/(1 + ir)         0.000484
### ########:0    1/2                              0.5
### ########:1    1/2                              0.5

[Plot: pocc as a function of ir for 000 0#######:0, 000 1#######:1, and ### ########:0/1.]

Classifiers that occur more frequently:
– Have better estimates
– Tend to have more genetic opportunities… depending on θGA
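The occurrence probabilities in the table can be reproduced with a small helper. This is illustrative: `sel` is the number of selection bits the rule fixes, and the class term assumes the sampling frequencies ir/(1+ir) and 1/(1+ir) after under-sampling.

```python
def p_occ(sel, ir, minority=False):
    """Probability that a maximally general multiplexer rule occurs
    in the action set: 1/2**(sel+1) for its niche being matched and
    its action chosen, times the sampling frequency of its class."""
    class_freq = 1.0 / (1 + ir) if minority else ir / (1.0 + ir)
    return class_freq / 2 ** (sel + 1)
```

With sel = 3 and ir = 128, `p_occ(3, 128)` gives ≈ 0.062 and `p_occ(3, 128, minority=True)` gives ≈ 0.000484, matching the table; the fully general rules `### ########:0/1` occur with probability 1/2 each.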
18. 4.3. Occurrence-based Reproduction
Genetic opportunities
– A classifier goes through a genetic event when:
  • It occurs in [A]
  • The average time since the last GA application > θGA

[Timeline of TGA(###########:0/1): for this frequent niche, with occurrence period Tocc, GA events fire for θGA = 25, 50, 75, and 100.]

[Timeline of TGA(000 1#######:1): this infrequent niche has a much larger Tocc and so receives far fewer genetic opportunities.]

To balance the genetic opportunities that the different niches receive:
set θGA = Tocc of the most infrequent niche.
19. 5. Guidelines for Parameter Tuning
From the analysis we can extract the following guidelines:

Rmax and ε0 determine the threshold between negligible noise and imbalance ratio.

β represents the reward forgetfulness ratio. We want this ratio to consider under-sampled instances:
    β = k1 · fmin / fmaj

θGA is the GA rate when Tocc < θGA. If we want all niches to receive the same number of genetic opportunities:
    θGA = k2 · 1 / fmin
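The two guidelines translate directly into code. A minimal sketch follows; the constants k1 and k2 are left open by the slide, so the defaults below are placeholders, not the authors' values.

```python
def tune(f_min, f_maj, k1=0.2, k2=25.0):
    """Guideline-based parameter setting: beta shrinks with the
    minority/majority frequency ratio, and theta_GA grows with the
    rarity of the least frequent niche."""
    beta = k1 * f_min / f_maj
    theta_ga = k2 / f_min
    return beta, theta_ga

# In the multiplexer f_min/f_maj = 1/ir, so doubling ir halves beta
# and roughly doubles theta_GA.
```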
20. 5. Guidelines for Parameter Tuning
We set β = {0.04, 0.02, 0.01, 0.005} and θGA = {200, 400, 800, 1600}.

[Plots: Standard Configuration vs. configuration following the guidelines, for ir = 16:1, 32:1, 64:1, 128:1, and 256:1.]
21. 6. Online Adaptation
Problem: How can we estimate the niche frequency?
– In the multiplexer: fmin = fmaj / ir
– In a real-world problem… niche frequencies may not be related to the imbalance ratio (small disjuncts)

[Figures: two example domains, with ir = 5 in both; in one of them the minority class is split into small disjuncts.]
22. 6. Online Adaptation
Our approach: Let XCS discover small disjuncts.
– We search for regions that promote overgeneral classifiers
– We estimate ircl based on those regions
– We use ircl to adapt β and θGA

[Figure: an overgeneral classifier covering a region with ircl = 14:1.]
23. 6. Online Adaptation
The Algorithm
– Check whether the prediction oscillates
– Estimate the imbalance ratio ircl
– Require a minimum of experience and numerosity before adapting the parameters
– Adapt the parameters following the guidelines and the estimate of ircl
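The steps above can be sketched in Python. Everything here is a hedged reconstruction of the listed control flow, not the paper's pseudocode: the field names, the oscillation test via win/loss counts, and all thresholds and constants are illustrative assumptions.

```python
class Classifier:
    """Minimal stand-in for an XCS classifier's bookkeeping fields."""
    def __init__(self):
        self.exp = 0      # experience: times it appeared in [A]
        self.num = 1      # numerosity
        self.wins = 0     # occurrences rewarded with Rmax
        self.losses = 0   # occurrences rewarded with 0

def maybe_adapt(cl, k1=0.2, k2=25.0, min_exp=100, min_num=5):
    """If an experienced, numerous classifier's prediction oscillates,
    estimate the imbalance ratio of its region and re-tune beta and
    theta_GA following the guidelines (constants are placeholders)."""
    if cl.exp < min_exp or cl.num < min_num:
        return None                      # not enough evidence yet
    if cl.wins == 0 or cl.losses == 0:
        return None                      # prediction does not oscillate
    ir_cl = max(cl.wins, cl.losses) / min(cl.wins, cl.losses)
    beta = k1 / ir_cl                    # forget more slowly as ir grows
    theta_ga = k2 * ir_cl                # give rare niches time to occur
    return ir_cl, beta, theta_ga
```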
24. 6. Online Adaptation
[Plots comparing the Standard Configuration, the configuration following the guidelines, and Online Adaptation, for ir = 16:1, 32:1, 64:1, 128:1, and 256:1.]
25. 7. Conclusions
We studied the behavior of XCS when the training set is unbalanced.

XCS with the standard configuration can only solve the multiplexer for an imbalance ratio up to ir = 16.

The theoretical analysis shows that XCS is highly robust to class imbalances if:
– Classifier estimates are accurate
– The number of genetic opportunities of the niches is balanced

We defined guidelines to adapt XCS’s parameters:
– XCS could solve the multiplexer up to an imbalance ratio of ir = 256
26. 7. Conclusions
As an advantage over other learners, XCS can automatically discover small disjuncts: self-adaptation of parameters.
27. Further Work
What about the convergence time?
– An increase of θGA implies a decrease in the rate of search for promising rules

Cluster-based resampling methods…
… unfortunately, there is no direct relation between clusters and niches.

What about niche-based resampling?

[Figure: a niche with irniche = 14:1; its instances are resampled with probability 1/irniche.]
28. Bounding XCS’s Parameters for Unbalanced Datasets

Albert Orriols-Puig
Ester Bernadó-Mansilla
Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle
Ramon Llull University
Barcelona, Spain