Learning Classifier Systems for Class Imbalance Problems

Learning Classifier Systems
for Class Imbalance
Problems

Ester Bernadó-Mansilla

Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Barcelona, Spain

Aim

Enhance the applicability of LCSs
to knowledge discovery from datasets

Classification problems
Real-world domains

Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla

Framework

model
LCS
Dataset
+
estimated
performance
• Representativity of the target
• Evolutionary pressures
concept

• Interpretability
• Geometrical complexity

• Domain of applicability
• Class imbalance

• Noise

Class Imbalance

When one class is represented by a small number of

examples, compared to other class/es.

Usually the class of that describes the circumscribed

concept (positive class) is the minority class

Where?

Rare medical diagnoses

Fraud detection

Oil spills in satellite images



Class Imbalance and Classifiers

Is there a bias towards the majority class?


Probably, because…

Most classifier schemes are trained to minimize the global error


As a result

They classify accurately the examples from the majority class

They tend to misclassify the examples of the minority class,

which are often those representing the target concept.


Measures of Performance

Confusion matrix

Prediction
A B
Actual A true positive (TP) false negative (FN)
B false positive (FP) true negative (TN)

Accuracy = (TP+TN)/(TP+FN+FP+TN)

TN rate = TN / (TN + FP)

TP rate = TP / (FN + TP)

ROC curves


The Higher Class Imbalance: the
Higher Bias?

Dataset 1 Dataset 2

concept: 15 concept: 15
counterpart: 150 counterpart: 45
ratio: 10:1 ratio: 3:1


XCS

XCS

class
input Set of
Rules

update
search

Genetic Reinforcement
Algorithms Learning

reward

Environment

Dataset


Our Approach with XCS

Bounding XCS’s parameters for unbalanced datasets


Online identification of small disjuncts


Adaptation of parameters for the discovery of small

disjuncts


XCS’s Behavior in Unbalanced
Datasets

Unbalanced 11-multiplexer problem

ir=16:1 ir=32:1 ir=64:1


XCS’s Population

Most numerous rules, ir=128:1

Classifier P Error F Num
###########:0 1000 0.12 0.98 385
1.2 10-4
###########:1 0.074 0.98 366

estimated estimated too high
high
prediction:
overgeneral error: numerosity
fitness
992.24
classifiers 15.38
7.75

Test examples are classified as belonging to the majority class


How Imbalance Affects XCS

Classifier’s error


Stability of prediction and error estimates


Occurrence-based reproduction



Classifier’s Error in Unbalanced
Datasets

Will an overgeneral classifier be detected as inaccurate if the

imbalance ratio is high?

Bound for inaccurate classifier: !quot;!0
Given the estimated prediction and error:

P = Pc (cl ) Rmax + (1 ! Pc (cl )) Rmin
quot;=| P ! Rmax | Pc (cl )+ | P ! Rmin | (1 ! Pc (cl ))
We derive:

# quot;o p 2 + 2 p ( Rmax # quot;0 )# quot;0 ! 0
where !quot;!0
p =!C / C
For
Rmax = 1000 !0 = 1
we get maximum imbalance ratio:

irmax = 1998

Prediction and Error Estimates and
Learning Rate
ir=128:1, ###########:0
Error
Prediction
β=0.2
β=0.002


Occurrence-based Reproduction

Probability of occurrence (pocc)

Given ir=maj/min:
0,6

Classifier poccB poccI
0,5
1/2 1/2
########### :0

probability of occurrence
1/2 1/2
########### :1 0,4

0,3
0000#######:0 1/32
0,2

0001#######:1 1/32 0,1

0
1 2 4 8 16 32 64 128 256

imbalance ratio
22ir
p occB 00001######:1 00000######:0
ir + 1 ###########:0 ###########:1


Occurrence-based Reproduction

Probability of reproduction (pGA)


1
pGA =
TGA
if Tocc < % GA
#% GA
where TGA $ quot;
!Tocc otherwise

With θGA=20:

GA
Tocc
…
T (# # # # # # # # # # #: 0) ! quot;
GA GA
θGA
Tocc
GA
…
T (0000# # # # # # #: 0) ! T 1
GA occ
θGA
1 Assuming non-overlapping


Guidelines for Parameter Tuning

Rmax and є0 determine the threshold between negligible noise and

imbalance ratio

β determines the size of the moving window. The window should be

high enough to allow computing examples from both classes:
f min
! =k
f maj
θGA can counterbalance the reproduction opportunities of most frequent

(majority) and least frequent niches (minority):

1
! GA = k '
f min


XCS with Parameters Tuning

XCS with parameter tuning
XCS with standard settings

ir=16:1 ir=32:1 ir=64:1 ir=64:1 ir=256:1


XCS Tuning for Real-world Datasets

How we can estimate the niche frequency?


Estimate from the ratio of majority class instances and minority

class instances

Problem:


• This may not be related to the distribution of niches in the feature
space

Take the approach to the small disjuncts problem



Online Identification of Small Disjuncts

We search for regions that promote

overgeneral classifiers

Estimate ircl based on the classifier’s

experience on each class:

exp max
ircl =
exp min

Adapt β and θGA according to ircl


ircl = 20 / 4


Online Parameter Adaptation

ir=256:1


What about UCS?

Supervised XCS:

Needs less exploration


Avoids XCS’s fitness dilemma


More robust to parameter settings


Overgeneral classifiers also tend to overcome the

population
Their probability of occurrence depends on the imbalance ratio

Partially minimized with fitness sharing



What about UCS?

ir=256:1
ir=512:1


Are LCSs more error-prone to class imbalance
than other classifier schemes?
C4.5 SMO XCS
TP rate Bal2c1 0,00% ± 0,00% 0,00% ± 0,00% 0,00% ± 0,00%

Bal2c2 81,65% ± 6,83% 93,72% ± 4,64% 81,96% ± 6,00%

Bal2c3 81,90% ± 6,04% 93,77% ± 5,59% 83,99% ± 6,88%

bpa 42,95% ± 14,09% 0,00% ± 0,00% 61,38% ± 9,10%

gls2c1 80,00% ± 42,16% 0,00% ± 0,00% 50,00% ± 52,70%

gls2c2 35,00% ± 47,43% 15,00% ± 33,75% 55,00% ± 49,72%

gls2c3 30,00% ± 42,16% 0,00% ± 0,00% 5,00% ± 15,81%

gls2c4 75,00% ± 32,63% 81,67% ± 25,40% 81,67% ± 25,40%

gls2c5 77,14% ± 16,77% 10,00% ± 9,64% 84,29% ± 14,21%

gls2c6 59,82% ± 15,13% 0,00% ± 0,00% 81,79% ± 13,95%

h-s 75,83% ± 13,29% 80,00% ± 7,03% 80,00% ± 9,78%

pim 55,37% ± 13,27% 53,38% ± 6,42% 55,93% ± 9,75%

tao 95,23% ± 2,14% 84,11% ± 6,17% 92,58% ± 5,72%

thy2c1 90,00% ± 16,10% 76,67% ± 22,50% 90,00% ± 16,10%

thy2c2 94,17% ± 12,45% 54,17% ± 24,92% 90,83% ± 14,93%

thy2c3 90,95% ± 10,34% 33,81% ± 21,35% 90,71% ± 8,05%

wav2c1 75,74% ± 4,06% 88,51% ± 3,20% 87,24% ± 3,43%

wav2c2 72,34% ± 3,89% 84,57% ± 4,05% 78,72% ± 2,57%

wab2c3 77,64% ± 2,38% 89,97% ± 3,48% 87,86% ± 3,65%

wbdc 92,95% ± 3,42% 95,42% ± 5,36% 95,83% ± 5,89%

wdbc 92,47% ± 5,09% 94,81% ± 2,71% 93,83% ± 6,37%

wine2c1 89,00% ± 16,63% 100,00% ± 0,00% 100,00% ± 0,00%

wine2c2 95,00% ± 8,05% 98,33% ± 5,27% 98,33% ± 5,27%

wine2c3 90,18% ± 11,70% 97,14% ± 6,02% 98,57% ± 4,52%
wpbc 41,00% ± 12,87% 9,50% ± 17,07% 30,50% ± 24,99%

How can we Minimize the Effects of
Small Disjuncts?

Resampling the dataset:
 Addresses small
disjuncts
Classical methods:

• Random oversampling
• Random undersampling Assumes that
clusterization will
Heuristic methods:

find small
• Tomek links
disjuncts and
• CNN match classifier’s
• One-sided selection approximation
• Smote
Cluster-based oversampling
 Could XCS
benefit from the
online
Cost-sensitive classifiers

identification of
small disjuncts?

Domains of Applicability

Should we use some counterbalancing scheme?


Which learning scheme should we use?


Is there a combination of counterbalancing

scheme+learner that beats all others?

How can we know the presence of small

disjuncts?

Are there other complexity factors mixed up with

the small disjuncts problem?

Domains of Applicability

Resampling/
Learn it! Classifier/
Resampling+classifier
Where are
LCSs
placed?

Dataset Dataset
Suggested
Prediction
characterization
approach

Type of dataset:
Geometrical distribution of classes
Possible presence of small disjuncts
Other complexity factors


Future Directions

Potential benefit of XCS to discover small disjuncts

…and learn from it online

Further analyze UCS


How do LCSs perform w.r.t. other classifiers for unbalanced

datasets?

Measures for small disjuncts identification

… and other possible complexity factors

What is noise and what is a small disjunct?


In which cases a LCS is applicable?



Learning Classifier Systems for Class Imbalance Problems

Recommandé

Recommandé

Contenu connexe

Similaire à Learning Classifier Systems for Class Imbalance Problems

Similaire à Learning Classifier Systems for Class Imbalance Problems (10)

Plus de Xavier Llorà

Plus de Xavier Llorà (20)

Dernier

Dernier (20)

Learning Classifier Systems for Class Imbalance Problems