SlideShare une entreprise Scribd logo
1  sur  29
Télécharger pour lire hors ligne
Learning Classifier Systems
   for Class Imbalance
         Problems

           Ester Bernadó-Mansilla

    Research Group in Intelligent Systems
      Enginyeria i Arquitectura La Salle
          Universitat Ramon Llull
             Barcelona, Spain
Aim




                         Enhance the applicability of LCSs
                       to knowledge discovery from datasets

                                        Classification problems
                                         Real-world domains




Learning Classifier Systems for Class Imbalance Problems          Ester Bernadó-Mansilla
Framework




                                                                  model
                                                      LCS
              Dataset
                                                                    +
                                                               estimated
                                                              performance
       • Representativity of the target
                                                            • Evolutionary pressures
       concept

                                                            • Interpretability
       • Geometrical complexity

                                                            • Domain of applicability
       • Class imbalance

       • Noise
Learning Classifier Systems for Class Imbalance Problems                  Ester Bernadó-Mansilla
Class Imbalance


     When one class is represented by a small number of

     examples, compared to other class/es.

     Usually the class of that describes the circumscribed

     concept (positive class) is the minority class

     Where?

            Rare medical diagnoses
       
            Fraud detection
       
            Oil spills in satellite images
       




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Class Imbalance and Classifiers


     Is there a bias towards the majority class?




     Probably, because…

            Most classifier schemes are trained to minimize the global error
       



     As a result

            They classify accurately the examples from the majority class
       
            They tend to misclassify the examples of the minority class,
       
            which are often those representing the target concept.




Learning Classifier Systems for Class Imbalance Problems        Ester Bernadó-Mansilla
Measures of Performance


                                                       Confusion matrix

                                                               Prediction
                                                           A                   B
          Actual            A              true positive (TP)         false negative (FN)
                            B              false positive (FP)         true negative (TN)




                                Accuracy = (TP+TN)/(TP+FN+FP+TN)

                                TN rate = TN / (TN + FP)

                                TP rate = TP / (FN + TP)

                                ROC curves

Learning Classifier Systems for Class Imbalance Problems                             Ester Bernadó-Mansilla
The Higher Class Imbalance: the
                    Higher Bias?

                          Dataset 1                          Dataset 2

                       concept:      15                    concept:     15
                       counterpart: 150                    counterpart: 45
                       ratio:       10:1                   ratio:       3:1




Learning Classifier Systems for Class Imbalance Problems                 Ester Bernadó-Mansilla
XCS

                                                                           XCS

                                                                                     class
                      input                                Set of
                                                           Rules

                                                                         update
                                      search

                                          Genetic                   Reinforcement
                                         Algorithms                   Learning


                                                                            reward

                                                           Environment




                                                             Dataset




Learning Classifier Systems for Class Imbalance Problems                                     Ester Bernadó-Mansilla
Our Approach with XCS


     Bounding XCS’s parameters for unbalanced datasets




     Online identification of small disjuncts




     Adaptation of parameters for the discovery of small

     disjuncts




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
XCS’s Behavior in Unbalanced
                     Datasets

                       Unbalanced 11-multiplexer problem




                                                    ir=16:1   ir=32:1   ir=64:1


Learning Classifier Systems for Class Imbalance Problems                          Ester Bernadó-Mansilla
XCS’s Population



                                        Most numerous rules, ir=128:1


            Classifier                    P                Error         F          Num
         ###########:0                  1000                0.12       0.98         385
                                       1.2 10-4
         ###########:1                                     0.074       0.98         366




                                        estimated          estimated                   too high
                                                                            high
                                        prediction:
          overgeneral                                        error:                   numerosity
                                                                          fitness
                                          992.24
          classifiers                                        15.38
                                           7.75

          Test examples are classified as belonging to the majority class



Learning Classifier Systems for Class Imbalance Problems                                  Ester Bernadó-Mansilla
How Imbalance Affects XCS


         Classifier’s error
   



         Stability of prediction and error estimates
   



         Occurrence-based reproduction
   




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Classifier’s Error in Unbalanced
                     Datasets

        Will an overgeneral classifier be detected as inaccurate if the
  
        imbalance ratio is high?

         Bound for inaccurate classifier:                       !quot;!0
          Given the estimated prediction and error:

                                          P = Pc (cl ) Rmax + (1 ! Pc (cl )) Rmin
                                          quot;=| P ! Rmax | Pc (cl )+ | P ! Rmin | (1 ! Pc (cl ))
          We derive:

                     # quot;o p 2 + 2 p ( Rmax # quot;0 )# quot;0 ! 0
          where                                                                                  !quot;!0
                               p =!C / C
          For
                        Rmax = 1000 !0 = 1
          we get maximum imbalance ratio:

                        irmax = 1998
Learning Classifier Systems for Class Imbalance Problems                                            Ester Bernadó-Mansilla
Prediction and Error Estimates and
                     Learning Rate
                 ir=128:1, ###########:0
                                                            Error
                                       Prediction
       β=0.2
       β=0.002




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Occurrence-based Reproduction


        Probability of occurrence (pocc)

        Given ir=maj/min:
                                                                                          0,6

           Classifier            poccB            poccI
                                                                                          0,5
                                  1/2             1/2
       ########### :0




                                                              probability of occurrence
                                  1/2             1/2
       ########### :1                                                                     0,4



                                                                                          0,3
       0000#######:0             1/32
                                                                                          0,2


       0001#######:1             1/32                                                     0,1



                                                                                           0
                                                                                                1    2    4   8    16   32   64   128   256

                                                                                                             imbalance ratio
                                               22ir
                                                     p occB                                     00001######:1             00000######:0
                                              ir + 1                                            ###########:0             ###########:1




Learning Classifier Systems for Class Imbalance Problems                                                                     Ester Bernadó-Mansilla
Occurrence-based Reproduction


      Probability of reproduction (pGA)
 

                                1
                      pGA =
                               TGA
                                                  if Tocc < % GA
                                  #% GA
                      where TGA $ quot;
                                  !Tocc           otherwise

      With θGA=20:
 
                                                                            GA
                                                                   Tocc
                                                                                 …
        T (# # # # # # # # # # #: 0) ! quot;
         GA                              GA
                                                                     θGA
                                                                                     Tocc
                                                                                                           GA
                                                                                       …
       T (0000# # # # # # #: 0) ! T 1
        GA                         occ
                                                                      θGA
1 Assuming   non-overlapping


 Learning Classifier Systems for Class Imbalance Problems                                   Ester Bernadó-Mansilla
Guidelines for Parameter Tuning


     Rmax and є0 determine the threshold between negligible noise and

     imbalance ratio

     β determines the size of the moving window. The window should be

     high enough to allow computing examples from both classes:
                                                                 f min
                                                       ! =k
                                                                 f maj
     θGA can counterbalance the reproduction opportunities of most frequent

     (majority) and least frequent niches (minority):

                                                                  1
                                                    ! GA = k '
                                                                 f min


Learning Classifier Systems for Class Imbalance Problems                 Ester Bernadó-Mansilla
XCS with Parameters Tuning


                                                                     XCS with parameter tuning
              XCS with standard settings




ir=16:1           ir=32:1               ir=64:1                ir=64:1        ir=256:1



    Learning Classifier Systems for Class Imbalance Problems                       Ester Bernadó-Mansilla
XCS Tuning for Real-world Datasets


     How we can estimate the niche frequency?



            Estimate from the ratio of majority class instances and minority
       
            class instances

            Problem:
       


               • This may not be related to the distribution of niches in the feature
                 space


            Take the approach to the small disjuncts problem
       




Learning Classifier Systems for Class Imbalance Problems                 Ester Bernadó-Mansilla
Online Identification of Small Disjuncts



     We search for regions that promote

     overgeneral classifiers

     Estimate ircl based on the classifier’s

     experience on each class:

                                        exp max
                               ircl =
                                        exp min


     Adapt β and θGA according to ircl



                                                           ircl = 20 / 4




Learning Classifier Systems for Class Imbalance Problems                   Ester Bernadó-Mansilla
Online Parameter Adaptation


                                                           ir=256:1




Learning Classifier Systems for Class Imbalance Problems       Ester Bernadó-Mansilla
What about UCS?


             Supervised XCS:
       
                   Needs less exploration
               


             Avoids XCS’s fitness dilemma
       


             More robust to parameter settings
       


             Overgeneral classifiers also tend to overcome the
       
             population
                   Their probability of occurrence depends on the imbalance ratio
               
                   Partially minimized with fitness sharing
               




Learning Classifier Systems for Class Imbalance Problems               Ester Bernadó-Mansilla
What about UCS?


                                                           ir=256:1
                                     ir=512:1




Learning Classifier Systems for Class Imbalance Problems       Ester Bernadó-Mansilla
Are LCSs more error-prone to class imbalance
                        than other classifier schemes?
                                             C4.5                            SMO                      XCS
TP rate          Bal2c1              0,00%   ±             0,00%    0,00%     ±    0,00%     0,00%     ±         0,00%

                 Bal2c2             81,65%   ±             6,83%    93,72%    ±    4,64%     81,96%    ±         6,00%

                 Bal2c3             81,90%   ±             6,04%    93,77%    ±    5,59%     83,99%    ±         6,88%

                 bpa                42,95%   ±         14,09%       0,00%     ±    0,00%     61,38%    ±         9,10%

                 gls2c1             80,00%   ±         42,16%       0,00%     ±    0,00%     50,00%    ±        52,70%

                 gls2c2             35,00%   ±         47,43%       15,00%    ±    33,75%    55,00%    ±        49,72%

                 gls2c3             30,00%   ±         42,16%       0,00%     ±    0,00%     5,00%     ±        15,81%

                 gls2c4             75,00%   ±         32,63%       81,67%    ±    25,40%    81,67%    ±        25,40%

                 gls2c5             77,14%   ±         16,77%       10,00%    ±    9,64%     84,29%    ±        14,21%

                 gls2c6             59,82%   ±         15,13%       0,00%     ±    0,00%     81,79%    ±        13,95%

                 h-s                75,83%   ±         13,29%       80,00%    ±    7,03%     80,00%    ±         9,78%

                 pim                55,37%   ±         13,27%       53,38%    ±    6,42%     55,93%    ±         9,75%

                 tao                95,23%   ±             2,14%    84,11%    ±    6,17%     92,58%    ±         5,72%

                 thy2c1             90,00%   ±         16,10%       76,67%    ±    22,50%    90,00%    ±        16,10%

                 thy2c2             94,17%   ±         12,45%       54,17%    ±    24,92%    90,83%    ±        14,93%

                 thy2c3             90,95%   ±         10,34%       33,81%    ±    21,35%    90,71%    ±         8,05%

                 wav2c1             75,74%   ±             4,06%    88,51%    ±    3,20%     87,24%    ±         3,43%

                 wav2c2             72,34%   ±             3,89%    84,57%    ±    4,05%     78,72%    ±         2,57%

                 wab2c3             77,64%   ±             2,38%    89,97%    ±    3,48%     87,86%    ±         3,65%

                 wbdc               92,95%   ±             3,42%    95,42%    ±    5,36%     95,83%    ±         5,89%

                 wdbc               92,47%   ±             5,09%    94,81%    ±    2,71%     93,83%    ±         6,37%

                 wine2c1            89,00%   ±         16,63%      100,00%    ±    0,00%    100,00%    ±         0,00%

                 wine2c2            95,00%   ±             8,05%    98,33%    ±    5,27%     98,33%    ±         5,27%

                 wine2c3            90,18%   ±         11,70%       97,14%    ±    6,02%     98,57%    ±         4,52%
Learning Classifier Systems for Class Imbalance Problems                                               Ester Bernadó-Mansilla
                 wpbc               41,00%   ±         12,87%       9,50%     ±    17,07%    30,50%    ±        24,99%
How can we Minimize the Effects of
                     Small Disjuncts?


     Resampling the dataset:
                                                          Addresses small
                                                           disjuncts
            Classical methods:
       
               • Random oversampling
               • Random undersampling                          Assumes that
                                                               clusterization will
            Heuristic methods:
       
                                                               find small
               •   Tomek links
                                                               disjuncts and
               •   CNN                                         match classifier’s
               •   One-sided selection                         approximation
               •   Smote
            Cluster-based oversampling
                                                                 Could XCS
                                                                  benefit from the
                                                                  online
     Cost-sensitive classifiers

                                                                  identification of
                                                                  small disjuncts?
Learning Classifier Systems for Class Imbalance Problems               Ester Bernadó-Mansilla
Domains of Applicability


                   Should we use some counterbalancing scheme?
             


                   Which learning scheme should we use?
             


                   Is there a combination of counterbalancing
             
                   scheme+learner that beats all others?

                   How can we know the presence of small
             
                   disjuncts?

                   Are there other complexity factors mixed up with
             
                   the small disjuncts problem?
Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Domains of Applicability


                                                           Resampling/
                                              Learn it!    Classifier/
                                                           Resampling+classifier
                                                                                     Where are
                                                                                       LCSs
                                                                                      placed?

          Dataset                  Dataset
                                                                             Suggested
                                                            Prediction
                               characterization
                                                                             approach


                                                Type of dataset:
                                       Geometrical distribution of classes
                                      Possible presence of small disjuncts
                                           Other complexity factors


Learning Classifier Systems for Class Imbalance Problems                       Ester Bernadó-Mansilla
Future Directions


                   Potential benefit of XCS to discover small disjuncts
             
                     …and learn from it online

                   Further analyze UCS
             


                   How do LCSs perform w.r.t. other classifiers for unbalanced
             
                   datasets?

                   Measures for small disjuncts identification
             
                     … and other possible complexity factors

                   What is noise and what is a small disjunct?
             


                   In which cases a LCS is applicable?
             


Learning Classifier Systems for Class Imbalance Problems            Ester Bernadó-Mansilla
Learning Classifier Systems
   for Class Imbalance
         Problems

           Ester Bernadó-Mansilla

    Research Group in Intelligent Systems
      Enginyeria i Arquitectura La Salle
          Universitat Ramon Llull
             Barcelona, Spain

Contenu connexe

Similaire à Learning Classifier Systems for Class Imbalance Problems

Structural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOAStructural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOAclima
 
GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced DatasetsGECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced DatasetsAlbert Orriols-Puig
 
Statistical classification: A review on some techniques
Statistical classification: A review on some techniquesStatistical classification: A review on some techniques
Statistical classification: A review on some techniquesGiorgos Bamparopoulos
 
Whitcher Ismrm 2009
Whitcher Ismrm 2009Whitcher Ismrm 2009
Whitcher Ismrm 2009bwhitcher
 
Data mining project presentation
Data mining project presentationData mining project presentation
Data mining project presentationKaiwen Qi
 
It's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithmsIt's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithmsBrian Lange
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7Roger Barga
 
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...Chakkrit (Kla) Tantithamthavorn
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfH K Yoon
 

Similaire à Learning Classifier Systems for Class Imbalance Problems (10)

Structural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOAStructural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOA
 
5 5 10
5 5 105 5 10
5 5 10
 
GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced DatasetsGECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
 
Statistical classification: A review on some techniques
Statistical classification: A review on some techniquesStatistical classification: A review on some techniques
Statistical classification: A review on some techniques
 
Whitcher Ismrm 2009
Whitcher Ismrm 2009Whitcher Ismrm 2009
Whitcher Ismrm 2009
 
Data mining project presentation
Data mining project presentationData mining project presentation
Data mining project presentation
 
It's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithmsIt's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithms
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7
 
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
 

Plus de Xavier Llorà

Meandre 2.0 Alpha Preview
Meandre 2.0 Alpha PreviewMeandre 2.0 Alpha Preview
Meandre 2.0 Alpha PreviewXavier Llorà
 
Soaring the Clouds with Meandre
Soaring the Clouds with MeandreSoaring the Clouds with Meandre
Soaring the Clouds with MeandreXavier Llorà
 
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0Xavier Llorà
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using   Genetics-Based Machine LearningLarge Scale Data Mining using   Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine LearningXavier Llorà
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...Xavier Llorà
 
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new TrendsScalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new TrendsXavier Llorà
 
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...Xavier Llorà
 
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...
Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...Xavier Llorà
 
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...
A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...Xavier Llorà
 
XCS: Current capabilities and future challenges
XCS: Current capabilities and future  challengesXCS: Current capabilities and future  challenges
XCS: Current capabilities and future challengesXavier Llorà
 
Negative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly DetectionNegative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly DetectionXavier Llorà
 
Searle, Intentionality, and the Future of Classifier Systems
Searle, Intentionality, and the  Future of Classifier SystemsSearle, Intentionality, and the  Future of Classifier Systems
Searle, Intentionality, and the Future of Classifier SystemsXavier Llorà
 
Computed Prediction: So far, so good. What now?
Computed Prediction:  So far, so good. What now?Computed Prediction:  So far, so good. What now?
Computed Prediction: So far, so good. What now?Xavier Llorà
 
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems TractableLinkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems TractableXavier Llorà
 
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the CloudsMeandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the CloudsXavier Llorà
 
ZigZag: The Meandring Language
ZigZag: The Meandring LanguageZigZag: The Meandring Language
ZigZag: The Meandring LanguageXavier Llorà
 
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...Xavier Llorà
 
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...Xavier Llorà
 
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...Xavier Llorà
 

Plus de Xavier Llorà (20)

Meandre 2.0 Alpha Preview
Meandre 2.0 Alpha PreviewMeandre 2.0 Alpha Preview
Meandre 2.0 Alpha Preview
 
Soaring the Clouds with Meandre
Soaring the Clouds with MeandreSoaring the Clouds with Meandre
Soaring the Clouds with Meandre
 
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using   Genetics-Based Machine LearningLarge Scale Data Mining using   Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine Learning
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
 
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new TrendsScalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
 
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
 
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...
Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...
 
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...
A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...
 
XCS: Current capabilities and future challenges
XCS: Current capabilities and future  challengesXCS: Current capabilities and future  challenges
XCS: Current capabilities and future challenges
 
Negative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly DetectionNegative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly Detection
 
Searle, Intentionality, and the Future of Classifier Systems
Searle, Intentionality, and the  Future of Classifier SystemsSearle, Intentionality, and the  Future of Classifier Systems
Searle, Intentionality, and the Future of Classifier Systems
 
Computed Prediction: So far, so good. What now?
Computed Prediction:  So far, so good. What now?Computed Prediction:  So far, so good. What now?
Computed Prediction: So far, so good. What now?
 
NIGEL 2006 welcome
NIGEL 2006 welcomeNIGEL 2006 welcome
NIGEL 2006 welcome
 
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems TractableLinkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
 
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the CloudsMeandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
 
ZigZag: The Meandring Language
ZigZag: The Meandring LanguageZigZag: The Meandring Language
ZigZag: The Meandring Language
 
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
 
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
 
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
 

Dernier

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 

Dernier (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 

Learning Classifier Systems for Class Imbalance Problems

  • 1. Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla Research Group in Intelligent Systems Enginyeria i Arquitectura La Salle Universitat Ramon Llull Barcelona, Spain
  • 2. Aim Enhance the applicability of LCSs to knowledge discovery from datasets Classification problems Real-world domains Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 3. Framework model LCS Dataset + estimated performance • Representativity of the target • Evolutionary pressures concept • Interpretability • Geometrical complexity • Domain of applicability • Class imbalance • Noise Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 4. Class Imbalance When one class is represented by a small number of  examples, compared to other class/es. Usually the class of that describes the circumscribed  concept (positive class) is the minority class Where?  Rare medical diagnoses  Fraud detection  Oil spills in satellite images  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 5. Class Imbalance and Classifiers Is there a bias towards the majority class?  Probably, because…  Most classifier schemes are trained to minimize the global error  As a result  They classify accurately the examples from the majority class  They tend to misclassify the examples of the minority class,  which are often those representing the target concept. Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 6. Measures of Performance Confusion matrix Prediction A B Actual A true positive (TP) false negative (FN) B false positive (FP) true negative (TN) Accuracy = (TP+TN)/(TP+FN+FP+TN) TN rate = TN / (TN + FP) TP rate = TP / (FN + TP) ROC curves Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 7. The Higher Class Imbalance: the Higher Bias? Dataset 1 Dataset 2 concept: 15 concept: 15 counterpart: 150 counterpart: 45 ratio: 10:1 ratio: 3:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 8. XCS XCS class input Set of Rules update search Genetic Reinforcement Algorithms Learning reward Environment Dataset Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 9. Our Approach with XCS Bounding XCS’s parameters for unbalanced datasets  Online identification of small disjuncts  Adaptation of parameters for the discovery of small  disjuncts Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 10. XCS’s Behavior in Unbalanced Datasets Unbalanced 11-multiplexer problem ir=16:1 ir=32:1 ir=64:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 11. XCS’s Population Most numerous rules, ir=128:1 Classifier P Error F Num ###########:0 1000 0.12 0.98 385 1.2 10-4 ###########:1 0.074 0.98 366 estimated estimated too high high prediction: overgeneral error: numerosity fitness 992.24 classifiers 15.38 7.75 Test examples are classified as belonging to the majority class Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 12. How Imbalance Affects XCS Classifier’s error  Stability of prediction and error estimates  Occurrence-based reproduction  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 13. Classifier’s Error in Unbalanced Datasets Will an overgeneral classifier be detected as inaccurate if the  imbalance ratio is high? Bound for inaccurate classifier: !quot;!0 Given the estimated prediction and error: P = Pc (cl ) Rmax + (1 ! Pc (cl )) Rmin quot;=| P ! Rmax | Pc (cl )+ | P ! Rmin | (1 ! Pc (cl )) We derive: # quot;o p 2 + 2 p ( Rmax # quot;0 )# quot;0 ! 0 where !quot;!0 p =!C / C For Rmax = 1000 !0 = 1 we get maximum imbalance ratio: irmax = 1998 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 14. Prediction and Error Estimates and Learning Rate ir=128:1, ###########:0 Error Prediction β=0.2 β=0.002 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 15. Occurrence-based Reproduction Probability of occurrence (pocc) Given ir=maj/min: 0,6 Classifier poccB poccI 0,5 1/2 1/2 ########### :0 probability of occurrence 1/2 1/2 ########### :1 0,4 0,3 0000#######:0 1/32 0,2 0001#######:1 1/32 0,1 0 1 2 4 8 16 32 64 128 256 imbalance ratio 22ir p occB 00001######:1 00000######:0 ir + 1 ###########:0 ###########:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 16. Occurrence-based Reproduction Probability of reproduction (pGA)  1 pGA = TGA if Tocc < % GA #% GA where TGA $ quot; !Tocc otherwise With θGA=20:  GA Tocc … T (# # # # # # # # # # #: 0) ! quot; GA GA θGA Tocc GA … T (0000# # # # # # #: 0) ! T 1 GA occ θGA 1 Assuming non-overlapping Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 17. Guidelines for Parameter Tuning Rmax and є0 determine the threshold between negligible noise and  imbalance ratio β determines the size of the moving window. The window should be  high enough to allow computing examples from both classes: f min ! =k f maj θGA can counterbalance the reproduction opportunities of most frequent  (majority) and least frequent niches (minority): 1 ! GA = k ' f min Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 18. XCS with Parameters Tuning XCS with parameter tuning XCS with standard settings ir=16:1 ir=32:1 ir=64:1 ir=64:1 ir=256:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 19. XCS Tuning for Real-world Datasets How we can estimate the niche frequency?  Estimate from the ratio of majority class instances and minority  class instances Problem:  • This may not be related to the distribution of niches in the feature space Take the approach to the small disjuncts problem  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 20. Online Identification of Small Disjuncts We search for regions that promote  overgeneral classifiers Estimate ircl based on the classifier’s  experience on each class: exp max ircl = exp min Adapt β and θGA according to ircl  ircl = 20 / 4 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 21. Online Parameter Adaptation ir=256:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 22. What about UCS? Supervised XCS:  Needs less exploration  Avoids XCS’s fitness dilemma  More robust to parameter settings  Overgeneral classifiers also tend to overcome the  population Their probability of occurrence depends on the imbalance ratio  Partially minimized with fitness sharing  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 23. What about UCS? ir=256:1 ir=512:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 24. Are LCSs more error-prone to class imbalance than other classifier schemes? C4.5 SMO XCS TP rate Bal2c1 0,00% ± 0,00% 0,00% ± 0,00% 0,00% ± 0,00% Bal2c2 81,65% ± 6,83% 93,72% ± 4,64% 81,96% ± 6,00% Bal2c3 81,90% ± 6,04% 93,77% ± 5,59% 83,99% ± 6,88% bpa 42,95% ± 14,09% 0,00% ± 0,00% 61,38% ± 9,10% gls2c1 80,00% ± 42,16% 0,00% ± 0,00% 50,00% ± 52,70% gls2c2 35,00% ± 47,43% 15,00% ± 33,75% 55,00% ± 49,72% gls2c3 30,00% ± 42,16% 0,00% ± 0,00% 5,00% ± 15,81% gls2c4 75,00% ± 32,63% 81,67% ± 25,40% 81,67% ± 25,40% gls2c5 77,14% ± 16,77% 10,00% ± 9,64% 84,29% ± 14,21% gls2c6 59,82% ± 15,13% 0,00% ± 0,00% 81,79% ± 13,95% h-s 75,83% ± 13,29% 80,00% ± 7,03% 80,00% ± 9,78% pim 55,37% ± 13,27% 53,38% ± 6,42% 55,93% ± 9,75% tao 95,23% ± 2,14% 84,11% ± 6,17% 92,58% ± 5,72% thy2c1 90,00% ± 16,10% 76,67% ± 22,50% 90,00% ± 16,10% thy2c2 94,17% ± 12,45% 54,17% ± 24,92% 90,83% ± 14,93% thy2c3 90,95% ± 10,34% 33,81% ± 21,35% 90,71% ± 8,05% wav2c1 75,74% ± 4,06% 88,51% ± 3,20% 87,24% ± 3,43% wav2c2 72,34% ± 3,89% 84,57% ± 4,05% 78,72% ± 2,57% wab2c3 77,64% ± 2,38% 89,97% ± 3,48% 87,86% ± 3,65% wbdc 92,95% ± 3,42% 95,42% ± 5,36% 95,83% ± 5,89% wdbc 92,47% ± 5,09% 94,81% ± 2,71% 93,83% ± 6,37% wine2c1 89,00% ± 16,63% 100,00% ± 0,00% 100,00% ± 0,00% wine2c2 95,00% ± 8,05% 98,33% ± 5,27% 98,33% ± 5,27% wine2c3 90,18% ± 11,70% 97,14% ± 6,02% 98,57% ± 4,52% Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla wpbc 41,00% ± 12,87% 9,50% ± 17,07% 30,50% ± 24,99%
  • 25. How can we Minimize the Effects of Small Disjuncts? Resampling the dataset:  Addresses small disjuncts Classical methods:  • Random oversampling • Random undersampling Assumes that clusterization will Heuristic methods:  find small • Tomek links disjuncts and • CNN match classifier’s • One-sided selection approximation • Smote Cluster-based oversampling  Could XCS benefit from the online Cost-sensitive classifiers  identification of small disjuncts? Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 26. Domains of Applicability Should we use some counterbalancing scheme?  Which learning scheme should we use?  Is there a combination of counterbalancing  scheme+learner that beats all others? How can we know the presence of small  disjuncts? Are there other complexity factors mixed up with  the small disjuncts problem? Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 27. Domains of Applicability Resampling/ Learn it! Classifier/ Resampling+classifier Where are LCSs placed? Dataset Dataset Suggested Prediction characterization approach Type of dataset: Geometrical distribution of classes Possible presence of small disjuncts Other complexity factors Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 28. Future Directions Potential benefit of XCS to discover small disjuncts  …and learn from it online Further analyze UCS  How do LCSs perform w.r.t. other classifiers for unbalanced  datasets? Measures for small disjuncts identification  … and other possible complexity factors What is noise and what is a small disjunct?  In which cases a LCS is applicable?  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 29. Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla Research Group in Intelligent Systems Enginyeria i Arquitectura La Salle Universitat Ramon Llull Barcelona, Spain