Classification Rules
Machine Learning and Data Mining (Unit 12)




Prof. Pier Luca Lanzi
References                                    2



  Jiawei Han and Micheline Kamber, "Data Mining: Concepts
  and Techniques", The Morgan Kaufmann Series in Data
  Management Systems, 2nd Edition.
  Tom M. Mitchell, "Machine Learning", McGraw Hill, 1997.
  Pang-Ning Tan, Michael Steinbach, Vipin Kumar,
  "Introduction to Data Mining", Addison Wesley.
  Ian H. Witten, Eibe Frank, "Data Mining: Practical Machine
  Learning Tools and Techniques with Java Implementations",
  2nd Edition.




Lecture outline                               3



  Why rules? What are classification rules?
  Which type of rules?
  What methods?
  Direct Methods
     The OneRule algorithm
     Sequential Covering Algorithms
     The RISE algorithm
  Indirect Methods




Example: to play or not to play?                     4




Outlook      Temp                 Humidity   Windy       Play
Sunny        Hot                  High       False       No
Sunny        Hot                  High       True        No
Overcast     Hot                  High       False       Yes
Rainy        Mild                 High       False       Yes
Rainy        Cool                 Normal     False       Yes
Rainy        Cool                 Normal     True        No
Overcast     Cool                 Normal     True        Yes
Sunny        Mild                 High       False       No
Sunny        Cool                 Normal     False       Yes
Rainy        Mild                 Normal     False       Yes
Sunny        Mild                 Normal     True        Yes
Overcast     Mild                 High       True        Yes
Overcast     Hot                  Normal     False       Yes
Rainy        Mild                 High       True        No

A Rule Set to Classify the Data           5



  IF (humidity = high) and (outlook = sunny)
  THEN play=no (3.0/0.0)
  IF (outlook = rainy) and (windy = TRUE)
  THEN play=no (2.0/0.0)
  OTHERWISE play=yes (9.0/0.0)

  Confusion Matrix

   yes no <-- classified as
    7 2 | yes
    3 2 | no




Let’s check the rules…                                  6


Outlook         Temp                 Humidity   Windy        Play
Sunny           Hot                  High       False        No
Sunny           Hot                  High       True         No
Overcast        Hot                  High       False        Yes
Rainy           Mild                 High       False        Yes
Rainy           Cool                 Normal     False        Yes
Rainy           Cool                 Normal     True         No
Overcast        Cool                 Normal     True         Yes
Sunny           Mild                 High       False        No
Sunny           Cool                 Normal     False        Yes
Rainy           Mild                 Normal     False        Yes
Sunny           Mild                 Normal     True         Yes
Overcast        Mild                 High       True         Yes
Overcast        Hot                  Normal     False        Yes
Rainy           Mild                 High       True         No

  IF (humidity = high) and (outlook = sunny) THEN play=no (3.0/0.0)

Let’s check the rules…                                   7


Outlook          Temp                 Humidity   Windy       Play
Sunny            Hot                  High       False       No
Sunny            Hot                  High       True        No
Overcast         Hot                  High       False       Yes
Rainy            Mild                 High       False       Yes
Rainy            Cool                 Normal     False       Yes
Rainy            Cool                 Normal     True        No
Overcast         Cool                 Normal     True        Yes
Sunny            Mild                 High       False       No
Sunny            Cool                 Normal     False       Yes
Rainy            Mild                 Normal     False       Yes
Sunny            Mild                 Normal     True        Yes
Overcast         Mild                 High       True        Yes
Overcast         Hot                  Normal     False       Yes
Rainy            Mild                 High       True        No

    IF (outlook = rainy) and (windy = TRUE) THEN play=no (2.0/0.0)

Let’s check the rules…                               8


Outlook      Temp                 Humidity   Windy       Play
Sunny        Hot                  High       False       No
Sunny        Hot                  High       True        No
Overcast     Hot                  High       False       Yes
Rainy        Mild                 High       False       Yes
Rainy        Cool                 Normal     False       Yes
Rainy        Cool                 Normal     True        No
Overcast     Cool                 Normal     True        Yes
Sunny        Mild                 High       False       No
Sunny        Cool                 Normal     False       Yes
Rainy        Mild                 Normal     False       Yes
Sunny        Mild                 Normal     True        Yes
Overcast     Mild                 High       True        Yes
Overcast     Hot                  Normal     False       Yes
Rainy        Mild                 High       True        No

                OTHERWISE play=yes (9.0/0.0)

A Simpler Solution                       9



  IF (outlook = sunny) THEN play IS no
  ELSE IF (outlook = overcast) THEN play IS yes
  ELSE IF (outlook = rainy) THEN play IS yes
  (10/14 instances correct)

  Confusion Matrix

   yes no   <-- classified as
    4   5 | yes
    3   2 | no




What are classification rules?                 10
Why rules?

  They are IF-THEN rules
  The IF part states a condition over the data
  The THEN part includes a class label
  Which type of conditions?
     Propositional, with attribute-value comparisons
     First order Horn clauses, with variables

  Why rules?
    Sets of IF-THEN rules are among the most expressive
    and most human-readable representations for hypotheses




IF-THEN Rules for Classification             11



  Represent the knowledge in the form of IF-THEN rules
     R: IF (humidity = high) and (outlook = sunny)
     THEN play=no (3.0/0.0)
     Rule antecedent/precondition vs. rule consequent
  Assessment of a rule: coverage and accuracy
     ncovers = # of tuples covered by R
     ncorrect = # of tuples correctly classified by R
     coverage(R) = ncovers /|training data set|
     accuracy(R) = ncorrect / ncovers
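As a concrete (non-slide) sketch, the two measures can be computed directly on the weather data; the tuple layout and function name here are our own assumptions:

```python
# Sketch: coverage(R) and accuracy(R) on the weather data of the slides.
# Row layout (outlook, temp, humidity, windy, play) is an assumption.
WEATHER = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

def rule_stats(data, antecedent, consequent):
    """Return (coverage, accuracy): ncovers/|data| and ncorrect/ncovers."""
    covered = [row for row in data if antecedent(row)]
    correct = [row for row in covered if row[-1] == consequent]
    coverage = len(covered) / len(data)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

# R: IF (humidity = high) and (outlook = sunny) THEN play = no
cov, acc = rule_stats(WEATHER,
                      lambda r: r[2] == "high" and r[0] == "sunny", "no")
print(cov, acc)  # coverage = 3/14, accuracy = 3/3 = 1.0
```

On the training data, rule R covers three tuples and classifies all of them correctly, matching the (3.0/0.0) annotation on slide 5.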




IF-THEN Rules for Classification                12



  If more than one rule is triggered, need conflict resolution
   Size ordering: assign the highest priority to the triggering
   rule that has the “toughest” requirement (i.e., with the
   most attribute tests)
     Class-based ordering: decreasing order of prevalence or
     misclassification cost per class
     Rule-based ordering (decision list): rules are organized
     into one long priority list, according to some measure of
     rule quality or by experts




Characteristics of Rule-Based Classifier        13



  Mutually exclusive rules
    Classifier contains mutually exclusive rules if the rules are
    independent of each other
    Every record is covered by at most one rule

  Exhaustive rules
     Classifier has exhaustive coverage if it accounts for every
     possible combination of attribute values
     Each record is covered by at least one rule




What are the approaches?                        14



  Direct Methods
     Directly learn the rules from the training data
  Indirect Methods
     Learn decision tree, then convert to rules
     Learn neural networks, then extract rules




Inferring rudimentary rules                     15



  1R: learns a 1-level decision tree
     I.e., rules that all test one particular attribute
     Assumes nominal attributes
  Basic version
     One branch for each value
     Each branch assigns most frequent class
     Error rate: proportion of instances that don’t belong to
     the majority class of their corresponding branch
     Choose attribute with lowest error rate




Pseudo-code for 1R                                16




  For each attribute,
         For each value of the attribute,
         make a rule as follows:
                 count how often each class appears
                 find the most frequent class
                 make the rule assign that class to
                 this attribute-value
         Calculate the error rate of the rules

  Choose the rules with the smallest error rate


  Note: “missing” is treated as a separate attribute value
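The pseudo-code above can be sketched as a short program; the data layout and names are our own assumptions, not the slides' code:

```python
from collections import Counter, defaultdict

# Weather data from the slides; row layout (outlook, temp, humidity,
# windy, play) is an assumption.
WEATHER = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

def one_r(data, n_attrs):
    """1R: for each attribute, predict the majority class of each of its
    values; keep the attribute whose rules make the fewest errors."""
    best = None  # (attribute index, {value: class}, total errors)
    for a in range(n_attrs):
        counts = defaultdict(Counter)
        for row in data:
            counts[row[a]][row[-1]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(n for v, c in counts.items()
                     for cls, n in c.items() if cls != rules[v])
        if best is None or errors < best[2]:
            best = (a, rules, errors)
    return best

attr, rules, errors = one_r(WEATHER, 4)
print(attr, rules, errors)  # Outlook (attribute 0) wins with 4/14 errors
```

This reproduces the evaluation on the next slide: Outlook is selected with 4 errors out of 14 (Humidity also makes 4, but Outlook is found first).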




Evaluating the weather attributes                              17



Outlook    Temp   Humidity   Windy   Play
Sunny      Hot    High       False   No
Sunny      Hot    High       True    No
Overcast   Hot    High       False   Yes
Rainy      Mild   High       False   Yes
Rainy      Cool   Normal     False   Yes
Rainy      Cool   Normal     True    No
Overcast   Cool   Normal     True    Yes
Sunny      Mild   High       False   No
Sunny      Cool   Normal     False   Yes
Rainy      Mild   Normal     False   Yes
Sunny      Mild   Normal     True    Yes
Overcast   Mild   High       True    Yes
Overcast   Hot    Normal     False   Yes
Rainy      Mild   High       True    No

Attribute   Rules              Errors   Total errors
Outlook     Sunny → No         2/5      4/14
            Overcast → Yes     0/4
            Rainy → Yes        2/5
Temp        Hot → No*          2/4      5/14
            Mild → Yes         2/6
            Cool → Yes         1/4
Humidity    High → No          3/7      4/14
            Normal → Yes       1/7
Windy       False → Yes        2/8      5/14
            True → No*         3/6

* indicates a tie

Dealing with numeric attributes                            18



   Discretize numeric attributes
   Divide each attribute’s range into intervals
   Sort instances according to attribute’s values
   Place breakpoints where class changes (majority class)
   This minimizes the total error
   Example: temperature from weather data

64    65   68       69 70     71 72 72    75 75     80   81 83         85
Yes | No | Yes      Yes Yes | No No Yes | Yes Yes | No | Yes Yes |     No



         Outlook      Temperature       Humidity   Windy        Play
          Sunny           85                85     False        No
          Sunny           80                90     True         No
         Overcast         83                86     False        Yes
          Rainy           75                80     False        Yes
            …             …                 …       …            …
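The breakpoint step above can be sketched in a few lines; this is our own minimal version (identical adjacent values, like the two 72s, cannot be separated, so that boundary is simply skipped here):

```python
def split_points(values, classes):
    """Sort by value and place a breakpoint (midpoint) wherever the
    class changes; equal adjacent values cannot be split apart."""
    pairs = sorted(zip(values, classes))
    return [(v1 + v2) / 2
            for (v1, c1), (v2, c2) in zip(pairs, pairs[1:])
            if c1 != c2 and v1 != v2]

# temperature values and classes from the weather data above
temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play  = ["Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes",
         "Yes", "Yes", "No", "Yes", "Yes", "No"]
print(split_points(temps, play))  # includes 77.5, used on slide 20
```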

The problem of overfitting                         19



      This procedure is very sensitive to noise
      One instance with an incorrect class label will probably
      produce a separate interval
      Also: time stamp attribute will have zero errors
      Simple solution: enforce minimum number of instances in
      majority class per interval.
      As an example, with min = 3,




Original partition (one breakpoint wherever the class changes):
64   65   68   69  70   71  72  72   75  75   80   81  83   85
Yes | No | Yes Yes Yes | No  No  Yes | Yes Yes | No | Yes Yes | No

With a minimum of three instances per majority class:
64   65   68   69  70   71  72  72   75  75   80   81  83   85
Yes  No   Yes  Yes Yes | No  No  Yes  Yes Yes | No   Yes Yes  No



With overfitting avoidance                           20




    Attribute     Rules                     Errors   Total errors
    Outlook       Sunny → No                2/5      4/14
                  Overcast → Yes            0/4
                  Rainy → Yes               2/5
    Temperature   ≤ 77.5 → Yes              3/10     5/14
                  > 77.5 → No*              2/4
    Humidity      ≤ 82.5 → Yes              1/7      3/14
                  > 82.5 and ≤ 95.5 → No    2/6
                  > 95.5 → Yes              0/1
    Windy         False → Yes               2/8      5/14
                  True → No*                3/6




Sequential covering algorithms                21



  Consider the set E of positive and negative examples
  Repeat
     Learn one rule with high accuracy, any coverage
     Remove positive examples covered by this rule
  Until all the positive examples are covered




Sequential covering algorithms                22



1. LearnedRules := {}
2. Rule := LearnOneRule(class, Attributes, Examples)
3. While Performance(Rule, Examples) > Threshold Do
    4. LearnedRules := LearnedRules + Rule
    5. Examples := Examples – Covered(Examples, Rule)
    6. Rule := LearnOneRule(class, Attributes, Examples)
7. Sort(LearnedRules, Performance)
8. Return LearnedRules

  The core of the algorithm is LearnOneRule
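The loop can be sketched in runnable form; here LearnOneRule is a greedy general-to-specific search maximizing accuracy p/t (ties broken by larger p), and the data layout and names are our own assumptions rather than the slides' code:

```python
# Weather data; row layout (outlook, temp, humidity, windy, play).
WEATHER = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

def learn_one_rule(examples, target, n_attrs):
    """Greedily add the attribute-value test with the best accuracy p/t
    until the rule covers only `target` examples (or attrs run out)."""
    conds, covered = {}, list(examples)
    while any(r[-1] != target for r in covered) and len(conds) < n_attrs:
        best = None  # ((p/t, p), attribute, value)
        for a in range(n_attrs):
            if a in conds:
                continue
            for v in {r[a] for r in covered}:
                sub = [r for r in covered if r[a] == v]
                p = sum(1 for r in sub if r[-1] == target)
                key = (p / len(sub), p)
                if best is None or key > best[0]:
                    best = (key, a, v)
        conds[best[1]] = best[2]
        covered = [r for r in covered if r[best[1]] == best[2]]
    return conds

def sequential_covering(examples, target, n_attrs):
    """Learn one rule, remove the examples it covers, repeat."""
    rules, remaining = [], list(examples)
    while any(r[-1] == target for r in remaining):
        rule = learn_one_rule(remaining, target, n_attrs)
        rules.append(rule)
        remaining = [r for r in remaining
                     if not all(r[a] == v for a, v in rule.items())]
    return rules

rules = sequential_covering(WEATHER, "no", 4)
print(rules)  # the two play=no rules of slide 5, as {attribute: value}
```

On the weather data this recovers exactly the two play=no rules of slide 5 (the conditions are discovered in a different order).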




How to learn one rule?                         23



  General to Specific
    Start with the most general hypothesis and
    then proceed through specialization steps
  Specific to General
    Start with the set of the most specific hypotheses
    and then proceed through generalization steps




Learning One Rule, General to Specific                          24




  IF true THEN Play=yes

  First specializations (with accuracy P):
     IF Wind=yes THEN Play=yes           P = 5/10 = 0.5
     IF Wind=No THEN Play=yes            P = 6/8 = 0.75
     IF Humidity=Normal THEN Play=yes    P = 6/7 = 0.86
     IF Humidity=High THEN Play=yes      P = 3/7 = 0.43
     IF … THEN …

  Specializations of the best candidate (Humidity=Normal):
     IF Humidity=Normal AND Wind=yes THEN Play=yes
     IF Humidity=Normal AND Wind=No THEN Play=yes
     IF Humidity=Normal AND Outlook=Rainy THEN Play=yes

Learning One Rule, another viewpoint                                                       25




[Figure: a two-class dataset of points labeled “a” and “b” in the (x, y)
plane; each added test shrinks the region covered by the rule]

  If true then class = a
  If x > 1.2 then class = a
  If x > 1.2 and y > 2.6 then class = a




Another Example of Sequential Covering        26




[Figure: (ii) Step 1, a first rule is grown on the training data]




Example of Sequential Covering…                  27




[Figure: (iii) Step 2 shows rule R1; (iv) Step 3 adds a second rule R2
after the instances covered by R1 have been removed]




Learn-one-rule Algorithm                28



Learn-one-rule(Target, Attributes, Examples, k)
  Init BH to the most general hypothesis
  Init CH to {BH}
  While CH not empty Do
      Generate the set NCH of more specific candidate hypotheses
      // check all the NCH for a hypothesis that
      // improves the performance of BH
      Update BH
      Update CH with the k best NCH
  Return a rule “IF BH THEN prediction”




Exploring the hypothesis space               29



  The algorithm that explores the hypothesis space
  is greedy and may get stuck in local optima
  To improve the exploration of the hypothesis space,
  we can use beam search
  At each step, k candidate hypotheses are considered




Example: contact lens data                                               30


     Age         Spectacle prescription      Astigmatism   Tear production rate   Recommended
                                                                                      lenses
    Young              Myope                      No            Reduced                None
    Young              Myope                      No             Normal                Soft
    Young              Myope                      Yes           Reduced                None
    Young              Myope                      Yes            Normal                Hard
    Young           Hypermetrope                  No            Reduced                None
    Young           Hypermetrope                  No             Normal                Soft
    Young           Hypermetrope                  Yes           Reduced                None
    Young           Hypermetrope                  Yes            Normal                Hard
Pre-presbyopic         Myope                      No            Reduced                None
Pre-presbyopic         Myope                      No             Normal                Soft
Pre-presbyopic         Myope                      Yes           Reduced                None
Pre-presbyopic         Myope                      Yes            Normal                Hard
Pre-presbyopic      Hypermetrope                  No            Reduced                None
Pre-presbyopic      Hypermetrope                  No             Normal                Soft
Pre-presbyopic      Hypermetrope                  Yes           Reduced                None
Pre-presbyopic      Hypermetrope                  Yes            Normal                None
  Presbyopic           Myope                      No            Reduced                None
  Presbyopic           Myope                      No             Normal                None
  Presbyopic           Myope                      Yes           Reduced                None
  Presbyopic           Myope                      Yes            Normal                Hard
  Presbyopic        Hypermetrope                  No            Reduced                None
  Presbyopic        Hypermetrope                  No             Normal                Soft
  Presbyopic        Hypermetrope                  Yes           Reduced                None
  Presbyopic        Hypermetrope                  Yes            Normal                None
Example: contact lens data                      31



  Rule we seek: If ?
                         then recommendation = hard


  Possible tests:


         Age = Young                                  2/8
         Age = Pre-presbyopic                         1/8
         Age = Presbyopic                             1/8
         Spectacle prescription = Myope               3/12
         Spectacle prescription = Hypermetrope        1/12
         Astigmatism = no                             0/12
         Astigmatism = yes                            4/12
         Tear production rate = Reduced               0/12
         Tear production rate = Normal                4/12

Modified rule and resulting data                             32



  Rule with best test added,
                If astigmatism = yes
                    then recommendation = hard


  Instances covered by modified rule,

Age              Spectacle        Astigmatism   Tear production   Recommended
                 prescription                   rate              lenses
Young            Myope            Yes           Reduced           None
Young            Myope            Yes           Normal            Hard
Young            Hypermetrope     Yes           Reduced           None
Young            Hypermetrope     Yes           Normal            Hard
Pre-presbyopic   Myope            Yes           Reduced           None
Pre-presbyopic   Myope            Yes           Normal            Hard
Pre-presbyopic   Hypermetrope     Yes           Reduced           None
Pre-presbyopic   Hypermetrope     Yes           Normal            None
Presbyopic       Myope            Yes           Reduced           None
Presbyopic       Myope            Yes           Normal            Hard
Presbyopic       Hypermetrope     Yes           Reduced           None
Presbyopic       Hypermetrope     Yes           Normal            None

Further refinement                              33



  Current state,        If astigmatism = yes
                            and ?
                          then recommendation = hard



  Possible tests,


         Age = Young                                   2/4
         Age = Pre-presbyopic                          1/4
         Age = Presbyopic                              1/4
         Spectacle prescription = Myope                3/6
         Spectacle prescription = Hypermetrope         1/6
         Tear production rate = Reduced                0/6
         Tear production rate = Normal                 4/6


Modified rule and resulting data                                34



      Rule with best test added:

                If astigmatism = yes
                    and tear production rate = normal
                then recommendation = Hard



      Instances covered by modified rule

Age             Spectacle            Astigmatism   Tear production    Recommended
                prescription                       rate               lenses
Young           Myope                Yes           Normal             Hard
Young           Hypermetrope         Yes           Normal             Hard
Prepresbyopic   Myope                Yes           Normal             Hard
Prepresbyopic   Hypermetrope         Yes           Normal             None
Presbyopic      Myope                Yes           Normal             Hard
Presbyopic      Hypermetrope         Yes           Normal             None




Further refinement                             35



  Current state:
                    If astigmatism = yes
                         and tear production rate = normal
                         and ?
                       then recommendation = hard


  Possible tests:
           Age = Young                                2/2
           Age = Pre-presbyopic                       1/2
           Age = Presbyopic                           1/2
           Spectacle prescription = Myope             3/3
           Spectacle prescription = Hypermetrope      1/3

  There is a tie between the first and the fourth test;
  we choose the one with the greater coverage


The result                                       36



  Final rule:       If astigmatism = yes
                       and tear production rate = normal
                       and spectacle prescription = myope
                       then recommendation = hard


  Second rule for recommending “hard lenses”:
  (built from instances not covered by first rule)
                    If age = young and astigmatism = yes
                       and tear production rate = normal
                       then recommendation = hard


  These two rules cover all the “hard lenses”
  The process is then repeated with the other two classes




Pseudo-code for PRISM                            37




  For each class C
    Initialize E to the instance set
    While E contains instances in class C
      Create a rule R with an empty left-hand side
        that predicts class C
      Until R is perfect (or there are no more attributes
        to use) do
        For each attribute A not mentioned in R,
          and each value v,
          Consider adding the condition A = v
            to the left-hand side of R
          Select A and v to maximize the accuracy p/t
            (break ties by choosing the condition
            with the largest p)
        Add A = v to R
      Remove the instances covered by R from E
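The pseudo-code above can be sketched as a runnable program on the contact-lens data; the encoding of the table and all names are our own assumptions:

```python
# Sketch of PRISM for one class on the contact-lens data.
# Attribute order: age, spectacle prescription, astigmatism, tear rate.
ROWS = """\
young myope no reduced none
young myope no normal soft
young myope yes reduced none
young myope yes normal hard
young hypermetrope no reduced none
young hypermetrope no normal soft
young hypermetrope yes reduced none
young hypermetrope yes normal hard
pre-presbyopic myope no reduced none
pre-presbyopic myope no normal soft
pre-presbyopic myope yes reduced none
pre-presbyopic myope yes normal hard
pre-presbyopic hypermetrope no reduced none
pre-presbyopic hypermetrope no normal soft
pre-presbyopic hypermetrope yes reduced none
pre-presbyopic hypermetrope yes normal none
presbyopic myope no reduced none
presbyopic myope no normal none
presbyopic myope yes reduced none
presbyopic myope yes normal hard
presbyopic hypermetrope no reduced none
presbyopic hypermetrope no normal soft
presbyopic hypermetrope yes reduced none
presbyopic hypermetrope yes normal none"""
DATA = [tuple(line.split()) for line in ROWS.splitlines()]

def grow_rule(E, cls, n_attrs):
    """Add the test A=v maximizing accuracy p/t (ties: largest p)
    until the rule is perfect or no attribute is left."""
    conds, covered = {}, list(E)
    while any(r[-1] != cls for r in covered) and len(conds) < n_attrs:
        best = None  # ((p/t, p), attribute, value)
        for a in range(n_attrs):
            if a in conds:
                continue
            for v in {r[a] for r in covered}:
                sub = [r for r in covered if r[a] == v]
                p = sum(1 for r in sub if r[-1] == cls)
                key = (p / len(sub), p)
                if best is None or key > best[0]:
                    best = (key, a, v)
        conds[best[1]] = best[2]
        covered = [r for r in covered if r[best[1]] == best[2]]
    return conds

def prism(E, cls, n_attrs):
    """Learn rules for class cls until all its instances are covered."""
    rules = []
    while any(r[-1] == cls for r in E):
        rule = grow_rule(E, cls, n_attrs)
        rules.append(rule)
        E = [r for r in E if not all(r[a] == v for a, v in rule.items())]
    return rules

rules = prism(DATA, "hard", 4)
print(rules)  # the two "hard" rules derived on slides 31-36
```

This reproduces the derivation on slides 31-36: the first rule tests astigmatism = yes, tear production rate = normal, and spectacle prescription = myope; the second tests age = young, astigmatism = yes, and tear production rate = normal.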




Test selection criteria                        38



  Basic covering algorithm:
     keep adding conditions to a rule to improve its accuracy
     Add the condition that improves accuracy the most
  Measure 1: p/t
     t: total number of instances covered by the rule
     p: number of these that are positive
     Produces rules that stop covering negative instances
     as quickly as possible
     May produce rules with very small coverage
     —special cases or noise?
  Measure 2: Information gain p × (log(p/t) – log(P/T))
     P and T are the positive and total numbers before the new
     condition was added
     Information gain emphasizes positive rather than
     negative instances
  These interact with the pruning mechanism used
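The contrast between the two measures can be checked numerically; the candidate counts below are illustrative, using P = 9 positives out of T = 14 from the weather data:

```python
import math

def accuracy(p, t):
    """Measure 1: fraction of covered instances that are positive."""
    return p / t

def info_gain(p, t, P, T):
    """Measure 2: p * (log(p/t) - log(P/T)); accuracy weighted by the
    number of positives covered, so coverage matters."""
    return p * (math.log2(p / t) - math.log2(P / T))

P, T = 9, 14
# candidate A covers 2 instances, both positive;
# candidate B covers 7 instances, 6 positive
print(accuracy(2, 2), accuracy(6, 7))                 # p/t prefers tiny A
print(info_gain(2, 2, P, T), info_gain(6, 7, P, T))   # gain prefers B
```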
Instance Elimination                           39



  Why do we need to eliminate instances?
     Otherwise, the next rule would be identical to the previous rule
  Why do we remove positive instances?
     To ensure that the next rule is different
  Why do we remove negative instances?
     To prevent underestimating the accuracy of the rule
     Compare rules R2 and R3 in the diagram




Missing values and numeric attributes         40



  Common treatment of missing values: for any test, they fail
  Algorithm must either
     Use other tests to separate out positive instances
     Leave them uncovered until later in the process
  In some cases it is better to treat “missing”
  as a separate value
  Numeric attributes are treated just like
  they are in decision trees




Pruning rules                                 41



  Two main strategies:
     Incremental pruning
     Global pruning
  Other difference: pruning criterion
     Error on hold-out set (reduced-error pruning)
     Statistical significance
     MDL principle
  Also: post-pruning vs. pre-pruning




Effect of Rule Simplification                    42




 Rules are no longer mutually exclusive
    A record may trigger more than one rule
    Solution?
     • Ordered rule set
     • Unordered rule set – use voting schemes


 Rules are no longer exhaustive
    A record may not trigger any rules
    Solution?
     • Use a default class




Rule Ordering Schemes                                     43



  Rule-based ordering
    Individual rules are ranked based on their quality
  Class-based ordering
    Rules that belong to the same class appear together




The RISE algorithm                            44



  It works in a specific-to-general fashion
  Initially, it creates one rule for each
  training example
  Then it applies elementary generalization steps,
  accepting each one as long as the overall accuracy does not decrease




The RISE algorithm                           45



Input: ES is the training set

Let RS be ES
Compute Acc(RS)
Repeat
  For each rule R in RS,
    find the nearest example E not covered by R
    (E is of the same class as R)
    R’ = MostSpecificGeneralization(R,E)
    RS’ = RS with R replaced by R’
    if (Acc(RS’)≥Acc(RS)) then
      RS=RS’
      if R’ is identical to another rule in RS
         then delete R’ from RS
Until no increase in Acc(RS) is obtained
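For nominal attributes, MostSpecificGeneralization can be sketched very simply: drop every condition the new example violates. (For numeric attributes RISE instead widens the interval, which this sketch omits; names and data layout are our own.)

```python
def most_specific_generalization(rule, example):
    """Smallest generalization of `rule` (a {attribute: value} dict over
    nominal attributes) that also covers `example`: keep only the
    conditions the example satisfies."""
    return {a: v for a, v in rule.items() if example[a] == v}

# a rule built from one weather-data training example
# (attributes: outlook, temp, humidity, windy)…
r = {0: "sunny", 1: "hot", 2: "high", 3: False}
# …generalized to also cover its nearest same-class neighbor
e = ("sunny", "hot", "high", True)
g = most_specific_generalization(r, e)
print(g)  # {0: 'sunny', 1: 'hot', 2: 'high'}
```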


Direct Method: Sequential Covering               46



1.   Start from an empty rule
2.   Grow a rule using the Learn-One-Rule function
3.   Remove training records covered by the rule
4.   Repeat Step (2) and (3) until stopping criterion is met




                    Prof. Pier Luca Lanzi
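The four steps above can be sketched as a minimal skeleton over nominal attributes. Here `learn_one_rule` is a deliberately simple stand-in that picks the single best attribute-value test, not the real Learn-One-Rule function, and the accuracy threshold is an assumed stopping criterion.

```python
# Sequential covering skeleton: repeatedly learn one rule, then remove
# the training records it covers.

def covers(conds, x):
    return all(x.get(a) == v for a, v in conds.items())

def learn_one_rule(data, target_class):
    # Placeholder greedy grower: pick the single attribute-value test
    # whose covered records are purest for target_class.
    best, best_score = {}, -1.0
    for x, y in data:
        for a, v in x.items():
            cov = [(xi, yi) for xi, yi in data if xi.get(a) == v]
            score = sum(yi == target_class for _, yi in cov) / len(cov)
            if score > best_score:
                best, best_score = {a: v}, score
    return best, best_score

def sequential_covering(data, target_class, min_accuracy=0.8):
    rules = []
    remaining = list(data)
    while any(y == target_class for _, y in remaining):
        conds, acc = learn_one_rule(remaining, target_class)
        if acc < min_accuracy:          # stopping criterion
            break
        rules.append((conds, target_class))
        # step 3: remove the records the new rule covers
        remaining = [(x, y) for x, y in remaining if not covers(conds, x)]
    return rules
```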
Stopping Criterion and Rule Pruning               47



  Stopping criterion
     Compute the gain
     If gain is not significant, discard the new rule

  Rule Pruning
     Similar to post-pruning of decision trees
     Reduced Error Pruning:
      • Remove one of the conjuncts in the rule
      • Compare error rate on validation set before and after
        pruning
      • If error improves, prune the conjunct




                   Prof. Pier Luca Lanzi
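The reduced-error-pruning loop for a single rule can be sketched as follows. The rule representation and the greedy drop-one-conjunct strategy are illustrative assumptions; a conjunct is removed only if the validation-set error strictly improves.

```python
# Reduced-error pruning of one rule (illustrative sketch).

def covers(conds, x):
    return all(x.get(a) == v for a, v in conds.items())

def error(conds, label, validation):
    """Error rate of the rule on the validation records it covers."""
    cov = [(x, y) for x, y in validation if covers(conds, x)]
    if not cov:
        return 1.0                      # a rule covering nothing is useless
    return sum(y != label for _, y in cov) / len(cov)

def prune_rule(conds, label, validation):
    conds = dict(conds)
    improved = True
    while improved and len(conds) > 1:
        improved = False
        base = error(conds, label, validation)
        for a in list(conds):
            trial = {k: v for k, v in conds.items() if k != a}
            if error(trial, label, validation) < base:
                conds.pop(a)            # dropping this conjunct helps
                improved = True
                break
    return conds
```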
CN2 Algorithm                               48



  Start from an empty conjunct: {}
  Add conjuncts that minimize the entropy measure:
   {A}, {A,B}, …
  Determine the rule consequent by taking
  majority class of instances covered by the rule




                 Prof. Pier Luca Lanzi
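The entropy measure and the majority-class consequent can be sketched directly; the rule representation is an illustrative assumption. Lower entropy among the covered instances means a purer, better conjunct set.

```python
# CN2-style entropy of the class distribution among instances covered
# by a candidate conjunct set, plus the majority-class consequent.
from collections import Counter
from math import log2

def covers(conds, x):
    return all(x.get(a) == v for a, v in conds.items())

def entropy(conds, data):
    labels = [y for x, y in data if covers(conds, x)]
    if not labels:
        return float("inf")             # covering nothing is worst
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def consequent(conds, data):
    """Rule consequent = majority class of the covered instances."""
    labels = [y for x, y in data if covers(conds, x)]
    return Counter(labels).most_common(1)[0][0]
```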
Direct Method: RIPPER                               49



  For a 2-class problem, choose one of the classes as the positive class,
  and the other as the negative class
     Learn rules for positive class
     Negative class will be default class
  For a multi-class problem
     Order the classes according to increasing class prevalence
     (fraction of instances that belong to a particular class)
     Learn the rule set for smallest class first, treat the rest as
     negative class
     Repeat with next smallest class as positive class




                     Prof. Pier Luca Lanzi
Direct Method: RIPPER                                 50



  Growing a rule:
     Start from empty rule
     Add conjuncts as long as they improve FOIL’s information gain
     Stop when rule no longer covers negative examples
     Prune the rule immediately using incremental reduced error
     pruning
     Measure for pruning: v = (p-n)/(p+n)
      • p: number of positive examples covered by the rule in
             the validation set
      • n: number of negative examples covered by the rule in
             the validation set
     Pruning method: delete any final sequence of conditions that
     maximizes v




                    Prof. Pier Luca Lanzi
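Both quality measures have simple closed forms. The sketch below assumes the counts of covered positive/negative examples are already available; it shows only the formulas, not RIPPER's full growing and pruning machinery.

```python
# RIPPER's two rule-quality measures (sketch): FOIL's information gain
# for growing, and v = (p - n) / (p + n) for incremental pruning.
from math import log2

def foil_gain(p0, n0, p1, n1):
    """Gain from specializing a rule covering p0 positives / n0 negatives
    into one covering p1 / n1 (FOIL's information gain)."""
    if p1 == 0:
        return float("-inf")            # a conjunct killing all positives
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

def prune_metric(p, n):
    """p, n: positive/negative validation examples the rule covers."""
    return (p - n) / (p + n)

# A conjunct that keeps most positives while shedding negatives has
# positive gain:
print(foil_gain(10, 10, 8, 1))
```

During pruning, RIPPER deletes the final sequence of conditions that maximizes `prune_metric`; larger v means the rule separates positives from negatives better on the validation set.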
Direct Method: RIPPER                              51



  Building a Rule Set:
     Use sequential covering algorithm
       • Find the best rule that covers the current set of positive
        examples
      • Eliminate both positive and negative examples covered by
        the rule
     Each time a rule is added to the rule set, compute the
     new description length
      • stop adding new rules when the new description length is d
        bits longer than the smallest description length obtained so
        far




                   Prof. Pier Luca Lanzi
Direct Method: RIPPER                              52



  Optimize the rule set:
    For each rule r in the rule set R
      • Consider 2 alternative rules:
         – Replacement rule (r*): grow new rule from scratch
          – Revised rule (r’): add conjuncts to extend the rule r
      • Compare the rule set for r against the rule set for r*
          and r’
       • Choose the rule set that minimizes the description length
         (MDL principle)
     Repeat rule generation and rule optimization for the
     remaining positive examples




                   Prof. Pier Luca Lanzi
Rules vs. trees                                 53



  A corresponding decision tree
  produces exactly the same predictions
  But rule sets can be clearer
  when decision trees suffer from replicated subtrees
  Also, in multi-class situations, a covering algorithm
  concentrates on one class at a time whereas
  a decision tree learner takes all classes into account




                   Prof. Pier Luca Lanzi
Rules vs. trees                                  54



  Sequential covering generates a rule by adding tests that
  maximize the rule’s accuracy
  Similar to the situation in decision trees: the problem of
  selecting an attribute to split on
  But a decision tree inducer maximizes overall purity
  Each new test reduces the rule’s coverage




  (Figure: the space of examples, the region covered by the rule so far,
  and the smaller region covered by the rule after adding a new term)


                   Prof. Pier Luca Lanzi
Indirect Methods                           55




                   Prof. Pier Luca Lanzi
Indirect Method: C4.5rules                       56



  Extract rules from an unpruned decision tree
  For each rule, r: A → y,
     Consider an alternative rule r’: A’ → y where
     A’ is obtained by removing one of the conjuncts in A
     Compare the pessimistic error rate for r against all r’s
     Prune if one of the r’s has lower pessimistic error rate
     Repeat until we can no longer improve generalization
     error




                   Prof. Pier Luca Lanzi
Indirect Method: C4.5rules                     57



  Instead of ordering the rules, order subsets of rules
  (class ordering)
  Each subset is a collection of rules with the same rule
  consequent (class)
  Compute description length of each subset
       Description length = L(error) + g × L(model)
      g is a parameter that takes into account the presence of
      redundant attributes in a rule set
      (default value = 0.5)




                   Prof. Pier Luca Lanzi
Example                                                                  58


Name            Give Birth  Lay Eggs  Can Fly  Live in Water  Have Legs  Class
human           yes         no        no       no             yes        mammals
python          no          yes       no       no             no         reptiles
salmon          no          yes       no       yes            no         fishes
whale           yes         no        no       yes            no         mammals
frog            no          yes       no       sometimes      yes        amphibians
komodo          no          yes       no       no             yes        reptiles
bat             yes         no        yes      no             yes        mammals
pigeon          no          yes       yes      no             yes        birds
cat             yes         no        no       no             yes        mammals
leopard shark   yes         no        no       yes            no         fishes
turtle          no          yes       no       sometimes      yes        reptiles
penguin         no          yes       no       sometimes      yes        birds
porcupine       yes         no        no       no             yes        mammals
eel             no          yes       no       yes            no         fishes
salamander      no          yes       no       sometimes      yes        amphibians
gila monster    no          yes       no       no             yes        reptiles
platypus        no          yes       no       no             yes        mammals
owl             no          yes       yes      no             yes        birds
dolphin         yes         no        no       yes            no         mammals
eagle           no          yes       yes      no             yes        birds

                             Prof. Pier Luca Lanzi
C4.5 versus C4.5rules versus RIPPER                                               59



C4.5 decision tree:
  Give Birth?
    yes -> Mammals
    no  -> Live in Water?
             yes       -> Fishes
             sometimes -> Amphibians
             no        -> Can Fly?
                            yes -> Birds
                            no  -> Reptiles

C4.5rules:
  (Give Birth=No, Can Fly=Yes) → Birds
  (Give Birth=No, Live in Water=Yes) → Fishes
  (Give Birth=Yes) → Mammals
  (Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
  ( ) → Amphibians

RIPPER:
  (Live in Water=Yes) → Fishes
  (Have Legs=No) → Reptiles
  (Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
  (Can Fly=Yes, Give Birth=No) → Birds
  ( ) → Mammals

                                      Prof. Pier Luca Lanzi
C4.5 versus C4.5rules versus RIPPER                60




C4.5 and C4.5rules:
                            PREDICTED CLASS
                   Amphibians  Fishes  Reptiles  Birds  Mammals
ACTUAL  Amphibians          2       0         0      0        0
CLASS   Fishes              0       2         0      0        1
        Reptiles            1       0         3      0        0
        Birds               1       0         0      3        0
        Mammals             0       0         1      0        6

RIPPER:
                            PREDICTED CLASS
                   Amphibians  Fishes  Reptiles  Birds  Mammals
ACTUAL  Amphibians          0       0         0      0        2
CLASS   Fishes              0       3         0      0        0
        Reptiles            0       0         3      0        1
        Birds               0       0         1      2        1
        Mammals             0       2         1      0        4



                      Prof. Pier Luca Lanzi
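From the confusion matrices above (rows are actual classes, columns predicted, both in the order Amphibians, Fishes, Reptiles, Birds, Mammals), overall accuracy is simply the diagonal sum over the total:

```python
# Accuracy from the two confusion matrices on the slide.
c45 = [[2, 0, 0, 0, 0],
       [0, 2, 0, 0, 1],
       [1, 0, 3, 0, 0],
       [1, 0, 0, 3, 0],
       [0, 0, 1, 0, 6]]
ripper = [[0, 0, 0, 0, 2],
          [0, 3, 0, 0, 0],
          [0, 0, 3, 0, 1],
          [0, 0, 1, 2, 1],
          [0, 2, 1, 0, 4]]

def acc(m):
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total

print(f"C4.5/C4.5rules: {acc(c45):.2f}")    # 16/20 = 0.80
print(f"RIPPER:         {acc(ripper):.2f}")  # 12/20 = 0.60
```

On this tiny dataset C4.5 and C4.5rules classify 16 of 20 animals correctly versus RIPPER's 12, though 20 examples are far too few to draw general conclusions.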
Summary                                                  61



 Advantages of Rule-Based Classifiers
    As highly expressive as decision trees
    Easy to interpret
    Easy to generate
    Can classify new instances rapidly
    Performance comparable to decision trees

 Two approaches: direct and indirect methods

 Direct methods typically apply a sequential covering approach
     Grow a single rule
     Remove the instances covered by the rule
     Prune the rule (if necessary)
     Add the rule to the current rule set
     Repeat

 Other approaches exist
    Specific to general exploration (RISE)
    Post processing of neural networks,
    association rules, decision trees, etc.



                      Prof. Pier Luca Lanzi

My key hands-on projects in Quantum, and QAIVijayananda Mohire
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 

Dernier (20)

Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its application
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kit
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applications
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)
 
EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? Webinar
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First Frame
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAI
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 

Machine Learning and Data Mining: 12 Classification Rules

  • 1. Classification Rules Machine Learning and Data Mining (Unit 12) Prof. Pier Luca Lanzi
  • 2. References 2 Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann Series in Data Management Systems (Second Edition). Tom M. Mitchell, "Machine Learning", McGraw Hill, 1997. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, "Introduction to Data Mining", Addison Wesley. Ian H. Witten, Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations", 2nd Edition. Prof. Pier Luca Lanzi
  • 3. Lecture outline 3 Why rules? What are classification rules? Which type of rules? What methods? Direct Methods The OneRule algorithm Sequential Covering Algorithms The RISE algorithm Indirect Methods Prof. Pier Luca Lanzi
  • 4. Example: to play or not to play? 4
     Outlook   Temp  Humidity  Windy  Play
     Sunny     Hot   High      False  No
     Sunny     Hot   High      True   No
     Overcast  Hot   High      False  Yes
     Rainy     Mild  High      False  Yes
     Rainy     Cool  Normal    False  Yes
     Rainy     Cool  Normal    True   No
     Overcast  Cool  Normal    True   Yes
     Sunny     Mild  High      False  No
     Sunny     Cool  Normal    False  Yes
     Rainy     Mild  Normal    False  Yes
     Sunny     Mild  Normal    True   Yes
     Overcast  Mild  High      True   Yes
     Overcast  Hot   Normal    False  Yes
     Rainy     Mild  High      True   No
     Prof. Pier Luca Lanzi
  • 5. A Rule Set to Classify the Data 5 IF (humidity = high) and (outlook = sunny) THEN play=no (3.0/0.0) IF (outlook = rainy) and (windy = TRUE) THEN play=no (2.0/0.0) OTHERWISE play=yes (9.0/0.0) Confusion Matrix yes no <-- classified as 7 2 | yes 3 2 | no Prof. Pier Luca Lanzi
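The decision list above can be read as a small ordered program: try the first rule, then the second, then fall through to the default. A minimal Python sketch (the dataset literal and function names are my own) applies it to the weather data from slide 4:

```python
# Weather dataset from slide 4: (outlook, temp, humidity, windy, play)
data = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]

def classify(outlook, temp, humidity, windy):
    # Rules are tried in order; the last line is the default (OTHERWISE).
    if humidity == "High" and outlook == "Sunny":
        return "No"
    if outlook == "Rainy" and windy:
        return "No"
    return "Yes"

correct = sum(classify(o, t, h, w) == play for o, t, h, w, play in data)
print(correct)  # 14: the rule set fits the training data perfectly
```

Note that the confusion matrix on the slide still shows errors even though the rules fit the training data perfectly: it is presumably estimated by cross-validation rather than on the training set itself.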
  • 6. Let’s check the rules… 6 (the weather table from slide 4, repeated) IF (humidity = high) and (outlook = sunny) THEN play=no (3.0/0.0) Prof. Pier Luca Lanzi
  • 7. Let’s check the rules… 7 (the weather table from slide 4, repeated) IF (outlook = rainy) and (windy = TRUE) THEN play=no (2.0/0.0) Prof. Pier Luca Lanzi
  • 8. Let’s check the rules… 8 (the weather table from slide 4, repeated) OTHERWISE play=yes (9.0/0.0) Prof. Pier Luca Lanzi
  • 9. A Simpler Solution 9 IF (outlook = sunny) THEN play IS no ELSE IF (outlook = overcast) THEN play IS yes ELSE IF (outlook = rainy) THEN play IS yes (10/14 instances correct) Confusion Matrix yes no <-- classified as 4 5 | yes 3 2 | no Prof. Pier Luca Lanzi
  • 10. What are classification rules? 10 They are IF-THEN rules The IF part states a condition over the data The THEN part includes a class label Which type of conditions? Propositional, with attribute-value comparisons First order Horn clauses, with variables Why rules? One of the most expressive and most human readable representations for hypotheses is sets of IF-THEN rules Prof. Pier Luca Lanzi
  • 11. IF-THEN Rules for Classification 11 Represent the knowledge in the form of IF-THEN rules R: IF (humidity = high) and (outlook = sunny) THEN play=no (3.0/0.0) Rule antecedent/precondition vs. rule consequent Assessment of a rule: coverage and accuracy n_covers = # of tuples covered by R, n_correct = # of tuples correctly classified by R, coverage(R) = n_covers / |training data set|, accuracy(R) = n_correct / n_covers Prof. Pier Luca Lanzi
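Coverage and accuracy can be computed directly from these definitions. A small sketch (dataset literal reduced to the attributes the rule needs; names are illustrative) evaluates rule R on the weather data:

```python
# Weather data as (outlook, humidity, play) triples, from slide 4.
weather = [
    ("Sunny", "High", "No"), ("Sunny", "High", "No"),
    ("Overcast", "High", "Yes"), ("Rainy", "High", "Yes"),
    ("Rainy", "Normal", "Yes"), ("Rainy", "Normal", "No"),
    ("Overcast", "Normal", "Yes"), ("Sunny", "High", "No"),
    ("Sunny", "Normal", "Yes"), ("Rainy", "Normal", "Yes"),
    ("Sunny", "Normal", "Yes"), ("Overcast", "High", "Yes"),
    ("Overcast", "Normal", "Yes"), ("Rainy", "High", "No"),
]

def rule_stats(dataset):
    # R: IF humidity = high AND outlook = sunny THEN play = no
    covered = [row for row in dataset if row[1] == "High" and row[0] == "Sunny"]
    correct = [row for row in covered if row[2] == "No"]
    coverage = len(covered) / len(dataset)   # n_covers / |training set|
    accuracy = len(correct) / len(covered)   # n_correct / n_covers
    return coverage, accuracy

coverage, accuracy = rule_stats(weather)
print(round(coverage, 3), accuracy)  # R covers 3 of 14 rows, all correctly
```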
  • 12. IF-THEN Rules for Classification 12 If more than one rule is triggered, we need conflict resolution Size ordering: assign the highest priority to the triggering rule with the “toughest” requirement (i.e., the most attribute tests) Class-based ordering: decreasing order of prevalence or misclassification cost per class Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality or by experts Prof. Pier Luca Lanzi
  • 13. Characteristics of Rule-Based Classifier 13 Mutually exclusive rules Classifier contains mutually exclusive rules if the rules are independent of each other Every record is covered by at most one rule Exhaustive rules Classifier has exhaustive coverage if it accounts for every possible combination of attribute values Each record is covered by at least one rule Prof. Pier Luca Lanzi
  • 14. What are the approaches? 14 Direct Methods Directly learn the rules from the training data Indirect Methods Learn decision tree, then convert to rules Learn neural networks, then extract rules Prof. Pier Luca Lanzi
  • 15. Inferring rudimentary rules 15 1R: learns a 1-level decision tree I.e., rules that all test one particular attribute Assumes nominal attributes Basic version One branch for each value Each branch assigns most frequent class Error rate: proportion of instances that don’t belong to the majority class of their corresponding branch Choose attribute with lowest error rate Prof. Pier Luca Lanzi
  • 16. Pseudo-code for 1R 16
     For each attribute,
       For each value of the attribute, make a rule as follows:
         count how often each class appears
         find the most frequent class
         make the rule assign that class to this attribute-value
       Calculate the error rate of the rules
     Choose the rules with the smallest error rate
     Note: “missing” is treated as a separate attribute value
     Prof. Pier Luca Lanzi
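The pseudo-code above translates almost line by line into Python. A compact sketch for nominal attributes (ties broken by first-seen order; names are my own), run on the weather data:

```python
from collections import Counter, defaultdict

def one_rule(rows, attributes, target):
    # 1R: for each attribute, build one rule per value (predict the most
    # frequent class), then keep the attribute with the fewest errors.
    best = None
    for attr in attributes:
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - max(c.values())
                     for c in by_value.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

names = ("outlook", "temp", "humidity", "windy", "play")
data = [dict(zip(names, r)) for r in [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]]
attr, rules, errors = one_rule(data, names[:4], "play")
print(attr, errors)  # outlook 4, matching the table on slide 17
```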
  • 17. Evaluating the weather attributes 17
     Attribute  Rules           Errors  Total errors
     Outlook    Sunny → No      2/5     4/14
                Overcast → Yes  0/4
                Rainy → Yes     2/5
     Temp       Hot → No*       2/4     5/14
                Mild → Yes      2/6
                Cool → Yes      1/4
     Humidity   High → No       3/7     4/14
                Normal → Yes    1/7
     Windy      False → Yes     2/8     5/14
                True → No*      3/6
     (* indicates a tie)
     Prof. Pier Luca Lanzi
  • 18. Dealing with numeric attributes 18 Discretize numeric attributes Divide each attribute’s range into intervals Sort instances according to attribute’s values Place breakpoints where class changes (majority class) This minimizes the total error Example: temperature from weather data 64 65 68 69 70 71 72 72 75 75 80 81 83 85 Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No Outlook Temperature Humidity Windy Play Sunny 85 85 False No Sunny 80 90 True No Overcast 83 86 False Yes Rainy 75 80 False Yes … … … … … Prof. Pier Luca Lanzi
  • 19. The problem of overfitting 19 This procedure is very sensitive to noise One instance with an incorrect class label will probably produce a separate interval Also: time stamp attribute will have zero errors Simple solution: enforce minimum number of instances in majority class per interval. As an example, with min = 3, 64 65 68 69 70 71 72 72 75 75 80 81 83 85 Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No 64 65 68 69 70 71 72 72 75 75 80 81 83 85 Yes No Yes Yes Yes | No No Yes Yes Yes | No Yes Yes No Prof. Pier Luca Lanzi
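One way to realize the "minimum number of instances in the majority class per interval" idea is to keep extending the current interval until its majority class appears at least min times and the next example has a different class. This is a simplified sketch of that merging step (not the book's exact procedure; adjacent intervals with the same majority class would still be merged afterwards):

```python
from collections import Counter

def discretize(pairs, min_majority=3):
    # pairs: (value, class) tuples sorted by value
    intervals, current = [], []
    for value, cls in pairs:
        if current:
            majority, count = Counter(c for _, c in current).most_common(1)[0]
            # start a new interval only once the majority class occurs at
            # least min_majority times and the incoming class differs
            if count >= min_majority and cls != majority:
                intervals.append(current)
                current = []
        current.append((value, cls))
    if current:
        intervals.append(current)
    return intervals

# Temperature values and classes from slide 18
temps = [(64, "Yes"), (65, "No"), (68, "Yes"), (69, "Yes"), (70, "Yes"),
         (71, "No"), (72, "No"), (72, "Yes"), (75, "Yes"), (75, "Yes"),
         (80, "No"), (81, "Yes"), (83, "Yes"), (85, "No")]
parts = discretize(temps)
print([[v for v, _ in p] for p in parts])
# three intervals, split after 70 and after 75, as on the slide
```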
  • 20. With overfitting avoidance 20
     Attribute    Rules                   Errors  Total errors
     Outlook      Sunny → No              2/5     4/14
                  Overcast → Yes          0/4
                  Rainy → Yes             2/5
     Temperature  ≤ 77.5 → Yes            3/10    5/14
                  > 77.5 → No*            2/4
     Humidity     ≤ 82.5 → Yes            1/7     3/14
                  > 82.5 and ≤ 95.5 → No  2/6
                  > 95.5 → Yes            0/1
     Windy        False → Yes             2/8     5/14
                  True → No*              3/6
     Prof. Pier Luca Lanzi
  • 21. Sequential covering algorithms 21 Consider the set E of positive and negative examples Repeat Learn one rule with high accuracy, any coverage Remove positive examples covered by this rule Until all the examples are covered Prof. Pier Luca Lanzi
  • 22. Sequential covering algorithms 22
     1. LearnedRules := {}
     2. Rule := LearnOneRule(class, Attributes, Examples)
     3. while Performance(Rule, Examples) > Threshold
     4.   LearnedRules := LearnedRules + Rule
     5.   Examples := Examples – Covered(Examples, Rule)
     6.   Rule := LearnOneRule(class, Attributes, Examples)
     7. Sort(LearnedRules, Performance)
     8. Return LearnedRules
     The core of the algorithm is LearnOneRule
     Prof. Pier Luca Lanzi
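The outer covering loop is independent of how a single rule is learned. This sketch separates the two: a generic `sequential_covering` loop plus a deliberately simple single-test learner, run on a toy dataset (both the learner and the data are invented here for illustration):

```python
def best_single_test(examples, attributes, target, cls):
    # Stub learner: the single attribute-value test with the highest
    # accuracy p/t for the target class (ties broken by larger p,
    # then by attribute order).
    best = None
    for attr in attributes:
        for value in {r[attr] for r in examples}:
            match = [r for r in examples if r[attr] == value]
            p = sum(r[target] == cls for r in match)
            key = (p / len(match), p)
            if best is None or key > best[0]:
                best = (key, attr, value)
    return best[1], best[2]

def sequential_covering(examples, attributes, target, cls, learner):
    learned = []
    while any(r[target] == cls for r in examples):
        attr, value = learner(examples, attributes, target, cls)
        covered = [r for r in examples if r[attr] == value]
        if not any(r[target] == cls for r in covered):
            break  # no progress: avoid looping forever
        learned.append((attr, value))
        # remove the examples covered by the new rule
        examples = [r for r in examples if r[attr] != value]
    return learned

toy = [
    {"color": "red", "size": "big", "cls": "pos"},
    {"color": "red", "size": "small", "cls": "pos"},
    {"color": "blue", "size": "big", "cls": "pos"},
    {"color": "blue", "size": "small", "cls": "neg"},
    {"color": "green", "size": "small", "cls": "neg"},
]
learned = sequential_covering(toy, ["color", "size"], "cls", "pos",
                              best_single_test)
print(learned)  # two single-test rules cover all positive examples
```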
  • 23. How to learn one rule? 23 General to Specific Start with the most general hypothesis and then go on through specialization steps Specific to General Start with the set of the most specific hypothesis and then go on through generalization steps Prof. Pier Luca Lanzi
  • 24. Learning One Rule, General to Specific 24 (figure: the search starts from the most general rule, IF <true> THEN Play=yes, and specializes it one test at a time, e.g. Wind=yes, Humidity=Normal, Humidity=High, Wind=No; the candidates with the best accuracy P, here P=6/8=0.75, P=6/7=0.86, P=3/7=0.43, are kept and specialized further, e.g. Humidity=Normal AND Wind=No) Prof. Pier Luca Lanzi
  • 25. Learning One Rule, another viewpoint 25 (figure: a two-class scatter plot in x and y; the rule grows from “If true then class = a” to “If x > 1.2 then class = a” to “If x > 1.2 and y > 2.6 then class = a”) Prof. Pier Luca Lanzi
  • 26. Another Example of Sequential Covering 26 (ii) Step 1 Prof. Pier Luca Lanzi
  • 27. Example of Sequential Covering… 27 R1 R1 R2 (iii) Step 2 (iv) Step 3 Prof. Pier Luca Lanzi
  • 28. Learn-one-rule Algorithm 28
     Learn-one-rule(Target, Attributes, Examples, k)
       Init BH to the most general hypothesis
       Init CH to {BH}
       While CH not empty Do
         Generate the next more specific candidate hypotheses NCH
         // check all the NCH for a hypothesis that
         // improves the performance of BH
         Update BH
         Update CH with the k best NCH
       Return a rule “IF BH THEN prediction”
     Prof. Pier Luca Lanzi
  • 29. Exploring the hypothesis space 29 The algorithm used to explore the hypothesis space is greedy and might converge to local optima To improve the exploration of the hypothesis space, we can use beam search: at each step, the k best candidate hypotheses are kept and considered for specialization Prof. Pier Luca Lanzi
  • 30. Example: contact lens data 30
     Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
     Young           Myope                   No           Reduced               None
     Young           Myope                   No           Normal                Soft
     Young           Myope                   Yes          Reduced               None
     Young           Myope                   Yes          Normal                Hard
     Young           Hypermetrope            No           Reduced               None
     Young           Hypermetrope            No           Normal                Soft
     Young           Hypermetrope            Yes          Reduced               None
     Young           Hypermetrope            Yes          Normal                Hard
     Pre-presbyopic  Myope                   No           Reduced               None
     Pre-presbyopic  Myope                   No           Normal                Soft
     Pre-presbyopic  Myope                   Yes          Reduced               None
     Pre-presbyopic  Myope                   Yes          Normal                Hard
     Pre-presbyopic  Hypermetrope            No           Reduced               None
     Pre-presbyopic  Hypermetrope            No           Normal                Soft
     Pre-presbyopic  Hypermetrope            Yes          Reduced               None
     Pre-presbyopic  Hypermetrope            Yes          Normal                None
     Presbyopic      Myope                   No           Reduced               None
     Presbyopic      Myope                   No           Normal                None
     Presbyopic      Myope                   Yes          Reduced               None
     Presbyopic      Myope                   Yes          Normal                Hard
     Presbyopic      Hypermetrope            No           Reduced               None
     Presbyopic      Hypermetrope            No           Normal                Soft
     Presbyopic      Hypermetrope            Yes          Reduced               None
     Presbyopic      Hypermetrope            Yes          Normal                None
     Prof. Pier Luca Lanzi
  • 31. Example: contact lens data 31 Rule we seek: If ? then recommendation = hard Possible tests: Age = Young 2/8 Age = Pre-presbyopic 1/8 Age = Presbyopic 1/8 Spectacle prescription = Myope 3/12 Spectacle prescription = Hypermetrope 1/12 Astigmatism = no 0/12 Astigmatism = yes 4/12 Tear production rate = Reduced 0/12 Tear production rate = Normal 4/12 Prof. Pier Luca Lanzi
  • 32. Modified rule and resulting data 32 Rule with best test added: If astigmatism = yes then recommendation = hard Instances covered by the modified rule:
     Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
     Young           Myope                   Yes          Reduced               None
     Young           Myope                   Yes          Normal                Hard
     Young           Hypermetrope            Yes          Reduced               None
     Young           Hypermetrope            Yes          Normal                Hard
     Pre-presbyopic  Myope                   Yes          Reduced               None
     Pre-presbyopic  Myope                   Yes          Normal                Hard
     Pre-presbyopic  Hypermetrope            Yes          Reduced               None
     Pre-presbyopic  Hypermetrope            Yes          Normal                None
     Presbyopic      Myope                   Yes          Reduced               None
     Presbyopic      Myope                   Yes          Normal                Hard
     Presbyopic      Hypermetrope            Yes          Reduced               None
     Presbyopic      Hypermetrope            Yes          Normal                None
     Prof. Pier Luca Lanzi
  • 33. Further refinement 33 Current state, If astigmatism = yes and ? then recommendation = hard Possible tests, Age = Young 2/4 Age = Pre-presbyopic 1/4 Age = Presbyopic 1/4 Spectacle prescription = Myope 3/6 Spectacle prescription = Hypermetrope 1/6 Tear production rate = Reduced 0/6 Tear production rate = Normal 4/6 Prof. Pier Luca Lanzi
  • 34. Modified rule and resulting data 34 Rule with best test added: If astigmatism = yes and tear production rate = normal then recommendation = hard Instances covered by the modified rule:
     Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
     Young           Myope                   Yes          Normal                Hard
     Young           Hypermetrope            Yes          Normal                Hard
     Pre-presbyopic  Myope                   Yes          Normal                Hard
     Pre-presbyopic  Hypermetrope            Yes          Normal                None
     Presbyopic      Myope                   Yes          Normal                Hard
     Presbyopic      Hypermetrope            Yes          Normal                None
     Prof. Pier Luca Lanzi
  • 35. Further refinement 35 Current state: If astigmatism = yes and tear production rate = normal and ? then recommendation = hard Possible tests: Age = Young 2/2 Age = Pre-presbyopic 1/2 Age = Presbyopic 1/2 Spectacle prescription = Myope 3/3 Spectacle prescription = Hypermetrope 1/3 Tie between the first and the fourth test, we choose the one with greater coverage Prof. Pier Luca Lanzi
  • 36. The result 36 Final rule: If astigmatism = yes and tear production rate = normal and spectacle prescription = myope then recommendation = hard Second rule for recommending “hard lenses”: (built from instances not covered by first rule) If age = young and astigmatism = yes and tear production rate = normal then recommendation = hard These two rules cover all “hard lenses”: Process is repeated with other two classes Prof. Pier Luca Lanzi
  • 37. Pseudo-code for PRISM 37
     For each class C
       Initialize E to the instance set
       While E contains instances in class C
         Create a rule R with an empty left-hand side that predicts class C
         Until R is perfect (or there are no more attributes to use) do
           For each attribute A not mentioned in R, and each value v,
             Consider adding the condition A = v to the left-hand side of R
           Select A and v to maximize the accuracy p/t
           (break ties by choosing the condition with the largest p)
           Add A = v to R
         Remove the instances covered by R from E
     Prof. Pier Luca Lanzi
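The PRISM pseudo-code transcribes fairly directly into Python. A sketch for nominal attributes only (names are my own); with the stated tie-breaking, running it for class "No" on the weather data recovers the two rules of slide 5:

```python
def prism_for_class(rows, attributes, target, cls):
    # PRISM for one class: grow each rule by greedily adding the test
    # with the best accuracy p/t (ties broken by larger p), then remove
    # the covered instances and repeat until the class is fully covered.
    examples = list(rows)
    learned = []
    while any(r[target] == cls for r in examples):
        rule, covered = {}, examples
        while any(r[target] != cls for r in covered):  # until R is perfect
            best = None
            for attr in attributes:
                if attr in rule:
                    continue
                for value in {r[attr] for r in covered}:
                    match = [r for r in covered if r[attr] == value]
                    p = sum(r[target] == cls for r in match)
                    key = (p / len(match), p)
                    if best is None or key > best[0]:
                        best = (key, attr, value)
            if best is None:
                break  # no attributes left to test
            _, attr, value = best
            rule[attr] = value
            covered = [r for r in covered if r[attr] == value]
        learned.append(rule)
        examples = [r for r in examples
                    if not all(r[a] == v for a, v in rule.items())]
    return learned

names = ("outlook", "temp", "humidity", "windy", "play")
data = [dict(zip(names, r)) for r in [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]]
rules = prism_for_class(data, list(names[:4]), "play", "No")
print(rules)  # the two play=no rules from slide 5
```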
  • 38. Test selection criteria 38 Basic covering algorithm: keep adding conditions to a rule to improve its accuracy Add the condition that improves accuracy the most Measure 1: p/t, where t is the total number of instances covered by the rule and p is the number of these that are positive Produces rules that don’t cover negative instances, as quickly as possible May produce rules with very small coverage: special cases or noise? Measure 2: Information gain p (log(p/t) – log(P/T)), where P and T are the positive and total numbers before the new condition was added Information gain emphasizes positive rather than negative instances These interact with the pruning mechanism used Prof. Pier Luca Lanzi
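As a concrete comparison, consider extending the empty rule for play=no (P=5 positives among T=14 instances) with the test outlook=sunny, which covers t=5 instances of which p=3 are positive. The function names below are invented for illustration:

```python
import math

def accuracy(p, t):
    # Measure 1: fraction of covered instances that are positive
    return p / t

def info_gain(p, t, P, T):
    # Measure 2: p * (log(p/t) - log(P/T)), in bits
    return p * (math.log2(p / t) - math.log2(P / T))

# Empty rule for play=no: P=5 of T=14; after adding outlook=sunny: p=3 of t=5
print(round(accuracy(3, 5), 3))          # 0.6
print(round(info_gain(3, 5, 5, 14), 3))  # 2.245
```

The difference the slide describes shows up here: p/t would rate a rule covering a single positive instance (p=1, t=1) a perfect 1.0, while the gain term scales by p and therefore still prefers the broader outlook=sunny test over that special case.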
  • 39. Instance Elimination 39 Why do we need to eliminate instances? Otherwise, the next rule is identical to previous rule Why do we remove positive instances? Ensure that the next rule is different Why do we remove negative instances? Prevent underestimating accuracy of rule Compare rules R2 and R3 in the diagram Prof. Pier Luca Lanzi
  • 40. Missing values and numeric attributes 40 Common treatment of missing values: for any test, they fail Algorithm must either Use other tests to separate out positive instances Leave them uncovered until later in the process In some cases it is better to treat “missing” as a separate value Numeric attributes are treated just like they are in decision trees Prof. Pier Luca Lanzi
  • 41. Pruning rules 41 Two main strategies: Incremental pruning Global pruning Other difference: pruning criterion Error on hold-out set (reduced-error pruning) Statistical significance MDL principle Also: post-pruning vs. pre-pruning Prof. Pier Luca Lanzi
  • 42. Effect of Rule Simplification 42 Rules are no longer mutually exclusive A record may trigger more than one rule Solution? • Ordered rule set • Unordered rule set – use voting schemes Rules are no longer exhaustive A record may not trigger any rules Solution? • Use a default class Prof. Pier Luca Lanzi
  • 43. Rule Ordering Schemes 43 Rule-based ordering Individual rules are ranked based on their quality Class-based ordering Rules that belong to the same class appear together Prof. Pier Luca Lanzi
  • 44. The RISE algorithm 44 It works in a specific-to-general approach Initially, it creates one rule for each training example Then it goes on through elementary generalization steps until the overall accuracy does not decrease Prof. Pier Luca Lanzi
  • 45. The RISE algorithm 45
     Input: ES is the training set
     Let RS be ES
     Compute Acc(RS)
     Repeat
       For each rule R in RS,
         find the nearest example E not covered by R
         (E is of the same class as R)
         R’ = MostSpecificGeneralization(R, E)
         RS’ = RS with R replaced by R’
         if Acc(RS’) ≥ Acc(RS) then RS = RS’
           if R’ is identical to another rule in RS then delete R’ from RS
     Until no increase in Acc(RS) is obtained
     Prof. Pier Luca Lanzi
  • 46. Direct Method: Sequential Covering 46 1. Start from an empty rule 2. Grow a rule using the Learn-One-Rule function 3. Remove training records covered by the rule 4. Repeat Step (2) and (3) until stopping criterion is met Prof. Pier Luca Lanzi
  • 47. Stopping Criterion and Rule Pruning 47 Stopping criterion Compute the gain If gain is not significant, discard the new rule Rule Pruning Similar to post-pruning of decision trees Reduced Error Pruning: • Remove one of the conjuncts in the rule • Compare error rate on validation set before and after pruning • If error improves, prune the conjunct Prof. Pier Luca Lanzi
  • 48. CN2 Algorithm 48 Start from an empty conjunct: {} Add conjuncts that minimizes the entropy measure: {A}, {A,B}, … Determine the rule consequent by taking majority class of instances covered by the rule Prof. Pier Luca Lanzi
  • 49. Direct Method: RIPPER 49 For 2-class problem, choose one of the classes as positive class, and the other as negative class Learn rules for positive class Negative class will be default class For multi-class problem Order the classes according to increasing class prevalence (fraction of instances that belong to a particular class) Learn the rule set for smallest class first, treat the rest as negative class Repeat with next smallest class as positive class Prof. Pier Luca Lanzi
  • 50. Direct Method: RIPPER 50 Growing a rule: Start from empty rule Add conjuncts as long as they improve FOIL’s information gain Stop when rule no longer covers negative examples Prune the rule immediately using incremental reduced error pruning Measure for pruning: v = (p-n)/(p+n) • p: number of positive examples covered by the rule in the validation set • n: number of negative examples covered by the rule in the validation set Pruning method: delete any final sequence of conditions that maximizes v Prof. Pier Luca Lanzi
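The pruning step can be sketched as follows: evaluate v = (p - n)/(p + n) on the validation set for every prefix obtained by deleting a final sequence of conditions, and keep the best prefix. The toy validation data and names below are invented for illustration (and, unlike RIPPER proper, this sketch always keeps at least one condition):

```python
def v_metric(conds, validation, target, cls):
    # v = (p - n) / (p + n) on the validation examples covered by conds
    covered = [r for r in validation
               if all(r[a] == val for a, val in conds)]
    p = sum(r[target] == cls for r in covered)
    n = len(covered) - p
    return (p - n) / (p + n) if covered else -1.0

def prune_rule(conditions, validation, target, cls):
    # Try every prefix (i.e., delete every final sequence of conditions)
    # and keep the one with the highest v; ties go to the shorter,
    # more general prefix.
    best_len = max(range(1, len(conditions) + 1),
                   key=lambda k: v_metric(conditions[:k],
                                          validation, target, cls))
    return conditions[:best_len]

rule = [("outlook", "Sunny"), ("temp", "Hot")]  # candidate rule for play=no
validation = [
    {"outlook": "Sunny", "temp": "Hot", "play": "No"},
    {"outlook": "Sunny", "temp": "Hot", "play": "Yes"},
    {"outlook": "Sunny", "temp": "Mild", "play": "No"},
    {"outlook": "Sunny", "temp": "Cool", "play": "No"},
    {"outlook": "Rainy", "temp": "Hot", "play": "Yes"},
]
pruned = prune_rule(rule, validation, "play", "No")
print(pruned)  # the final temp=Hot test is dropped: v rises from 0.0 to 0.5
```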
  • 51. Direct Method: RIPPER 51 Building a Rule Set: Use sequential covering algorithm • Finds the best rule that covers the current set of positive examples • Eliminate both positive and negative examples covered by the rule Each time a rule is added to the rule set, compute the new description length • stop adding new rules when the new description length is d bits longer than the smallest description length obtained so far Prof. Pier Luca Lanzi
  • 52. Direct Method: RIPPER 52 Optimize the rule set: For each rule r in the rule set R • Consider 2 alternative rules: – Replacement rule (r*): grow new rule from scratch – Revised rule(r’): add conjuncts to extend the rule r • Compare the rule set for r against the rule set for r* and r’ • Choose rule set that minimizes MDL principle Repeat rule generation and rule optimization for the remaining positive examples Prof. Pier Luca Lanzi
  • 53. Rules vs. trees 53 The corresponding decision tree produces exactly the same predictions But rule sets can be clearer when decision trees suffer from replicated subtrees Also, in multi-class situations, a covering algorithm concentrates on one class at a time whereas a decision tree learner takes all classes into account Prof. Pier Luca Lanzi
  • 54. Rules vs. trees 54 Sequential covering generates a rule by adding tests that maximize the rule’s accuracy Similar to the situation in decision trees: the problem of selecting an attribute to split on But the decision tree inducer maximizes overall purity Each new test reduces the rule’s coverage (figure: the space of examples, the rule so far, and the rule after adding a new term) Prof. Pier Luca Lanzi
  • 55. Indirect Methods 55 Prof. Pier Luca Lanzi
  • 56. Indirect Method: C4.5rules 56 Extract rules from an unpruned decision tree For each rule, r: A → y, Consider an alternative rule r’: A’ → y where A’ is obtained by removing one of the conjuncts in A Compare the pessimistic error rate for r against all r’s Prune if one of the r’s has lower pessimistic error rate Repeat until we can no longer improve generalization error Prof. Pier Luca Lanzi
  • 57. Indirect Method: C4.5rules 57 Instead of ordering the rules, order subsets of rules (class ordering) Each subset is a collection of rules with the same rule consequent (class) Compute description length of each subset Description length = L(error) + g L(model) g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5) Prof. Pier Luca Lanzi
  • 58. Example 58
     Name           Give Birth  Lay Eggs  Can Fly  Live in Water  Have Legs  Class
     human          yes         no        no       no             yes        mammals
     python         no          yes       no       no             no         reptiles
     salmon         no          yes       no       yes            no         fishes
     whale          yes         no        no       yes            no         mammals
     frog           no          yes       no       sometimes      yes        amphibians
     komodo         no          yes       no       no             yes        reptiles
     bat            yes         no        yes      no             yes        mammals
     pigeon         no          yes       yes      no             yes        birds
     cat            yes         no        no       no             yes        mammals
     leopard shark  yes         no        no       yes            no         fishes
     turtle         no          yes       no       sometimes      yes        reptiles
     penguin        no          yes       no       sometimes      yes        birds
     porcupine      yes         no        no       no             yes        mammals
     eel            no          yes       no       yes            no         fishes
     salamander     no          yes       no       sometimes      yes        amphibians
     gila monster   no          yes       no       no             yes        reptiles
     platypus       no          yes       no       no             yes        mammals
     owl            no          yes       yes      no             yes        birds
     dolphin        yes         no        no       yes            no         mammals
     eagle          no          yes       yes      no             yes        birds
     Prof. Pier Luca Lanzi
  • 59. C4.5 versus C4.5rules versus RIPPER 59
     C4.5 decision tree: Give Birth? yes → Mammals; no → Live in Water? yes → Fishes, sometimes → Amphibians, no → Can Fly? yes → Birds, no → Reptiles
     C4.5rules:
       (Give Birth=No, Can Fly=Yes) → Birds
       (Give Birth=No, Live in Water=Yes) → Fishes
       (Give Birth=Yes) → Mammals
       (Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
       ( ) → Amphibians
     RIPPER:
       (Live in Water=Yes) → Fishes
       (Have Legs=No) → Reptiles
       (Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
       (Can Fly=Yes, Give Birth=No) → Birds
       () → Mammals
     Prof. Pier Luca Lanzi
  • 60. C4.5 versus C4.5rules versus RIPPER 60
     C4.5 and C4.5rules (actual class in rows, predicted class in columns):
                 Amphibians  Fishes  Reptiles  Birds  Mammals
     Amphibians  2           0       0         0      0
     Fishes      0           2       0         0      1
     Reptiles    1           0       3         0      0
     Birds       1           0       0         3      0
     Mammals     0           0       1         0      6
     RIPPER:
                 Amphibians  Fishes  Reptiles  Birds  Mammals
     Amphibians  0           0       0         0      2
     Fishes      0           3       0         0      0
     Reptiles    0           0       3         0      1
     Birds       0           0       1         2      1
     Mammals     0           2       1         0      4
     Prof. Pier Luca Lanzi
  • 61. Summary 61 Advantages of Rule-Based Classifiers As highly expressive as decision trees Easy to interpret Easy to generate Can classify new instances rapidly Performance comparable to decision trees Two approaches: direct and indirect methods Direct Methods, typically apply sequential covering approach Grow a single rule Remove Instances from rule Prune the rule (if necessary) Add rule to Current Rule Set Repeat Other approaches exist Specific to general exploration (RISE) Post processing of neural networks, association rules, decision trees, etc. Prof. Pier Luca Lanzi