Rule-Based Classification
Model: a set of IF-THEN rules
IF age = youth AND student = yes THEN buys_computer = yes
Rule antecedent/precondition (the IF part) vs. rule consequent (the THEN part)
Assessment of a rule: coverage and accuracy
ncovers = # of tuples covered by R
ncorrect = # of tuples correctly classified by R
coverage(R) = ncovers / |D| /* D: training data set */
accuracy(R) = ncorrect / ncovers
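Both measures are easy to compute directly. Below is a minimal Python sketch; the dict-based encoding (a rule antecedent as a dict of attribute tests, a tuple as a dict of attribute values) and the function names are illustrative assumptions, not from the slides.

```python
def covers(antecedent, t):
    """True if tuple t satisfies every condition in the rule antecedent."""
    return all(t.get(attr) == val for attr, val in antecedent.items())

def rule_coverage(antecedent, data):
    """coverage(R) = ncovers / |D|"""
    ncovers = sum(1 for t in data if covers(antecedent, t))
    return ncovers / len(data)

def rule_accuracy(antecedent, predicted_class, data, class_attr):
    """accuracy(R) = ncorrect / ncovers"""
    covered = [t for t in data if covers(antecedent, t)]
    if not covered:
        return 0.0
    ncorrect = sum(1 for t in covered if t[class_attr] == predicted_class)
    return ncorrect / len(covered)
```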
Rule Accuracy and Coverage
age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31..40  high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31..40  low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31..40  medium   no        excellent       yes
31..40  high     yes       fair            yes
>40     medium   no        excellent       no
IF age = youth AND student = yes THEN buys_computer = yes
Coverage = 2/14 = 14.28%
Accuracy = 2/2 = 100%
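As a quick check, this self-contained snippet (the encoding is my own) reproduces the numbers above:

```python
# The 14 training tuples from the table above.
attrs = ["age", "income", "student", "credit_rating", "buys_computer"]
rows = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
data = [dict(zip(attrs, r)) for r in rows]

# IF age = youth (<=30) AND student = yes THEN buys_computer = yes
covered = [t for t in data if t["age"] == "<=30" and t["student"] == "yes"]
correct = [t for t in covered if t["buys_computer"] == "yes"]
print(len(covered), "/", len(data))     # 2 / 14 -> coverage ~ 14.28%
print(len(correct), "/", len(covered))  # 2 / 2  -> accuracy = 100%
```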
IF-THEN Rules
Rule triggering: an input X triggers a rule when it satisfies the rule's antecedent
If several rules are triggered at once, conflict resolution is needed (see the sketch after this list):
Size ordering
Highest priority to the "toughest" rule, i.e., the one with the largest antecedent size (most attribute tests)
Rule ordering
Rules are prioritized beforehand
Class-based ordering: rules for the most prevalent class come first, or classes are ordered by misclassification cost per class
Rule-based ordering: rules are ordered by rule-quality measures
An ordered rule list is a decision list and must be processed strictly in order
If no rule is triggered, a default rule assigns the class
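A minimal sketch of size ordering with a default rule, assuming the dict-based encoding used earlier; all names are illustrative:

```python
# A rule is a pair (antecedent dict, predicted class).
def classify_size_ordering(rules, x, default_class):
    """Fire the triggered rule with the largest antecedent (size ordering);
    fall back to the default rule when nothing triggers."""
    triggered = [(ante, cls) for ante, cls in rules
                 if all(x.get(a) == v for a, v in ante.items())]
    if not triggered:
        return default_class                     # default rule
    ante, cls = max(triggered, key=lambda r: len(r[0]))
    return cls

rules = [({"age": "<=30", "student": "yes"}, "yes"),
         ({"age": "<=30"}, "no")]
print(classify_size_ordering(rules, {"age": "<=30", "student": "yes"}, "yes"))
# -> "yes": both rules trigger; the two-condition rule wins
```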
Rule Extraction from a Decision Tree
Rules are easier to understand than large trees
One rule is created for each path from the root to a leaf
Each attribute-value pair along a path forms a conjunction; the leaf holds the class prediction
Rules are mutually exclusive and exhaustive (so unordered)
[Decision tree for buys_computer: root tests age?; <=30 leads to student? (no -> no, yes -> yes); 31..40 predicts yes; >40 leads to credit rating? (excellent -> no, fair -> yes)]
Example: rule extraction from the buys_computer decision tree (see the sketch below)
IF age = young AND student = no THEN buys_computer = no
IF age = young AND student = yes THEN buys_computer = yes
IF age = mid-age THEN buys_computer = yes
IF age = old AND credit_rating = excellent THEN buys_computer = no
IF age = old AND credit_rating = fair THEN buys_computer = yes
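The path-to-rule conversion is mechanical. Below is a minimal sketch over a nested-tuple tree encoding of my own; it illustrates the idea, not any particular library:

```python
# Tree encoding (illustrative): internal node = (attribute, {value: subtree}),
# leaf = class label string.
tree = ("age", {
    "<=30":   ("student", {"no": "no", "yes": "yes"}),
    "31..40": "yes",
    ">40":    ("credit_rating", {"excellent": "no", "fair": "yes"}),
})

def extract_rules(node, conditions=()):
    """One rule per root-to-leaf path; the tests along the path form the antecedent."""
    if isinstance(node, str):                    # leaf: holds the class prediction
        yield list(conditions), node
        return
    attr, branches = node
    for value, subtree in branches.items():
        yield from extract_rules(subtree, conditions + ((attr, value),))

for antecedent, cls in extract_rules(tree):
    tests = " AND ".join(f"{a} = {v}" for a, v in antecedent)
    print(f"IF {tests} THEN buys_computer = {cls}")
```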
Rule Extraction from a Decision Tree
The set of extracted rules can be very large, so pruning may be required
Rule generalization: for a given rule antecedent, any condition that does not improve the estimated accuracy of the rule can be dropped (a sketch follows this list)
Side effect of pruning: the rules are no longer mutually exclusive and exhaustive
C4.5 uses class-based ordering for conflict resolution:
All rules for a single class are grouped together
Class rule sets are ranked so as to minimize false-positive errors
Default class: the one that contains the most training tuples not covered by any rule
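A minimal sketch of the condition-dropping idea; the greedy loop and the use of a plain accuracy estimate are my own simplification:

```python
def estimated_accuracy(antecedent, cls, data, class_attr="buys_computer"):
    """Accuracy of the rule on data; an empty antecedent covers everything."""
    covered = [t for t in data
               if all(t.get(a) == v for a, v in antecedent.items())]
    if not covered:
        return 0.0
    return sum(t[class_attr] == cls for t in covered) / len(covered)

def generalize_rule(antecedent, cls, data):
    """Greedily drop any condition whose removal does not lower estimated accuracy."""
    antecedent = dict(antecedent)
    improved = True
    while improved:
        improved = False
        base = estimated_accuracy(antecedent, cls, data)
        for attr in list(antecedent):
            trial = {a: v for a, v in antecedent.items() if a != attr}
            if trial and estimated_accuracy(trial, cls, data) >= base:
                antecedent = trial               # condition did not help: drop it
                improved = True
                break
    return antecedent
```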
Rule Extraction from the Training Data
Sequential covering algorithm: extracts rules directly from the training data
Associative classification algorithms may also be used
Typical sequential covering algorithms: FOIL (First Order Inductive Learner), AQ, CN2, RIPPER
Rules are learned sequentially; each rule for a given class Ci will cover many tuples of Ci but none (or few) of the tuples of other classes
Steps:
Rules are learned one at a time
Each time a rule is learned, the tuples covered by the rule are removed
The process repeats on the remaining tuples until a terminating condition holds, e.g., there are no more training examples or the quality of a returned rule is below a user-specified threshold
Rule Extraction from the Training Data
Algorithm: Sequential Covering
Input: D, Att_vals
Output: IF-THEN rules
Method:
Rule_set = {}
For each class c do
Repeat
Rule = Learn_One_Rule(D, Att_vals, c) // Find the best rule for class c
Remove tuples covered by Rule from D
Rule_set = Rule_set + Rule // Add the new rule to the rule set
Until terminating condition
End for
Return Rule_set
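A runnable Python rendering of this loop, assuming the dict-encoded tuples used earlier; Learn_One_Rule is passed in as a callable (a greedy version is sketched on the next slide), and a minimum-accuracy threshold stands in for the terminating condition:

```python
def sequential_covering(data, classes, learn_one_rule, min_accuracy=0.9,
                        class_attr="buys_computer"):
    """Learn rules one class at a time, removing covered tuples after each rule."""
    rule_set = []
    for c in classes:
        remaining = list(data)
        while remaining:
            antecedent = learn_one_rule(remaining, c, class_attr)
            covered = [t for t in remaining
                       if all(t.get(a) == v for a, v in antecedent.items())]
            if not covered:
                break
            acc = sum(t[class_attr] == c for t in covered) / len(covered)
            if acc < min_accuracy:               # terminating condition: rule quality
                break                            # below the user-specified threshold
            rule_set.append((antecedent, c))
            remaining = [t for t in remaining if t not in covered]
    return rule_set
```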
Rule Extraction from the Training Data
Learn_One_Rule: start with the most general rule possible (condition = empty)
Add new attribute tests by adopting a greedy depth-first strategy
At each step, pick the test that most improves the rule quality (a sketch follows below)
Example:
Start with IF _ THEN loan_decision = accept
Consider IF loan_term = short THEN .. / IF loan_term = long THEN .. / IF income = high THEN .. / IF income = medium THEN .. / …
If the best one is IF income = high THEN loan_decision = accept, expand it further
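A minimal greedy Learn_One_Rule in the same setting; plain accuracy stands in for the rule-quality measure here, where FOIL_Gain (next slide) would normally be used:

```python
def learn_one_rule(data, c, class_attr):
    """General-to-specific greedy search: start from an empty antecedent and
    repeatedly add the attribute test that most improves rule accuracy."""
    def accuracy(ante):
        covered = [t for t in data
                   if all(t.get(a) == v for a, v in ante.items())]
        return (sum(t[class_attr] == c for t in covered) / len(covered)
                if covered else 0.0)

    antecedent = {}
    while True:
        # Candidate extensions: one extra attribute = value test per choice.
        candidates = [dict(antecedent, **{a: t[a]})
                      for t in data for a in t
                      if a != class_attr and a not in antecedent]
        best = max(candidates, key=accuracy, default=None)
        if best is None or accuracy(best) <= accuracy(antecedent):
            return antecedent                    # no extension improves rule quality
        antecedent = best
```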
Rule Extraction from the Training Data
Rule quality measures
Coverage or accuracy alone is not sufficient; rule-quality measures consider both coverage and accuracy
FOIL_Gain (in FOIL and RIPPER) assesses the information gained by extending the rule's condition
R: existing rule; R': extended rule
pos/neg (pos'/neg'): # of positive/negative tuples covered by R (R')
FOIL_Gain = pos' × (log2(pos'/(pos' + neg')) − log2(pos/(pos + neg)))
It favors rules that have high accuracy and cover many positive tuples
Likelihood ratio statistic:
Likelihood_Ratio = 2 Σi=1..m fi log(fi/ei)
where fi is the observed frequency of class i among the tuples covered by the rule and ei is the expected frequency
The greater this value, the higher the significance of the rule
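FOIL_Gain in code, with a made-up worked check:

```python
from math import log2

def foil_gain(pos, neg, pos2, neg2):
    """FOIL_Gain for extending rule R (covering pos/neg tuples)
    to R' (covering pos2/neg2 tuples)."""
    if pos == 0 or pos2 == 0:
        return float("-inf")    # undefined when a rule covers no positives
    return pos2 * (log2(pos2 / (pos2 + neg2)) - log2(pos / (pos + neg)))

# R covers 5 positives and 3 negatives; the extended R' covers 4 and 1:
print(round(foil_gain(5, 3, 4, 1), 3))   # ~1.425 -> the extension pays off
```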
Rule Extraction from the Training Data
Rule pruning is based on an independent set of test tuples
pos/neg: # of positive/negative tuples covered by R
FOIL_Prune(R) = (pos − neg) / (pos + neg)
If FOIL_Prune is higher for the pruned version of R, prune R
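And the pruning measure, again with made-up counts:

```python
def foil_prune(pos, neg):
    """FOIL_Prune(R) = (pos - neg) / (pos + neg), computed on the pruning set."""
    return (pos - neg) / (pos + neg)

# R covers 8 positives, 4 negatives; after dropping a condition it covers 9 and 3:
print(foil_prune(8, 4))   # 0.333...
print(foil_prune(9, 3))   # 0.5 -> higher, so keep the pruned (more general) rule
```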