8. Will John play golf?
Date     Weather   Temperature   Sally going?   Did John golf?
Sept 1   Sunny     92°F          Yes            Yes
Sept 2   Cloudy    84°F          No             No
Sept 3   Raining   84°F          No             Yes
Sept 4   Sunny     95°F          Yes            Yes

Date     Weather   Temperature   Sally going?   Will John golf?
Sept 5   Cloudy    87°F          No             ?
We want a model based on John’s past behavior to predict
what he will do in the future. Can we use ML?
10. ZeroR – establishes a baseline
Naïve Bayes – probabilistic model
OneR – single rule
J48 / C4.5 – decision tree
11. Upgrade our example
Data Set
• 319 instances (people)
• 25 attributes (variables): age, blood pressure, specific gravity, albumin, sugar, red blood cells, pus cell, pus cell clumps, potassium, blood glucose, blood urea, serum creatinine, sodium, hemoglobin, packed cell volume, white blood cell count, red blood cell count, hypertension, diabetes mellitus, coronary artery disease, appetite, pedal edema, anemia, stage
Machine Learning
• ZeroR
• OneR
• Naïve Bayes
• J48 / C4.5
Model
• Input: blood test data for new individuals with unknown disease status
• Output: predict whether the individual has CKD and, if so, the stage of their disease
12. ZeroR
• Build a frequency table of the classes in the past data (known outcome)
• Choose the 'most popular', i.e. most frequent, class
• Classify every new instance as that most popular class
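The ZeroR rule above fits in a few lines. A minimal sketch, using made-up class labels for illustration:

```python
from collections import Counter

def zero_r(labels):
    """ZeroR: build a frequency table of the classes and always
    predict the most frequent ('most popular') one."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical training labels for a CKD-style data set:
train_labels = ["stage 3", "stage 3", "stage 2", "healthy", "stage 3"]
rule = zero_r(train_labels)
# Every new instance, whatever its attributes, is classified as `rule`.
```

Note the attributes of a new instance are never consulted, which is exactly why ZeroR is only useful as a baseline.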
13. How did ZeroR do?
• Correctly classified 28.2% of the time
• Rule: always guess that a new instance (person) has stage 3 kidney disease
• The 28.2% correct classification rate is our baseline
• Correct classification rates above 28.2% are better than guessing
14. OneR
• Build a frequency table for each attribute; this generates a rule for each value of each attribute
• Choose the attribute whose rule has the highest correct classification rate
• Classify a new instance by applying that single rule
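The steps above can be sketched in a few lines, assuming categorical attributes; the data is the golf table from slide 8 (attribute names `weather` and `sally` are my shorthand):

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """OneR sketch: build a frequency table for each attribute, derive one
    rule per attribute (each value -> its majority class), and keep the
    attribute whose rule classifies the most training rows correctly."""
    best_attr, best_rule, best_acc = None, None, -1.0
    for attr in rows[0]:
        table = defaultdict(Counter)            # value -> class counts
        for row, label in zip(rows, labels):
            table[row[attr]][label] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in table.items()}
        correct = sum(c[rule[v]] for v, c in table.items())
        acc = correct / len(rows)
        if acc > best_acc:
            best_attr, best_rule, best_acc = attr, rule, acc
    return best_attr, best_rule, best_acc

# The golf example from slide 8:
rows = [{"weather": "Sunny",   "sally": "Yes"},
        {"weather": "Cloudy",  "sally": "No"},
        {"weather": "Raining", "sally": "No"},
        {"weather": "Sunny",   "sally": "Yes"}]
labels = ["Yes", "No", "Yes", "Yes"]
attr, rule, acc = one_r(rows, labels)
```

On this toy table the weather rule classifies all four days correctly, so OneR picks weather. (Real OneR implementations also discretize numeric attributes such as serum creatinine into ranges, which this sketch omits.)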
15. How did OneR do?
• Correctly classified 80.2% of the time
• Rule based on serum creatinine:
• < 0.85 is healthy
• < 1.15 is stage 2
• < 2.25 is stage 3
• >= 2.25 is stage 5
• A single rule is created and is responsible for all classification
• A high classification rate indicates that a single attribute has high influence in predicting the class
16. Naïve Bayes
• Build a frequency table for each attribute
• Determine the probability of each value of each attribute
• Determine the conditional probability of each value of each attribute given each class
• For a new instance, multiply the conditional probabilities of its attribute values by the prior probability of each class
• Choose the most probable class
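A minimal sketch of that calculation, again on the slide 8 golf table (attribute names are my shorthand; real implementations add Laplace smoothing so an unseen value does not zero out the whole product, which this sketch omits):

```python
from collections import Counter

def naive_bayes(rows, labels, new_row):
    """Naive Bayes sketch (categorical attributes, no smoothing):
    score(class) = P(class) * product over attributes of P(value | class);
    return the most probable class."""
    n = len(labels)
    class_counts = Counter(labels)
    best_class, best_score = None, -1.0
    for c, c_count in class_counts.items():
        score = c_count / n                      # prior P(class)
        for attr, value in new_row.items():
            # conditional P(value | class) from the frequency table
            match = sum(1 for row, lab in zip(rows, labels)
                        if lab == c and row[attr] == value)
            score *= match / c_count
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Predict Sept 5: Cloudy, Sally not going.
rows = [{"weather": "Sunny",   "sally": "Yes"},
        {"weather": "Cloudy",  "sally": "No"},
        {"weather": "Raining", "sally": "No"},
        {"weather": "Sunny",   "sally": "Yes"}]
labels = ["Yes", "No", "Yes", "Yes"]
prediction = naive_bayes(rows, labels, {"weather": "Cloudy", "sally": "No"})
```

Here the only cloudy day in the training data was a "No" day, so the model predicts John will not golf on Sept 5.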
17. How did Naïve Bayes do?
• Correctly classified 56.6% of the time
• The conditional and overall probabilities constitute the rule
• A high classification rate indicates the attributes have roughly equal influence
• No iterative process, so it is faster on larger data sets
18. J48 / C4.5
• Top-down recursive algorithm determines splitting points based on information gain
• Classify a new instance by following the decision tree to a leaf, i.e. a class
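The splitting criterion can be sketched as entropy and information gain; the tree builder recursively picks the highest-gain split. (C4.5 proper uses the gain *ratio* and handles numeric attributes and pruning, which this sketch omits; the data is again the slide 8 golf table with my shorthand attribute names.)

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy of a class distribution, in bits."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain of splitting on `attr`: parent entropy minus the
    size-weighted entropy of the child nodes after the split."""
    children = defaultdict(list)
    for row, label in zip(rows, labels):
        children[row[attr]].append(label)
    weighted = sum(len(g) / len(labels) * entropy(g)
                   for g in children.values())
    return entropy(labels) - weighted

rows = [{"weather": "Sunny",   "sally": "Yes"},
        {"weather": "Cloudy",  "sally": "No"},
        {"weather": "Raining", "sally": "No"},
        {"weather": "Sunny",   "sally": "Yes"}]
labels = ["Yes", "No", "Yes", "Yes"]
# The attribute with the highest gain becomes the top split of the tree.
gain_weather = information_gain(rows, labels, "weather")
gain_sally = information_gain(rows, labels, "sally")
```

Splitting on weather leaves every child node pure, so it has the higher gain and would be chosen as the root.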
20. How did J48 do?
• Correctly classified 88.4% of the time
• A decision tree is generated
• Balances the discrimination of OneR against the fairness of Naïve Bayes
• Decision trees are popular, intuitive, easy to create, and easy to interpret
• People like decision trees; they tell a nice story
21. ZeroR
• Correct classification rate – 28.2%
• Established the baseline accuracy
• Always guess stage 3 CKD
Naïve Bayes
• Correct classification rate – 56.6%
• Uses overall and conditional probabilities to pick the most probable class
OneR
• Correct classification rate – 80.2%
• Serum creatinine:
• < 0.85 – Healthy
• < 1.15 – Stage 2
• < 2.25 – Stage 3
• >= 2.25 – Stage 5
J48 / C4.5
• Correct classification rate – 88.4%
24. Cross Validation
• Split the data into ten slices
• Hold out one slice and build the model on the other nine slices
• Test on the 'held out' slice
• Repeat, holding out a different slice each time, until every slice has been held out once
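The hold-out rotation above amounts to the following index bookkeeping, a minimal sketch of 10-fold splitting (real toolkits such as Weka also shuffle and stratify the slices, which this omits):

```python
def ten_fold_splits(n, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation:
    each slice is held out exactly once while the other k-1 slices train."""
    folds = [list(range(i, n, k))              # slice the indices into k folds
             for i in range(k)]
    for i in range(k):
        test = folds[i]                        # the 'held out' slice
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# 319 instances, as in the CKD data set above:
splits = list(ten_fold_splits(319))
```

Averaging the ten test-slice accuracies gives an estimate of how the model will do on data it has never seen, which is what the reported classification rates measure.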
25. Overfitting
• A classification rule that is 'overfit' is so specific to the training data set that it does not generalize to the broader population
• Limiting the complexity of rules can help prevent overfitting
• Large, representative data sets help fight overfitting
• A persistent problem in machine learning
• Be a suspicious data scientist