Machine Learning
Chris Sharkey
today
@shark2900
What do you think of when
we say machine learning?
big words
• Hadoop
• Terabyte
• Petabyte
• NoSQL
• Data Science
• D3
• Visualization
• Machine learning
What is machine learning?
“Predictive or descriptive
modeling which learns
from past experience or
data to build models which
can predict the future”
Past Data
(known outcome)
Machine Learning
Model
New Data
(unknown outcome)
Predicted Outcome
Will John play golf?
Date    Weather  Temperature  Sally going?  Did John golf?
Sept 1  Sunny    92°F         Yes           Yes
Sept 2  Cloudy   84°F         No            No
Sept 3  Raining  84°F         No            Yes
Sept 4  Sunny    95°F         Yes           Yes

Date    Weather  Temperature  Sally going?  Will John golf?
Sept 5  Cloudy   87°F         No            ?
We want a model based on John’s past behavior to predict
what he will do in the future. Can we use ML?
Yes. This is a
classification problem
ZeroR
Establishes a baseline
Naïve Bayes
Probabilistic model
OneR
Single Rule
J48 / C4.5
Decision Tree
Upgrade our example
age, blood pressure, specific gravity, albumin, sugar,
red blood cells, pus cell, pus cell clumps, potassium, blood glucose,
blood urea, serum creatinine, sodium, hemoglobin, packed cell volume,
white blood cell count, red blood cell count, hypertension, diabetes mellitus, coronary artery,
heart disease, appetite, pedal edema, anemia, stage
Data Set
• 319 instances or people
• 25 attributes or variables
Machine Learning
• ZeroR
• OneR
• Naïve Bayes
• J48 / C4.5
Model
Blood test data for new individuals with unknown disease status
Predict whether an individual has CKD (chronic kidney disease) and, if so, the stage of their disease
ZeroR
Past data
(known outcome)
New instance
Classified
Classify new data as the most ‘popular’ class
Build a frequency table
Choose the ‘most popular’, that is the most frequent, class
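The ZeroR steps above can be sketched in a few lines. This is a minimal illustration, assuming the four outcomes from the "Did John golf?" column of the golf table as training data; the function name `zero_r` is made up for the example and is not taken from any library.

```python
from collections import Counter

# Outcomes from the "Did John golf?" column (Sept 1 to Sept 4).
past_outcomes = ["Yes", "No", "Yes", "Yes"]

def zero_r(labels):
    """Return the most frequent class and its share of the training labels."""
    counts = Counter(labels)                  # the frequency table
    label, count = counts.most_common(1)[0]   # the 'most popular' class
    return label, count / len(labels)

majority_class, baseline_rate = zero_r(past_outcomes)
print(majority_class, baseline_rate)
```

Every new instance, including Sept 5, would get the same prediction, which is why ZeroR is only useful as a baseline.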
How did ZeroR do?
• Correctly classified 28.2% of the time
• Rule: always guess a new instance (person) has stage three kidney disease
• The 28.2% correct classification rate is our baseline
• Correct classification rates above 28.2% are better than always guessing the most frequent class
OneR
Past data
(known outcome)
New instance
Classified
Choose the attribute whose rule has the highest correct classification rate
Build a frequency table for each attribute. This generates a rule for each value of each attribute.
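A rough sketch of OneR on the golf table follows. It assumes only the two categorical attributes (weather and whether Sally is going); temperature is left out because a numeric attribute would first need to be split into ranges. The function name `one_r` and the dictionary keys are illustrative choices, not part of any library.

```python
from collections import Counter, defaultdict

# The four past days from the golf table, categorical attributes only.
rows = [
    {"weather": "Sunny",   "sally": "Yes", "golf": "Yes"},
    {"weather": "Cloudy",  "sally": "No",  "golf": "No"},
    {"weather": "Raining", "sally": "No",  "golf": "Yes"},
    {"weather": "Sunny",   "sally": "Yes", "golf": "Yes"},
]

def one_r(rows, attributes, target):
    """Pick the single attribute whose value -> class rule fits the training rows best."""
    best = None
    for attr in attributes:
        # Frequency table: attribute value -> counts of each class.
        table = defaultdict(Counter)
        for row in rows:
            table[row[attr]][row[target]] += 1
        # Rule: each value predicts its most frequent class.
        rule = {value: counts.most_common(1)[0][0] for value, counts in table.items()}
        correct = sum(rule[row[attr]] == row[target] for row in rows)
        accuracy = correct / len(rows)
        if best is None or accuracy > best[2]:
            best = (attr, rule, accuracy)
    return best

attribute, rule, accuracy = one_r(rows, ["weather", "sally"], "golf")
print(attribute, rule, accuracy)
```

The accuracy here is measured on the same four rows used to build the rule, so it says nothing yet about how well the rule generalizes.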
How did OneR do?
• Correctly classified 80.2% of the time
• Rule based on serum creatinine
• < 0.85 is healthy
• < 1.15 is stage 2
• < 2.25 is stage 3
• ≥ 2.25 is stage 5
• A single rule is created and is responsible for classification
• A high classification rate indicates that a single attribute has high influence in predicting the class
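The serum creatinine rule reported above amounts to a chain of threshold checks. The sketch below simply restates the slide's cut-offs as code; the function name is hypothetical, the slide does not give units, and this is an illustration of a OneR rule rather than clinical guidance.

```python
def classify_by_serum_creatinine(value):
    """Apply the single OneR rule from the slide, checking thresholds in order."""
    if value < 0.85:
        return "healthy"
    if value < 1.15:
        return "stage 2"
    if value < 2.25:
        return "stage 3"
    return "stage 5"  # value >= 2.25

for reading in (0.7, 1.0, 2.0, 2.25):
    print(reading, classify_by_serum_creatinine(reading))
```

Because the checks run in order, "< 1.15" effectively means "at least 0.85 and below 1.15", and so on for the later ranges.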
Naïve Bayes
Past data
(known outcome)
New instance
Classified
For each attribute, multiply the conditional probability of each value by the probability of that value
Multiply all of the previously calculated probabilities
Choose the most probable class
Build frequency table
for each attribute.
Determine
probabilities for values
of each attribute.
Determine conditional
probabilities for values
of each attribute.
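As a minimal sketch of the idea, the code below scores each class for the Sept 5 row of the golf table. It uses the common textbook formulation (class probability times the conditional probability of each attribute value given the class), which may differ in detail from the steps on the slide, and it assumes only the two categorical attributes. The helper names `train` and `predict` are made up for the example, and no smoothing is applied.

```python
from collections import Counter, defaultdict

rows = [
    {"weather": "Sunny",   "sally": "Yes", "golf": "Yes"},
    {"weather": "Cloudy",  "sally": "No",  "golf": "No"},
    {"weather": "Raining", "sally": "No",  "golf": "Yes"},
    {"weather": "Sunny",   "sally": "Yes", "golf": "Yes"},
]

def train(rows, attributes, target):
    """Build the frequency tables: class counts and per-class value counts."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    for row in rows:
        for attr in attributes:
            value_counts[(attr, row[target])][row[attr]] += 1
    return class_counts, value_counts

def predict(instance, class_counts, value_counts):
    """Score each class as P(class) * product of P(value | class)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total
        for attr, value in instance.items():
            score *= value_counts[(attr, label)][value] / n
        scores[label] = score
    return max(scores, key=scores.get), scores

class_counts, value_counts = train(rows, ["weather", "sally"], "golf")
prediction, scores = predict({"weather": "Cloudy", "sally": "No"}, class_counts, value_counts)
print(prediction, scores)
```

With only four rows, one unseen combination (a cloudy day on which John golfed) drives the "Yes" score to zero; practical implementations usually add a small count to every cell to avoid this.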
How did Naïve Bayes do?
• Correctly classified 56.6% of the time
• Conditional and overall probabilities constitute a rule
• A high classification rate indicates that attributes have more equal influence
• No iterative process, faster on larger data sets
J48 / C4.5
Past data
(known outcome)
New instance
Classified
Follow decision tree to a
leaf or class
Top-down recursive algorithm that determines splitting points based on information gain
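To make "information gain" concrete, the sketch below computes it for the two categorical attributes of the golf table. It shows plain information gain only, as named on the slide; C4.5 itself normalizes this into a gain ratio, which is not shown here. The function names are illustrative.

```python
import math
from collections import Counter, defaultdict

rows = [
    {"weather": "Sunny",   "sally": "Yes", "golf": "Yes"},
    {"weather": "Cloudy",  "sally": "No",  "golf": "No"},
    {"weather": "Raining", "sally": "No",  "golf": "Yes"},
    {"weather": "Sunny",   "sally": "Yes", "golf": "Yes"},
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy before the split minus the weighted entropy after splitting on attr."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    before = entropy([row[target] for row in rows])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after

gain_weather = information_gain(rows, "weather", "golf")
gain_sally = information_gain(rows, "sally", "golf")
print(round(gain_weather, 4), round(gain_sally, 4))
```

A tree builder would split on the attribute with the larger gain, then repeat the same calculation inside each branch until the leaves are pure or no attributes remain.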
How did J48 do?
• Correctly classified 88.4% of the time
• Decision tree generated
• Balance between discrimination of OneR and fairness of Naïve Bayes
• Decision trees are popular, intuitive, easy to create and easy to interpret
• People like decision trees. They tell a nice story
ZeroR
• Correct classification rate – 28.2%
• Established baseline accuracy
• Always guess stage 3 CKD
Naïve Bayes
• Correct classification rate – 56.6%
• Established overall probabilities to pick the most probable class
OneR
• Correct classification rate – 80.2%
• Serum Creatinine
• < 0.85 – Healthy
• < 1.15 – Stage 2
• < 2.25 – Stage 3
• ≥ 2.25 – Stage 5
J48 / C4.5
• Correct classification rate – 88.4%
Does this make sense?
Other important concepts
in machine learning.
Cross Validation
• Hold out one of ten slices and build the
model on the other nine slices
• Test on the ‘held out’ slice
• Hold out a different slice, build the model on the remaining nine slices and test on the newly ‘held out’ slice
• Repeat until each of the ten slices has been held out once
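The slicing described above can be sketched as index bookkeeping. This example assumes the 319 instances of the kidney data set and ten slices; the function name `k_fold_indices` is made up, and the slices are formed by simple striding with no shuffling or stratification, which a real evaluation would usually add.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists, holding out each of k slices once."""
    # Slice i gets items i, i + k, i + 2k, ... so slice sizes differ by at most one.
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test

splits = list(k_fold_indices(319, k=10))
for train, test in splits:
    # A model would be built on `train` and scored on `test` here.
    print(len(train), len(test))
```

Averaging the ten test scores gives a single estimate of how the model performs on data it was not built from.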
Overfitting
• Classification rule that is ‘over fit’ or so specific to the training data set that it does
not generalize to the broader population
• Limiting the complexity of rules can help prevent overfitting
• Large representative data sets can help fight overfitting
• A common problem in machine learning
• You must be a suspicious data scientist
Question?

Introduction to Machine Learning & Classification
