Machine Learning
Group IX
What is machine learning?
● Gives computers the ability to learn from data instead of having every state defined explicitly
● Draws on subfields of AI: computational learning theory and pattern recognition
● Programs work in two distinct stages: “Train” and “Predict”
Machine learning vs conditional programming
Conditional programming uses simple if-then-else rules.
Problem: identify a flower's name from its features.
Conditional approach: write if-else rules covering every state.
AI approach: train an ML model on examples and let it predict the result.
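As a rough illustration of the two approaches, the sketch below classifies a flower from a single feature, first with a hand-written rule and then with a threshold learned from labeled examples. The sample measurements, the two species names and the functions `rule_based`, `train` and `predict` are assumptions made for this example; they are not taken from the slides.

```python
# Hypothetical toy data: (petal_length_cm, species). Values are illustrative only.
samples = [(1.4, "setosa"), (1.3, "setosa"), (1.7, "setosa"),
           (4.5, "versicolor"), (4.1, "versicolor"), (4.7, "versicolor")]

def rule_based(petal_length):
    """Conditional approach: the programmer hard-codes the threshold."""
    if petal_length < 2.5:
        return "setosa"
    else:
        return "versicolor"

def train(data):
    """ML approach, train stage: learn the threshold from labeled examples."""
    setosa = [x for x, y in data if y == "setosa"]
    versicolor = [x for x, y in data if y == "versicolor"]
    # Place the boundary midway between the two class means.
    return (sum(setosa) / len(setosa) + sum(versicolor) / len(versicolor)) / 2

def predict(threshold, petal_length):
    """ML approach, predict stage: apply the learned threshold."""
    return "setosa" if petal_length < threshold else "versicolor"

threshold = train(samples)
print(rule_based(1.5), predict(threshold, 1.5))
```

With the rule-based version, new data means editing the code; with the trained version, only `train` has to be run again.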
Supervised learning
Supervised learning is the machine learning task of inferring a function from
labeled training data. The training data consist of a set of training examples,
each pairing an input with its desired output (label).
Supervised learning algorithms
Decision trees
Naive Bayes
K-nearest neighbours
1. Decision Tree
A decision tree builds a classification model in the form of a tree structure.
It breaks a dataset down into smaller and smaller subsets.
Finding the optimal decision tree is NP-hard, so a greedy technique is used instead.
Decision tree algorithm
1. Start with the whole training data.
2. Select the attribute (or the value along a dimension) that gives the “best” split.
3. Create child nodes based on the split.
4. Recurse on each child using the child's data until a stopping criterion is reached:
• all examples have the same class (entropy is 0)
• the amount of data is too small (fewer than min_samples_split)
• the tree is too large
Problem: How do we choose the “best” attribute?
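The recursive procedure and its stopping criteria can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: attributes are categorical, a node gets one child per attribute value, and `max_depth` stands in for "tree too large". The four-row data set and the names `split` and `build` are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def split(rows, labels, column):
    """Group rows and labels by the value found in `column`."""
    groups = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[column], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return groups

def build(rows, labels, min_samples_split=2, max_depth=3, depth=0):
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria: pure node, too little data, or tree too deep.
    if entropy(labels) == 0 or len(rows) < min_samples_split or depth >= max_depth:
        return majority

    # Greedy step: pick the column whose split leaves the lowest weighted entropy.
    def weighted_entropy(column):
        return sum(len(sub) / len(labels) * entropy(sub)
                   for _, sub in split(rows, labels, column).values())

    best = min(range(len(rows[0])), key=weighted_entropy)
    groups = split(rows, labels, best)
    if len(groups) == 1:  # the chosen column does not separate the data
        return majority
    children = {value: build(r, l, min_samples_split, max_depth, depth + 1)
                for value, (r, l) in groups.items()}
    return (best, children)

# Hypothetical rows: (weather, parents at home).
rows = [("Sunny", "Yes"), ("Sunny", "No"), ("Rainy", "Yes"), ("Rainy", "No")]
labels = ["Cinema", "Tennis", "Cinema", "Stay in"]
tree = build(rows, labels)
print(tree)
```

A leaf is returned as a plain class label, and an internal node as a pair of the chosen column index and a dictionary of children.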
Simple Example
Weekend (Example)  Weather  Parents  Money  Decision (Category)
W1                 Sunny    Yes      Rich   Cinema
W2                 Sunny    No       Rich   Tennis
W3                 Windy    Yes      Rich   Cinema
W4                 Rainy    Yes      Poor   Cinema
W5                 Rainy    No       Rich   Stay in
W6                 Rainy    Yes      Poor   Cinema
W7                 Windy    No       Poor   Cinema
W8                 Windy    No       Rich   Shopping
W9                 Windy    Yes      Rich   Cinema
W10                Sunny    No       Rich   Tennis
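One common answer to the question of the “best” attribute is information gain: the drop in entropy produced by a split. The sketch below computes it for the weekend table above, assuming one child per attribute value (an ID3-style split). scikit-learn instead makes binary splits on numerically encoded features, so the attribute it selects at the root may differ from the one with the highest gain here.

```python
from collections import Counter
from math import log2

# Weekend table: (weather, parents, money, decision).
ROWS = [
    ("Sunny", "Yes", "Rich", "Cinema"),
    ("Sunny", "No", "Rich", "Tennis"),
    ("Windy", "Yes", "Rich", "Cinema"),
    ("Rainy", "Yes", "Poor", "Cinema"),
    ("Rainy", "No", "Rich", "Stay in"),
    ("Rainy", "Yes", "Poor", "Cinema"),
    ("Windy", "No", "Poor", "Cinema"),
    ("Windy", "No", "Rich", "Shopping"),
    ("Windy", "Yes", "Rich", "Cinema"),
    ("Sunny", "No", "Rich", "Tennis"),
]
ATTRIBUTES = {"Weather": 0, "Parents": 1, "Money": 2}

def entropy(rows):
    """Shannon entropy (in bits) of the decision column."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def information_gain(rows, column):
    """Entropy before the split minus the weighted entropy after it."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - after

print("entropy of full table:", round(entropy(ROWS), 3))
for name, column in ATTRIBUTES.items():
    print(name, round(information_gain(ROWS, column), 3))
```

The entropy of the full table evaluates to about 1.571 bits, and every example with Parents = Yes ends in Cinema, so that branch is already pure.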
Decision tree
The root node, where Parents is the splitter, has an entropy of 1.571
(the entropy of all ten examples before any split).
Parameters
criterion = entropy*, gini (default)
splitter = best* (default), random
min_samples_split = 2* (default)
* = value used here
How prediction works
Today is windy, I have money, and my parents are not
at home. What will I do?
Weather = “Windy”, encoded as 1
Parents = “No”, encoded as 0
Money = “Rich”, encoded as 1
classified = [0, 1, 0, 0], so I may go shopping!
Decision tree for large dataset
scikit-learn Iris data set
2. Naive bayes
It is a classification technique based on Bayes’ theorem with an assumption
of independence among predictors.
It is primarily used for text classification, which involves high-dimensional training
data sets.
Examples: spam filtering, sentiment analysis, and classifying news
articles.
Bayes’ theorem provides a way of calculating the posterior probability P(c|x) from
P(c), P(x) and P(x|c):
P(c|x) = P(x|c) * P(c) / P(x)
● P(c|x) is the posterior probability of the class (c, target) given the predictor (x,
attributes).
● P(c) is the prior probability of the class.
● P(x|c) is the likelihood, which is the probability of the predictor given the class.
● P(x) is the prior probability of the predictor.
How does the Naive Bayes algorithm work?
Example:
Take a training data set of weather conditions and the corresponding target variable ‘Play’
(whether play took place). Then classify whether players will play
or not based on the weather condition.
The steps are as follows.
Steps:
1. Convert the data set into a frequency table.
2. Create a likelihood table by finding the probabilities (for example, the probability of
Overcast is 0.29 and the probability of playing is 0.64).
3. Use the Naive Bayes equation to calculate the posterior probability for
each class. The class with the highest posterior probability is the outcome
of the prediction.
Problem: Players will play if the weather is sunny. Is this statement correct?
Solution: Solve it using the method of posterior probability.
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
Here, P(Sunny|Yes) = 3/9 = 0.33,
P(Sunny) = 5/14 = 0.36,
P(Yes) = 9/14 = 0.64
P(Yes|Sunny) = 0.33 * 0.64 / 0.36 = 0.60
Since 0.60 is greater than 0.50, Yes is the more probable class and the statement is supported.
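The posterior calculation can be checked with exact fractions. The sketch below uses only the counts given in the worked example (14 days, 9 with play, 5 sunny, 3 both sunny and with play); the counts for the "No" class are derived by subtraction, and the variable names are chosen for illustration.

```python
from fractions import Fraction

# Counts taken from the worked example (14 days in total).
total_days = 14
yes_days = 9
sunny_days = 5
sunny_and_yes = 3

p_yes = Fraction(yes_days, total_days)                 # P(Yes)
p_sunny = Fraction(sunny_days, total_days)             # P(Sunny)
p_sunny_given_yes = Fraction(sunny_and_yes, yes_days)  # P(Sunny|Yes)

# Bayes' theorem: P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny

# The "No" class uses the complementary counts.
no_days = total_days - yes_days
sunny_and_no = sunny_days - sunny_and_yes
p_no = Fraction(no_days, total_days)
p_sunny_given_no = Fraction(sunny_and_no, no_days)
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny

# The class with the highest posterior is the prediction.
prediction = "Yes" if p_yes_given_sunny > p_no_given_sunny else "No"
print(float(p_yes_given_sunny), float(p_no_given_sunny), prediction)
```

Working with `Fraction` avoids the rounding in 0.33 * 0.64 / 0.36 and gives exactly 3/5.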
3. k-Nearest Neighbour
Introduction
The KNN algorithm is a robust and versatile classifier that is often
used as a benchmark for more complex classifiers such as Artificial
Neural Networks (ANN) and Support Vector Machines (SVM).
Despite its simplicity, KNN can outperform more powerful classifiers
and is used in a variety of applications such as economic forecasting,
data compression and genetics.
What is KNN?
KNN falls in the supervised learning family of algorithms. Informally,
this means that we are given a labelled dataset consisting of training
observations (x, y) and would like to capture the relationship
between x and y. More formally, our goal is to learn a function
h: X → Y so that given an unseen observation x, h(x) can
confidently predict the corresponding output.
● KNN is non-parametric, instance-based and used in a supervised
learning setting.
● Minimal training but expensive testing.
How does KNN work?
The K-nearest neighbor algorithm essentially boils down to forming a majority vote
between the K most similar instances to a given “unseen” observation. Similarity is
defined according to a distance metric between two data points. A popular choice
is the Euclidean distance, given by
d(x, x') = sqrt((x_1 - x'_1)^2 + (x_2 - x'_2)^2 + ... + (x_n - x'_n)^2)
How it works (cont.)
1. Choose a value for k, preferably a small odd number.
2. Find the k training points closest to the new point.
3. Assign the new point to the class held by the majority of those k neighbours.
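The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration using Euclidean distance via `math.dist` (Python 3.8 or later); the function name `knn_predict` and the sample points are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: order the training indices by Euclidean distance to the query.
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    # Step 3: majority vote among the k closest labels.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points in two classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2.0, 2.0), k=3))
```

There is no training step beyond storing the data, while every prediction scans all training points, which is the "minimal training but expensive testing" trade-off.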
When K is small, we are restricting the region of a given prediction and forcing
our classifier to be “more blind” to the overall distribution. A small value for K
provides the most flexible fit, which will have low bias but high variance.
Graphically, our decision boundary will be more jagged.
On the other hand, a higher K averages more voters in each prediction and hence
is more resilient to outliers. Larger values of K will have smoother decision
boundaries, which means lower variance but increased bias.
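The effect of K on outliers can be seen on a small one-dimensional example. In this sketch the data are hypothetical: class "A" lies between 1 and 3, class "B" between 7 and 9, and a single "B" point at 2.1 sits inside the "A" region. The helper `knn_1d` is an assumption made for the illustration.

```python
from collections import Counter

def knn_1d(xs, ys, query, k):
    """Majority vote among the k points closest to `query` on a line."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical data with one outlier ("B" at 2.1) inside the "A" region.
xs = [1.0, 1.5, 2.0, 2.5, 3.0, 2.1, 7.0, 7.5, 8.0, 8.5, 9.0]
ys = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]

for k in (1, 5):
    print("k =", k, "->", knn_1d(xs, ys, 2.15, k))
```

With k = 1 the prediction follows the single outlier, while with k = 5 the four surrounding "A" points outvote it.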
Unsupervised learning - Clustering
● Organization of unlabeled data into similarity groups
● Three types of clustering techniques:
Hierarchical
Partitional
Bayesian
Clustering Algorithms
K-means
● Partitional clustering algorithm
● Choose k data points at random (seeds) to be the initial centroids
● Assign each data point to the closest centroid
4. K-means
Algorithm
● Decide on a value for k
● Initialize the k cluster centers
● Assign each object to its nearest cluster center
● Re-estimate the cluster centers
● If no object changed membership, exit; otherwise go back to the assignment step
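The loop above can be sketched in plain Python. This is a minimal version under stated assumptions: the initial centers are k data points drawn with a seeded random generator, distance is Euclidean, and an empty cluster keeps its previous center. The names `kmeans` and `predict` and the four sample points are hypothetical.

```python
import random
from math import dist

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k random data points as seeds
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no membership change: stop
            break
        labels = new_labels
        # Re-estimate each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids

def predict(point, centroids):
    """Label a new point with the index of its nearest centroid."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))

# Hypothetical data: two well-separated pairs of points.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.0)]
labels, centroids = kmeans(points, k=2)
print("Labels", labels)
print("Predicted Label", predict((0.0, 0.0), centroids))
```

Which cluster is called 0 and which is called 1 depends on the random seeds, so only the grouping itself is meaningful.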
[Figures: Steps 1 to 5 of the k-means algorithm]
Python Code
Output
Labels [0 0 1 1]
Predicted Label [0]
Output
Labels [1 1 0 0]
Predicted Label [1]
Both outputs describe the same grouping; only the cluster numbering differs.
