3. Introduction
Machine Learning
From dictionary.com:
“The ability of a machine to improve its performance
based on previous results.”
Arthur Samuel (1959):
“Field of study that gives computers the ability
to learn without being explicitly programmed.”
4. Introduction
Machine learning algorithms are organized into a taxonomy,
based on the desired outcome of the algorithm. Common
algorithm types include:
Supervised algorithms
Unsupervised algorithms
Reinforcement algorithms
etc.
Algorithm Types
6. Supervised Algorithms
Definition
Supervised learning is the search for algorithms that
reason from externally supplied instances to produce
general hypotheses, which then make predictions about
future instances.
In other words:
The goal of supervised learning is to build a concise model
of the distribution of class labels in terms of predictor
features.
7. Motivation
Supervised Algorithms
Why did supervised learning appear?
Because every domain generates a huge amount of
information each second. So why not exploit that
information and experience to make good decisions
in the future?
8. Supervised Algorithms
Approach
Data: a set of data records (also called examples,
instances or cases) described by:
k attributes: A1, A2, … Ak.
a class: each example is labelled with a pre-defined class.
Goal: to learn a classification model from the data
that can be used to predict the classes of new
(future, or test) cases/instances.
9. Supervised Algorithms
Supervised Algorithms Process
Learning (training): learn a model using the
training data.
Testing: test the model using unseen test data
to assess the model accuracy:
Accuracy = Number of correct classifications / Total number of test cases
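The accuracy measure above can be sketched in a few lines of Python; the labels below are hypothetical and chosen only for illustration:

```python
# Hypothetical true labels and model predictions for 8 test cases.
y_true = ["yes", "no", "yes", "yes", "no", "yes", "no", "no"]
y_pred = ["yes", "no", "no",  "yes", "no", "yes", "yes", "no"]

# Accuracy = number of correct classifications / total number of test cases.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 6 correct out of 8 -> 0.75
```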
14. Supervised Algorithms
Decision Tree
Leaves represent classifications and branches represent
tests on features that lead to those classifications.
[Figure: points in the (x1, x2) plane separated by two thresholds α1 and α2; the tree tests X1 > α1 and then X2 > α2, each branching into YES and NO.]
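The depth-2 tree in the figure can be sketched as a pair of threshold tests; the threshold values and class names below are placeholders, not values from the slides:

```python
def classify(x1, x2, alpha1=0.5, alpha2=0.5):
    """Depth-2 decision tree from the figure: test x1 > alpha1 first,
    then x2 > alpha2. Thresholds are hypothetical placeholder values."""
    if x1 > alpha1:          # first branch: X1 > alpha1 -> YES
        return "blue"
    if x2 > alpha2:          # second branch: X2 > alpha2 -> YES
        return "blue"
    return "red"             # both tests fail -> NO branch

print(classify(0.7, 0.1))  # first test passes -> blue
print(classify(0.2, 0.1))  # both tests fail -> red
```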
15. Supervised Algorithms
K-Nearest Neighbors
Find the k nearest neighbors of the test example, and
infer its class using their known class.
E.g. k = 3
[Figure: unlabelled points “?” in the (x1, x2) plane, each classified by a majority vote among its nearest labelled neighbors.]
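A minimal k-NN sketch in Python, assuming Euclidean distance and majority vote (the toy points are hypothetical):

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of ((x1, x2), label) pairs. Classify query by a
    majority vote among its k nearest neighbors (Euclidean distance)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated classes.
train = [((0, 0), "red"), ((0, 1), "red"), ((1, 0), "red"),
         ((5, 5), "blue"), ((5, 6), "blue"), ((6, 5), "blue")]
print(knn_predict(train, (1, 1)))    # red
print(knn_predict(train, (5, 5.5)))  # blue
```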
17. Some Real life applications
Systems biology: gene expression microarray data
Face detection; signature recognition
Medicine: predict if a patient has heart ischemia
by a spectral analysis of his/her ECG
Recommender systems
Text categorization: spam filtering
18. Some Real life applications
Microarray data
Separate malignant from healthy tissues based on
the mRNA expression profile of the tissue.
20. Some Real life applications
Text categorization
Categorize text documents into predefined categories;
for example, categorize e-mail as “Spam” or “Not Spam”.
22. Naïve Bayes
Definition
Naïve Bayesian Classification
Named after Thomas Bayes (1701–1761),
who proposed Bayes’ Theorem.
23. Bayesian Classification
What is it?
The Bayesian classifier is based on Bayes’ Theorem, with
independence assumptions between predictors.
It is easy to build, with no complicated iterative parameter
estimation, which makes it particularly useful for very
large datasets.
24. Bayesian Classification
Bayes Theorem
Bayes’ Theorem provides a way of calculating the
posterior probability P(C|X) from P(C), P(X) and P(X|C):
P(C|X) = P(X|C) · P(C) / P(X)
P(C|X) is the posterior probability of the
class given the predictor (attribute).
P(X|C) is the likelihood: the
probability of the predictor given the class.
P(C) is the prior probability of the class.
P(X) is the prior probability of the predictor.
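As a minimal numeric illustration of the formula (the probability values below are hypothetical, not taken from the tennis example that follows):

```python
# Hypothetical probabilities, chosen only to exercise the formula.
p_c = 9 / 14         # prior P(C)
p_x_given_c = 3 / 9  # likelihood P(X|C)
p_x = 0.5            # evidence P(X)

# Bayes' Theorem: posterior = likelihood * prior / evidence.
p_c_given_x = p_x_given_c * p_c / p_x
print(round(p_c_given_x, 3))  # 0.429
```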
27. Bayesian Classification
Classify a new instance:
(Outlook=Sunny, Temp=Cool, Humidity=High, Wind=Strong)
How do we classify this new instance?
28. Bayesian Classifier
Frequency Table
Outlook Play=Yes Play=No
Sunny 2/9 3/5
Overcast 4/9 0/5
Rain 3/9 2/5
Temperature Play=Yes Play=No
Hot 2/9 2/5
Mild 4/9 2/5
Cool 3/9 1/5
Humidity Play=Yes Play=No
High 3/9 4/5
Normal 6/9 1/5
Wind Play=Yes Play=No
Strong 3/9 3/5
Weak 6/9 2/5
P(Play=Yes) = 9/14 P(Play=No) = 5/14
29. Bayesian Classification
Example
So let’s classify this new instance:
Outlook Temperature Humidity Wind Play Tennis
Sunny Cool High Strong ??
Likelihood of Yes
L=P(Outl=Sunny|Yes)*P(Temp=Cool|Yes)*P(Hum=High|Yes)*P(Wind=Strong|Yes)*P(Yes)
L=2/9 * 3/9 * 3/9 * 3/9 * 9/14 = 0.0053
Likelihood of No
L=P(Outl=Sunny|No)*P(Temp=Cool|No)*P(Hum=High|No)*P(Wind=Strong|No)*P(No)
L=3/5 * 1/5 * 4/5 * 3/5 * 5/14 = 0.0206
30. Example
Bayesian Classification
Now we normalize:
P(Yes) = 0.0053 / (0.0053 + 0.0206) = 0.20
P(No) = 0.0206 / (0.0053 + 0.0206) = 0.80
So the predicted class is No:
Outlook Temperature Humidity Wind Play Tennis
Sunny Cool High Strong No
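The whole tennis computation can be reproduced in a few lines of Python, using the conditional probabilities from the frequency table on slide 28:

```python
# Priors and conditional probabilities from the frequency table (slide 28),
# for the instance (Outlook=Sunny, Temp=Cool, Humidity=High, Wind=Strong).
p_yes, p_no = 9 / 14, 5 / 14
like_yes = (2/9) * (3/9) * (3/9) * (3/9) * p_yes  # Sunny, Cool, High, Strong | Yes
like_no  = (3/5) * (1/5) * (4/5) * (3/5) * p_no   # Sunny, Cool, High, Strong | No

# Normalize the two likelihoods into posterior probabilities.
p_yes_post = like_yes / (like_yes + like_no)
p_no_post  = like_no / (like_yes + like_no)
print(round(like_yes, 4), round(like_no, 4))      # 0.0053 0.0206
print(round(p_yes_post, 2), round(p_no_post, 2))  # 0.2 0.8 -> predict "No"
```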
31. Bayesian Classification
The Zero-Frequency Problem
When an attribute value (e.g. Outlook=Overcast) never occurs
with some class value (e.g. Play=No), its conditional probability
is zero, which zeroes out the whole product.
Fix: add 1 to all the counts (Laplace smoothing).
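A quick sketch of the add-1 fix, using the Outlook-given-No counts from the frequency table on slide 28:

```python
# Counts of Outlook values among the 5 Play=No examples (slide 28).
counts_no = {"Sunny": 3, "Overcast": 0, "Rain": 2}
total_no = 5

# Without smoothing, P(Overcast|No) = 0/5 = 0 and the product collapses.
# Add 1 to every count; the denominator grows by the number of values.
smoothed = {v: (c + 1) / (total_no + len(counts_no))
            for v, c in counts_no.items()}
print(smoothed["Overcast"])  # (0 + 1) / (5 + 3) = 0.125
```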
32. Bayesian Classification
Numerical Attributes
Numerical variables need to be transformed into
categorical ones before constructing their frequency
tables.
The other option is to use the distribution of
the numerical variable to get a good estimate of the
frequency.
For example, one common practice is to assume
normal distributions for numerical variables.
34. Bayesian Classification
Example of numerical attributes
Humidity values, with mean and standard deviation per class:
Yes: 86 96 80 65 70 80 70 90 75 → mean 79.1, StDev 10.2
No: 85 90 70 95 91 → mean 86.2, StDev 9.7
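The mean and standard deviation above, and the resulting normal-distribution likelihood for a new humidity value, can be sketched as follows (the query value 74 is hypothetical):

```python
import math

# Humidity values for Play=Yes, from the table above.
humidity_yes = [86, 96, 80, 65, 70, 80, 70, 90, 75]

mean = sum(humidity_yes) / len(humidity_yes)
# Sample standard deviation (divide by n - 1), matching the table's 10.2.
var = sum((x - mean) ** 2 for x in humidity_yes) / (len(humidity_yes) - 1)
std = math.sqrt(var)
print(round(mean, 1), round(std, 1))  # 79.1 10.2

def gaussian_pdf(x, mean, std):
    """Normal density, used as P(x | class) for a numerical attribute."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# Likelihood of a (hypothetical) new humidity reading of 74 given Play=Yes.
print(round(gaussian_pdf(74, mean, std), 4))
```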
35. Bayesian Classification
Uses of Bayes Classification
Text classification
Spam filtering
Hybrid recommender systems
Online applications
36. Bayesian Classification
Advantages
Easy to implement
Requires only a small amount of training data to estimate
the parameters
Gives good results in most cases
37. Bayesian Classification
Disadvantages
Assumption of class-conditional independence, hence a
loss of accuracy
In practice, dependencies exist among variables
E.g., in hospitals, a patient’s profile (age, family history, etc.),
symptoms (fever, cough, etc.) and diseases (lung cancer, diabetes, etc.)
are all related
Dependencies among these cannot be modelled by a
naïve Bayesian classifier
38. Application
Spam filtering
Spam filtering is the best-known use of naïve Bayesian text
classification: it uses a naïve Bayes classifier to identify spam
e-mail.
Bayesian spam filtering has become a popular mechanism for
distinguishing illegitimate spam from legitimate e-mail.
Many modern mail clients implement Bayesian spam filtering, and users
can also install separate e-mail filtering programs, e.g.:
DSPAM
SpamAssassin
SpamBayes
ASSP
40. Recap
Naïve Bayes
The Bayesian classifier is based on Bayes’ Theorem, with
independence assumptions between predictors.
It is easy to build, with no complicated iterative parameter
estimation, which makes it particularly useful for very
large datasets.
45. Example
Naïve Bayes algorithms
doc | words | class
training: D1 | SIRM master FSDM | A
D2 | SIRM master | A
D3 | master SIRM | A
D4 | SIRM recherche FSDM | B
test: D5 | SIRM SIRM SIRM master recherche FSDM | ???
P(A) = Nc/Nd = 3/4, P(B) = Nc/Nd = 1/4
With add-one smoothing (class A has 7 word tokens, vocabulary of 4 words):
P(SIRM|A) = (3+1)/(7+4) = 4/11, P(master|A) = (3+1)/(7+4) = 4/11
P(recherche|A) = (0+1)/(7+4) = 1/11, P(FSDM|A) = (1+1)/(7+4) = 2/11
46. Example
Naïve Bayes algorithms
doc | words | class
training: D1 | SIRM master FSDM | A
D2 | SIRM master | A
D3 | master SIRM | A
D4 | SIRM recherche FSDM | B
test: D5 | SIRM SIRM SIRM master recherche FSDM | ???
P(A) = Nc/Nd = 3/4, P(B) = Nc/Nd = 1/4
With add-one smoothing (class B has 3 word tokens, vocabulary of 4 words):
P(SIRM|B) = (1+1)/(3+4) = 2/7, P(master|B) = (0+1)/(3+4) = 1/7
P(recherche|B) = (1+1)/(3+4) = 2/7, P(FSDM|B) = (1+1)/(3+4) = 2/7
47. Example
P(A|D5) ∝ 3/4 * (4/11)^4 * 1/11 * 2/11 = 0.00022
P(B|D5) ∝ 1/4 * (2/7)^5 * 1/7 = 0.000068
Now we normalize:
P(A|D5) = 0.00022 / (0.000068 + 0.00022) = 0.76
P(B|D5) = 0.000068 / (0.000068 + 0.00022) = 0.24
So the predicted class is A:
Test D5 SIRM SIRM SIRM master recherche FSDM A
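The text-classification example from slides 45–47 can be reproduced end to end with a small multinomial naïve Bayes sketch (add-one smoothing, as on the slides):

```python
from collections import Counter

# Training documents and test document from slides 45-46.
train = [("SIRM master FSDM".split(), "A"),
         ("SIRM master".split(), "A"),
         ("master SIRM".split(), "A"),
         ("SIRM recherche FSDM".split(), "B")]
test_doc = "SIRM SIRM SIRM master recherche FSDM".split()

vocab = {w for words, _ in train for w in words}
classes = {c for _, c in train}

def posterior(c):
    """Unnormalized P(c|test_doc): prior times smoothed word likelihoods."""
    docs = [words for words, cls in train if cls == c]
    prior = len(docs) / len(train)                     # Nc / Nd
    counts = Counter(w for words in docs for w in words)
    total = sum(counts.values())                       # word tokens in class c
    p = prior
    for w in test_doc:                                 # add-one smoothing
        p *= (counts[w] + 1) / (total + len(vocab))
    return p

scores = {c: posterior(c) for c in classes}
norm = sum(scores.values())
print({c: round(s / norm, 2) for c, s in sorted(scores.items())})  # {'A': 0.76, 'B': 0.24}
print(max(scores, key=scores.get))  # A
```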
48. Conclusion
The naive Bayes model is tremendously appealing because of its
simplicity, elegance, and robustness.
It is one of the oldest formal classification algorithms, and yet even
in its simplest form it is often surprisingly effective.
A large number of modifications have been introduced by the
statistical, data mining, machine learning, and pattern recognition
communities in an attempt to make it more flexible.
Firstly: we will define what ML is, its different types, and the differences between those types.
Secondly: we will define and look at the different types of supervised algorithms.
Thirdly: we will give some real applications.
After that, we will explain one important algorithm that we chose to talk about.
Finally, we will finish with a conclusion.
As you know, everyone has a lot of experience in life, and usually when we have to make a prediction or a decision about something we use our experience and what we did in the past.
It is the same thing when we talk about ML, but a computer does not have experience: a computer system learns from data, which represents some experience of an application domain. So we can define ML as:
In ML there are three major learning paradigms, each corresponding to a particular abstract learning task: supervised learning, unsupervised learning and reinforcement learning.
We are going to focus only on the first type (supervised learning).
As a definition of supervised learning.
So why do we use SL?
As for the approach of supervised learning:
the purpose of supervised learning is to predict the class, so how do we do that?
In supervised learning there are two major steps.
In supervised algorithms there are two problems: the regression problem and the classification problem.
We use regression to predict a continuous-valued output.
For example, we have size as a feature and we try to predict the output, which is the age.
We use classification to predict discrete values of output.
Here we have two classes, 1 or 0, or false or true.
From the attributes we try to predict the discrete output, or class.
There are a lot of classification algorithms available.
The purpose of neural networks is ……
For example, we have two classes, one represented by red dots and the other by blue dots;
the principle of neural networks is to separate the red class and the blue class.
If x1 is greater than α1 then the output is the blue class; if not, then if x2 is greater than α2
the output is the blue class; otherwise the output is the red class.
As you can see in this table, there are three top algorithms: decision trees, naïve Bayes and SVM.
Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.