3. Introduction
Machine Learning
From dictionary.com:
“The ability of a machine to improve its performance
based on previous results.”
Arthur Samuel (1959):
“Field of study that gives computers the ability
to learn without being explicitly programmed.”
4. Introduction
Machine learning algorithms are organized into a taxonomy,
based on the desired outcome of the algorithm. Common
algorithm types include:
Supervised algorithms
Unsupervised algorithms
Reinforcement algorithms
etc.
Algorithm Types
6. Supervised Algorithms
Definition
Supervised learning is the search for algorithms that
reason from externally supplied instances to produce
general hypotheses, which then make predictions about
future instances.
In other words:
The goal of supervised learning is to build a concise model
of the distribution of class labels in terms of predictor
features.
7. Motivation
Supervised Algorithms
Why did supervised learning appear?
Because every domain generates a huge amount of
information each second. So why not exploit that
information and experience to make good decisions
in the future?
8. Supervised Algorithms
Approach
Data: a set of data records (also called examples,
instances or cases) described by:
k attributes: A1, A2, … Ak.
a class: each example is labelled with a pre-defined class.
Goal: to learn a classification model from the data
that can be used to predict the classes of new
(future, or test) cases/instances.
9. Supervised Algorithms
Supervised Algorithms Process
Learning (training): learn a model using the
training data.
Testing: test the model using unseen test data
to assess the model accuracy:
Accuracy = Number of correct classifications / Total number of test cases
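The accuracy measure above can be sketched in a few lines of Python; the labels below are hypothetical and chosen only for illustration:

```python
# Hypothetical true labels and model predictions for 8 test cases.
y_true = ["yes", "no", "yes", "yes", "no", "yes", "no", "no"]
y_pred = ["yes", "no", "no",  "yes", "no", "yes", "yes", "no"]

# Accuracy = number of correct classifications / total number of test cases.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 6 correct out of 8 -> 0.75
```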
14. Supervised Algorithms
Decision Tree
Leaves represent classifications and branches represent
tests on features that lead to those classifications.
[Figure: points in the (x1, x2) plane separated by two thresholds α1 and α2; the tree tests X1 > α1 and then X2 > α2, each branching into YES and NO.]
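The depth-2 tree in the figure can be sketched as a pair of threshold tests; the threshold values and class names below are placeholders, not values from the slides:

```python
def classify(x1, x2, alpha1=0.5, alpha2=0.5):
    """Depth-2 decision tree from the figure: test x1 > alpha1 first,
    then x2 > alpha2. Thresholds are hypothetical placeholder values."""
    if x1 > alpha1:          # first branch: X1 > alpha1 -> YES
        return "blue"
    if x2 > alpha2:          # second branch: X2 > alpha2 -> YES
        return "blue"
    return "red"             # both tests fail -> NO branch

print(classify(0.7, 0.1))  # first test passes -> blue
print(classify(0.2, 0.1))  # both tests fail -> red
```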
15. Supervised Algorithms
K-Nearest Neighbors
Find the k nearest neighbors of the test example, and
infer its class using their known class.
E.g. k = 3
[Figure: unlabelled points “?” in the (x1, x2) plane, each classified by a majority vote among its nearest labelled neighbors.]
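A minimal k-NN sketch in Python, assuming Euclidean distance and majority vote (the toy points are hypothetical):

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of ((x1, x2), label) pairs. Classify query by a
    majority vote among its k nearest neighbors (Euclidean distance)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated classes.
train = [((0, 0), "red"), ((0, 1), "red"), ((1, 0), "red"),
         ((5, 5), "blue"), ((5, 6), "blue"), ((6, 5), "blue")]
print(knn_predict(train, (1, 1)))    # red
print(knn_predict(train, (5, 5.5)))  # blue
```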
17. Some Real life applications
Systems biology: gene expression microarray data
Face detection; signature recognition
Medicine: predict if a patient has heart ischemia
by a spectral analysis of his/her ECG
Recommender systems
Text categorization: spam filtering
18. Some Real life applications
Microarray data
Separate malignant from healthy tissues based on
the mRNA expression profile of the tissue.
20. Some Real life applications
Text categorization
Categorize text documents into predefined categories;
for example, categorize e-mail as “Spam” or “Not Spam”.
22. Naïve Bayes
Definition
Naïve Bayesian Classification
Named after Thomas Bayes (1701–1761),
who proposed Bayes’ Theorem.
23. Bayesian Classification
What is it?
The Bayesian classifier is based on Bayes’ Theorem, with
independence assumptions between predictors.
It is easy to build, with no complicated iterative parameter
estimation, which makes it particularly useful for very
large datasets.
24. Bayesian Classification
Bayes Theorem
Bayes’ Theorem provides a way of calculating the
posterior probability P(C|X) from P(C), P(X) and P(X|C):
P(C|X) = P(X|C) · P(C) / P(X)
P(C|X) is the posterior probability of the
class given the predictor (attribute).
P(X|C) is the likelihood: the
probability of the predictor given the class.
P(C) is the prior probability of the class.
P(X) is the prior probability of the predictor.
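As a minimal numeric illustration of the formula (the probability values below are hypothetical, not taken from the tennis example that follows):

```python
# Hypothetical probabilities, chosen only to exercise the formula.
p_c = 9 / 14         # prior P(C)
p_x_given_c = 3 / 9  # likelihood P(X|C)
p_x = 0.5            # evidence P(X)

# Bayes' Theorem: posterior = likelihood * prior / evidence.
p_c_given_x = p_x_given_c * p_c / p_x
print(round(p_c_given_x, 3))  # 0.429
```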
27. Bayesian Classification
Classify a new instance:
(Outlook=Sunny, Temp=Cool, Humidity=High, Wind=Strong)
How do we classify this new instance?
28. Bayesian Classifier
Frequency Table
Outlook Play=Yes Play=No
Sunny 2/9 3/5
Overcast 4/9 0/5
Rain 3/9 2/5
Temperature Play=Yes Play=No
Hot 2/9 2/5
Mild 4/9 2/5
Cool 3/9 1/5
Humidity Play=Yes Play=No
High 3/9 4/5
Normal 6/9 1/5
Wind Play=Yes Play=No
Strong 3/9 3/5
Weak 6/9 2/5
P(Play=Yes) = 9/14 P(Play=No) = 5/14
29. Bayesian Classification
Example
So let’s classify this new instance:
Outlook Temperature Humidity Wind Play Tennis
Sunny Cool High Strong ??
Likelihood of Yes
L=P(Outl=Sunny|Yes)*P(Temp=Cool|Yes)*P(Hum=High|Yes)*P(Wind=Strong|Yes)*P(Yes)
L=2/9 * 3/9 * 3/9 * 3/9 * 9/14 = 0.0053
Likelihood of No
L=P(Outl=Sunny|No)*P(Temp=Cool|No)*P(Hum=High|No)*P(Wind=Strong|No)*P(No)
L=3/5 * 1/5 * 4/5 * 3/5 * 5/14 = 0.0206
30. Example
Bayesian Classification
Now we normalize:
P(Yes) = 0.0053 / (0.0053 + 0.0206) = 0.20
P(No) = 0.0206 / (0.0053 + 0.0206) = 0.80
So the predicted class is No:
Outlook Temperature Humidity Wind Play Tennis
Sunny Cool High Strong No
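The whole tennis computation can be reproduced in a few lines of Python, using the conditional probabilities from the frequency table on slide 28:

```python
# Priors and conditional probabilities from the frequency table (slide 28),
# for the instance (Outlook=Sunny, Temp=Cool, Humidity=High, Wind=Strong).
p_yes, p_no = 9 / 14, 5 / 14
like_yes = (2/9) * (3/9) * (3/9) * (3/9) * p_yes  # Sunny, Cool, High, Strong | Yes
like_no  = (3/5) * (1/5) * (4/5) * (3/5) * p_no   # Sunny, Cool, High, Strong | No

# Normalize the two likelihoods into posterior probabilities.
p_yes_post = like_yes / (like_yes + like_no)
p_no_post  = like_no / (like_yes + like_no)
print(round(like_yes, 4), round(like_no, 4))      # 0.0053 0.0206
print(round(p_yes_post, 2), round(p_no_post, 2))  # 0.2 0.8 -> predict "No"
```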
31. Bayesian Classification
The Zero-Frequency Problem
When an attribute value (e.g. Outlook=Overcast) never occurs
with some class value (e.g. Play=No), its conditional probability
is zero, which zeroes out the whole product.
Fix: add 1 to all the counts (Laplace smoothing).
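A quick sketch of the add-1 fix, using the Outlook-given-No counts from the frequency table on slide 28:

```python
# Counts of Outlook values among the 5 Play=No examples (slide 28).
counts_no = {"Sunny": 3, "Overcast": 0, "Rain": 2}
total_no = 5

# Without smoothing, P(Overcast|No) = 0/5 = 0 and the product collapses.
# Add 1 to every count; the denominator grows by the number of values.
smoothed = {v: (c + 1) / (total_no + len(counts_no))
            for v, c in counts_no.items()}
print(smoothed["Overcast"])  # (0 + 1) / (5 + 3) = 0.125
```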
32. Bayesian Classification
Numerical Attributes
Numerical variables need to be transformed into
categorical ones before constructing their frequency
tables.
The other option is to use the distribution of
the numerical variable to get a good estimate of the
frequency.
For example, one common practice is to assume
normal distributions for numerical variables.
34. Bayesian Classification
Example of numerical attributes
Humidity values, with mean and standard deviation per class:
Yes: 86 96 80 65 70 80 70 90 75 → mean 79.1, StDev 10.2
No: 85 90 70 95 91 → mean 86.2, StDev 9.7
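The mean and standard deviation above, and the resulting normal-distribution likelihood for a new humidity value, can be sketched as follows (the query value 74 is hypothetical):

```python
import math

# Humidity values for Play=Yes, from the table above.
humidity_yes = [86, 96, 80, 65, 70, 80, 70, 90, 75]

mean = sum(humidity_yes) / len(humidity_yes)
# Sample standard deviation (divide by n - 1), matching the table's 10.2.
var = sum((x - mean) ** 2 for x in humidity_yes) / (len(humidity_yes) - 1)
std = math.sqrt(var)
print(round(mean, 1), round(std, 1))  # 79.1 10.2

def gaussian_pdf(x, mean, std):
    """Normal density, used as P(x | class) for a numerical attribute."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# Likelihood of a (hypothetical) new humidity reading of 74 given Play=Yes.
print(round(gaussian_pdf(74, mean, std), 4))
```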
35. Bayesian Classification
Uses of Bayes Classification
Text classification
Spam filtering
Hybrid recommender systems
Online applications
36. Bayesian Classification
Advantages
Easy to implement
Requires only a small amount of training data to estimate
the parameters
Gives good results in most cases
37. Bayesian Classification
Disadvantages
Assumption of class-conditional independence, hence a
loss of accuracy
In practice, dependencies exist among variables
E.g., in hospitals, a patient’s profile (age, family history, etc.),
symptoms (fever, cough, etc.) and diseases (lung cancer, diabetes, etc.)
are all related
Dependencies among these cannot be modelled by a
naïve Bayesian classifier
38. Application
Spam filtering
Spam filtering is the best-known use of naïve Bayesian text
classification: it uses a naïve Bayes classifier to identify spam
e-mail.
Bayesian spam filtering has become a popular mechanism for
distinguishing illegitimate spam from legitimate e-mail.
Many modern mail clients implement Bayesian spam filtering, and users
can also install separate e-mail filtering programs, e.g.:
DSPAM
SpamAssassin
SpamBayes
ASSP
40. Recap
Naïve Bayes
The Bayesian classifier is based on Bayes’ Theorem, with
independence assumptions between predictors.
It is easy to build, with no complicated iterative parameter
estimation, which makes it particularly useful for very
large datasets.
45. Example
Naïve Bayes algorithms
doc | words | class
training: D1 | SIRM master FSDM | A
D2 | SIRM master | A
D3 | master SIRM | A
D4 | SIRM recherche FSDM | B
test: D5 | SIRM SIRM SIRM master recherche FSDM | ???
P(A) = Nc/Nd = 3/4, P(B) = Nc/Nd = 1/4
With add-one smoothing (class A has 7 word tokens, vocabulary of 4 words):
P(SIRM|A) = (3+1)/(7+4) = 4/11, P(master|A) = (3+1)/(7+4) = 4/11
P(recherche|A) = (0+1)/(7+4) = 1/11, P(FSDM|A) = (1+1)/(7+4) = 2/11
46. Example
Naïve Bayes algorithms
doc | words | class
training: D1 | SIRM master FSDM | A
D2 | SIRM master | A
D3 | master SIRM | A
D4 | SIRM recherche FSDM | B
test: D5 | SIRM SIRM SIRM master recherche FSDM | ???
P(A) = Nc/Nd = 3/4, P(B) = Nc/Nd = 1/4
With add-one smoothing (class B has 3 word tokens, vocabulary of 4 words):
P(SIRM|B) = (1+1)/(3+4) = 2/7, P(master|B) = (0+1)/(3+4) = 1/7
P(recherche|B) = (1+1)/(3+4) = 2/7, P(FSDM|B) = (1+1)/(3+4) = 2/7
47. Example
P(A|D5) ∝ 3/4 * (4/11)^4 * 1/11 * 2/11 = 0.00022
P(B|D5) ∝ 1/4 * (2/7)^5 * 1/7 = 0.000068
Now we normalize:
P(A|D5) = 0.00022 / (0.000068 + 0.00022) = 0.76
P(B|D5) = 0.000068 / (0.000068 + 0.00022) = 0.24
So the predicted class is A:
Test D5 SIRM SIRM SIRM master recherche FSDM A
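The text-classification example from slides 45–47 can be reproduced end to end with a small multinomial naïve Bayes sketch (add-one smoothing, as on the slides):

```python
from collections import Counter

# Training documents and test document from slides 45-46.
train = [("SIRM master FSDM".split(), "A"),
         ("SIRM master".split(), "A"),
         ("master SIRM".split(), "A"),
         ("SIRM recherche FSDM".split(), "B")]
test_doc = "SIRM SIRM SIRM master recherche FSDM".split()

vocab = {w for words, _ in train for w in words}
classes = {c for _, c in train}

def posterior(c):
    """Unnormalized P(c|test_doc): prior times smoothed word likelihoods."""
    docs = [words for words, cls in train if cls == c]
    prior = len(docs) / len(train)                     # Nc / Nd
    counts = Counter(w for words in docs for w in words)
    total = sum(counts.values())                       # word tokens in class c
    p = prior
    for w in test_doc:                                 # add-one smoothing
        p *= (counts[w] + 1) / (total + len(vocab))
    return p

scores = {c: posterior(c) for c in classes}
norm = sum(scores.values())
print({c: round(s / norm, 2) for c, s in sorted(scores.items())})  # {'A': 0.76, 'B': 0.24}
print(max(scores, key=scores.get))  # A
```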
48. Conclusion
The naive Bayes model is tremendously appealing because of its
simplicity, elegance, and robustness.
It is one of the oldest formal classification algorithms, and yet even
in its simplest form it is often surprisingly effective.
A large number of modifications have been introduced by the
statistical, data mining, machine learning, and pattern recognition
communities in an attempt to make it more flexible.
Firstly: we will define what ML is, its different types, and the differences between those types.
Secondly: we will define and look at the different types of supervised algorithms.
Thirdly: we will give some real applications.
After that, we will explain one important algorithm that we chose to talk about.
Finally, we will finish with a conclusion.
As you know, everyone has a lot of experience in life, and usually when we have to make a prediction or a decision about something we use our experience and what we did in the past.
It is the same thing when we talk about ML, but a computer does not have experience: a computer system learns from data, which represents some experience of an application domain. So we can define ML as:
In ML there are three major learning paradigms, each corresponding to a particular abstract learning task: supervised learning, unsupervised learning and reinforcement learning.
We are going to focus only on the first type (supervised learning).
As a definition of supervised learning.
So why do we use SL?
As for the approach of supervised learning:
the purpose of supervised learning is to predict the class, so how do we do that?
In supervised learning there are two major steps.
In supervised algorithms there are two problems: the regression problem and the classification problem.
We use regression to predict a continuous-valued output.
For example, we have size as a feature and we try to predict the output, which is the age.
We use classification to predict discrete values of output.
Here we have two classes, 1 or 0, or false or true.
From the attributes we try to predict the discrete output, or class.
There are a lot of classification algorithms available.
The purpose of neural networks is ……
For example, we have two classes, one represented by red dots and the other by blue dots;
the principle of neural networks is to separate the red class and the blue class.
If x1 is greater than α1 then the output is the blue class; if not, then if x2 is greater than α2
the output is the blue class; otherwise the output is the red class.
As you can see in this table, there are three top algorithms: decision trees, naïve Bayes and SVM.
Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.