2. What is machine learning?
● Gives computers the ability to learn, rather than explicitly defining every state
● Draws on subfields of AI such as computational learning theory and pattern recognition
● Makes computer programs work in two distinct stages: “Train” and “Predict”
3. Machine learning vs conditional programming
Conditional programming uses simple if-then-else rules.
Problem: detect a flower's name from its features.
Conditional approach: write if-else rules covering every case.
AI approach: train an ML model and predict the result.
4. Supervised learning
Supervised learning is the machine learning task of inferring a function from
labeled training data. The training data consist of a set of training examples.
7. 1. Decision Tree
A decision tree builds a classification model using a tree structure.
It breaks the dataset down into smaller and smaller subsets.
Finding the optimal decision tree is NP-hard,
so a greedy technique is used.
8. Decision tree algorithm
1. Start with the whole training data.
2. Select the attribute or value along a dimension that gives the “best” split.
3. Create child nodes based on the split.
4. Recurse on each child with its subset of the data until a stopping criterion is reached:
• all examples have the same class (entropy is 0)
• the amount of data is too small (fewer than min_samples_split)
• the tree grows too large
Problem: How do we choose the “best” attribute?
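The four steps above can be sketched as a recursive, entropy-driven splitter. This is an illustrative ID3-style sketch using information gain on a small hypothetical dataset (the rows, labels, and attribute names are invented for the example), not a production implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split gives the largest information gain."""
    base = entropy(labels)
    def gain(attr):
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes, min_samples_split=2):
    # Stopping criteria: pure node (entropy 0), too little data,
    # or no attributes left to split on.
    if entropy(labels) == 0 or len(rows) < min_samples_split or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    attr = best_attribute(rows, labels, attributes)
    rest = [a for a in attributes if a != attr]
    tree = {attr: {}}
    for value in set(r[attr] for r in rows):
        sub_rows = [r for r in rows if r[attr] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[attr] == value]
        tree[attr][value] = build_tree(sub_rows, sub_labels, rest, min_samples_split)
    return tree

# Hypothetical toy data: decide play/stay from two categorical features.
rows = [{"sky": "sunny", "wind": "low"}, {"sky": "sunny", "wind": "high"},
        {"sky": "rainy", "wind": "low"}, {"sky": "rainy", "wind": "high"}]
labels = ["play", "play", "play", "stay"]
tree = build_tree(rows, labels, ["sky", "wind"])
print(tree)
```

The greedy choice happens in `best_attribute`: each recursion level commits to the locally best split and never backtracks, which is why the result may not be the globally optimal tree.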
9. Simple Example
Weekend (Example)  Weather  Parents  Money  Decision (Category)
W1                 Sunny    Yes      Rich   Cinema
W2                 Sunny    No       Rich   Tennis
W3                 Windy    Yes      Rich   Cinema
W4                 Rainy    Yes      Poor   Cinema
W5                 Rainy    No       Rich   Stay in
W6                 Rainy    Yes      Poor   Cinema
W7                 Windy    No       Poor   Cinema
W8                 Windy    No       Rich   Shopping
W9                 Windy    Yes      Rich   Cinema
W10                Sunny    No       Rich   Tennis
11. Decision tree
When Parents is chosen as the splitter, the entropy before the split is 1.571.
Parameters
criterion = entropy*, gini (default)
splitter = best (default)*, random
min_samples_split = 2* (default)
* - used here
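The entropy figure above can be checked with a short standard-library sketch over the Weekend table from the example slide:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Weekend dataset: (Weather, Parents, Money, Decision)
data = [("Sunny", "Yes", "Rich", "Cinema"),  ("Sunny", "No", "Rich", "Tennis"),
        ("Windy", "Yes", "Rich", "Cinema"),  ("Rainy", "Yes", "Poor", "Cinema"),
        ("Rainy", "No", "Rich", "Stay in"),  ("Rainy", "Yes", "Poor", "Cinema"),
        ("Windy", "No", "Poor", "Cinema"),   ("Windy", "No", "Rich", "Shopping"),
        ("Windy", "Yes", "Rich", "Cinema"),  ("Sunny", "No", "Rich", "Tennis")]

decisions = [row[3] for row in data]
print(round(entropy(decisions), 3))  # entropy before any split → 1.571

# The Parents = "Yes" branch is pure (all Cinema), so it becomes a leaf:
parents_yes = [row[3] for row in data if row[1] == "Yes"]
print(entropy(parents_yes) == 0)     # → True
```

This also illustrates the first stopping criterion from the algorithm slide: the Parents = "Yes" subset has entropy 0, so recursion stops there.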
12. How prediction works
Today is windy, I have money, and my parents are not at home. Predict what I will do.
Weather = “Windy” → 1
Parents = “No” → 0
Money = “Rich” → 1
classified = [0, 1, 0, 0], i.e. the second class, Shopping: I may start shopping!
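As a sketch, the prediction walk can be written as nested if-else tests over a tree consistent with the Weekend data. The tree structure used here (Parents at the root, then Weather, then Money) is an assumption inferred from the table, not taken from the slides:

```python
def predict(weather, parents, money):
    """Walk a decision tree fit by hand to the Weekend dataset."""
    if parents == "Yes":
        return "Cinema"            # every Parents=Yes example is Cinema
    if weather == "Sunny":
        return "Tennis"
    if weather == "Rainy":
        return "Stay in"
    # weather == "Windy": the outcome depends on money
    return "Shopping" if money == "Rich" else "Cinema"

print(predict("Windy", "No", "Rich"))  # → Shopping
```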
15. 2. Naive Bayes
A classification technique based on Bayes’ theorem with an assumption
of independence among predictors.
Primarily used for text classification, which involves high-dimensional
training data sets.
Examples: spam filtering, sentiment analysis, and classifying news articles.
Bayes’ theorem provides a way of calculating the posterior probability P(c|x)
from P(c), P(x) and P(x|c):
P(c|x) = P(x|c) · P(c) / P(x)
16. ● P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
● P(c) is the prior probability of the class.
● P(x|c) is the likelihood: the probability of the predictor given the class.
17. How does the Naive Bayes algorithm work?
Example:
Take a training data set of weather and the corresponding target variable ‘Play’
(indicating whether play is possible). Then classify whether players will play
based on the weather conditions.
Let’s follow the steps below…
18. Steps:
1. Convert the data set into a frequency table.
2. Create a likelihood table by finding the probabilities
(e.g. the probability of Overcast is 0.29 and the probability of playing is 0.64).
19. 3. Use the naive Bayes equation to calculate the posterior probability for
each class. The class with the highest posterior probability is the outcome
of the prediction.
Problem: Players will play if the weather is sunny. Is this statement correct?
Solution: Solve it using posterior probability.
P(Yes|Sunny) = P(Sunny|Yes) · P(Yes) / P(Sunny)
Here, P(Sunny|Yes) = 3/9 = 0.33
P(Sunny) = 5/14 = 0.36
P(Yes) = 9/14 = 0.64
P(Yes|Sunny) = 0.33 · 0.64 / 0.36 = 0.60
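The calculation can be reproduced directly from the frequency counts used on the slide (14 records, 9 of them "Yes"; "Sunny" appears 5 times, 3 of them on "Yes" days):

```python
total, yes, sunny, sunny_and_yes = 14, 9, 5, 3

p_sunny_given_yes = sunny_and_yes / yes  # P(Sunny|Yes) = 3/9 ≈ 0.33
p_yes = yes / total                      # P(Yes)       = 9/14 ≈ 0.64
p_sunny = sunny / total                  # P(Sunny)     = 5/14 ≈ 0.36

# Bayes' theorem: posterior = likelihood * prior / evidence
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))       # → 0.6
```

With the exact fractions the posterior is 3/5 = 0.6; the slide's 0.60 comes from the same computation with rounded intermediate values.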
23. 3. k-Nearest Neighbour
Introduction
The KNN algorithm is a robust and versatile classifier that is often
used as a benchmark for more complex classifiers such as Artificial
Neural Networks (ANN) and Support Vector Machines (SVM).
Despite its simplicity, KNN can outperform more powerful classifiers
and is used in a variety of applications such as economic forecasting,
data compression and genetics.
24. What is KNN?
KNN falls in the supervised learning family of algorithms. Informally,
this means that we are given a labelled dataset consisting of training
observations (x, y) and would like to capture the relationship
between x and y. More formally, our goal is to learn a function
h : X → Y so that, given an unseen observation x, h(x) can
confidently predict the corresponding output.
● KNN is non-parametric, instance-based, and used in a supervised
learning setting.
● Minimal training but expensive testing.
25. How does KNN work?
The k-nearest neighbour algorithm essentially boils down to forming a majority vote
between the K most similar instances to a given “unseen” observation. Similarity is
defined according to a distance metric between two data points. A popular choice
is the Euclidean distance, given by
d(x, x′) = √((x₁ − x′₁)² + … + (xₙ − x′ₙ)²)
26. How it works (cont…)
1. Choose a value for k, preferably a small odd number.
2. Find the k points closest to the new observation.
3. Assign the new point the majority class among those k neighbours.
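The three steps above can be sketched in plain Python. The function name, the toy points, and the labels here are illustrative assumptions, not taken from the slides:

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`.
    `train` is a list of (point, label) pairs."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D toy data: two well-separated classes
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
print(knn_predict(train, (1.1, 1.0), k=3))  # → A
```

Note there is no training step at all: the "model" is just the stored data, which is why KNN has minimal training cost but expensive prediction.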
28. When K is small, we are restraining the region of a given prediction and forcing
our classifier to be “more blind” to the overall distribution. A small value for K
provides the most flexible fit, which will have low bias but high variance.
Graphically, our decision boundary will be more jagged.
29. On the other hand, a higher K averages more voters in each prediction and hence
is more resilient to outliers. Larger values of K will have smoother decision
boundaries which means lower variance but increased bias.
32. Unsupervised learning - Clustering
● Organization of unlabeled data into similarity groups
● Three types of clustering techniques:
• Hierarchical
• Partitional
• Bayesian
33. Clustering Algorithms
K-means
● Partitional clustering algorithm
● Choose k (random) data points (seeds) to be the initial centroids
● Assign each data point to the closest centroid
35. 4. K-means
Algorithm
● Decide a value for k
● Initialize the k cluster centers
● Assign each object to its nearest cluster
● Re-estimate the cluster centers
● If no object changed its cluster membership, exit; otherwise go back to the assignment step
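A minimal sketch of the loop above (Lloyd's algorithm) on 1-D toy data; the function name, the seed, and the points are illustrative assumptions:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """K-means on 1-D data: assign each point to the nearest centre,
    re-estimate the centres, and stop when no membership changes."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)          # initialize centres from the data
    assignment = None
    for _ in range(iters):
        # Assignment step: index of the nearest centre for each point
        new_assignment = [min(range(k), key=lambda c: abs(p - centres[c]))
                          for p in points]
        if new_assignment == assignment:     # no membership changed: converged
            break
        assignment = new_assignment
        # Re-estimation step: move each centre to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = sum(members) / len(members)
    return centres, assignment

points = [1.0, 1.1, 0.9, 8.0, 8.2, 7.8]
centres, assignment = kmeans(points, k=2)
print(sorted(round(c, 1) for c in centres))  # → [1.0, 8.0]
```

On this well-separated toy data the loop converges to the two cluster means regardless of which seeds are drawn; in general, K-means only reaches a local optimum that depends on the initial centres.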