Machine learning and decision trees
1. MV PADMAVATI
BHILAI INSTITUTE OF TECHNOLOGY, DURG, INDIA
MACHINE LEARNING
“Learning denotes changes in a system that ... enable a system to do the
same task … more efficiently the next time.” - Herbert Simon
2. WHAT IS MACHINE LEARNING
Arthur Samuel described it as: “The field of study that gives
computers the ability to learn from data without being
explicitly programmed.”
6. Where can I get datasets?
Prepare your own datasets, or get data from:
• Kaggle Datasets - https://www.kaggle.com/datasets
• Amazon data sets - https://registry.opendata.aws/
• UCI Machine Learning Repository - https://archive.ics.uci.edu/ml/datasets.html
Many more…
8. Machine Learning Tools
• Git and Github
• Python
• Jupyter Notebooks
• NumPy - mostly used to perform math-based operations during the
machine learning process.
• Pandas - used to import and manage datasets.
• Matplotlib - we will use this library to plot charts in Python.
• scikit-learn is an open source Python machine learning
library
• Many other Python APIs
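To give a feel for how these tools fit together, here is a minimal illustrative sketch (the synthetic data and column names are made up for the example, not taken from the slides):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# NumPy: math-based operations on arrays (synthetic data for illustration)
x = np.linspace(0, 10, 50)
y = 2 * x + np.random.normal(0, 1, 50)

# Pandas: manage the data as a table
df = pd.DataFrame({"x": x, "y": y})

# scikit-learn: fit a simple model
model = LinearRegression().fit(df[["x"]], df["y"])

# Matplotlib: plot the data and the fitted line
plt.scatter(df["x"], df["y"], label="data")
plt.plot(df["x"], model.predict(df[["x"]]), color="red", label="fit")
plt.legend()
plt.show()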
11. Supervised learning
• Machine learning takes data as input; let's call this data the training data.
• The training data includes both inputs and labels (targets).
• We first train the model with lots of training data (inputs & targets).
12. Types of Supervised learning
Classification separates the data, Regression fits the data
13. Inductive (Supervised) Learning
Basic problem: induce a representation of a function (a systematic
relationship between inputs and outputs) from examples.
target function f: X → Y
example: (x, f(x))
hypothesis g: X → Y such that g(x) ≈ f(x)
x = set of attribute values (attribute-value representation)
Y = set of discrete labels (classification)
Y = continuous values (regression)
14. Classification
This is a type of problem where we predict a categorical response value,
i.e., the data can be separated into specific “classes” (we predict one
value from a fixed set of values).
Some examples are:
1. Is this mail spam or not?
2. Will it rain today or not?
3. Is this picture a cat or not?
Basically, ‘Yes/No’ type questions, called binary classification.
Other examples are:
1. Is this mail spam, important, or promotional?
2. Is this picture a cat, a dog, or a tiger?
This type is called multi-class classification.
15. Iris Flower - 3 Variety Details
Let us first understand the dataset.
The dataset consists of: 150 samples
3 class labels: species of Iris (Iris setosa, Iris virginica and Iris versicolor)
4 features: sepal length, sepal width, petal length, petal width, in cm
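A quick way to verify these numbers is to load the dataset with scikit-learn's built-in loader (a minimal sketch):

from sklearn import datasets

iris = datasets.load_iris()
print(iris.data.shape)       # (150, 4): 150 samples, 4 features
print(iris.feature_names)    # sepal/petal length and width, in cm
print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']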
17. Regression
This is a type of problem where we need to predict a continuous
response value (e.g., we predict a number which can vary from
-infinity to +infinity).
Some examples are:
1. What is the price of a house in Opava?
2. What is the value of the stock?
3. What will the temperature be tomorrow?
etc… there are tons of things we can predict if we wish.
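As a minimal regression sketch in scikit-learn (the house sizes and prices below are invented for illustration, not real market data):

from sklearn.linear_model import LinearRegression
import numpy as np

# illustrative made-up data: house size in square meters vs. price
sizes = np.array([[50], [70], [90], [110], [130]])
prices = np.array([120_000, 160_000, 205_000, 245_000, 290_000])

model = LinearRegression().fit(sizes, prices)
print(model.predict([[100]]))  # predicted price of a 100 m^2 house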
19. Unsupervised Learning
Here the training data does not include targets, so we don’t tell the system
where to go; the system has to understand the data we give it on its own.
20. Clustering
This is a type of problem where we group similar things together. It is similar to
multi-class classification, but here we don’t provide the labels; the system
learns from the data itself and clusters it.
Some examples are :
1. Given news articles, cluster into different types of news
2. Given a set of tweets, cluster based on content of tweet
3. Given a set of images, cluster them into different objects
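A minimal clustering sketch using scikit-learn's KMeans on the Iris measurements; note that the labels are never shown to the algorithm (the choice of 3 clusters is ours):

from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = kmeans.fit_predict(iris.data)  # only the features, no targets
print(clusters[:10])  # cluster index assigned to the first 10 samples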
22. You’re running a company, and you want to develop learning algorithms to
address each of two problems.
Problem 1: You have a large inventory of identical items. You want to predict
how many of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for
each account decide if it has been hacked or not.
Should you treat these as classification or as regression problems?
Treat both as classification problems.
Treat problem 1 as a classification problem, problem 2 as a regression
problem.
Treat problem 1 as a regression problem, problem 2 as a classification
problem.
Treat both as regression problems.
Answer: treat problem 1 as a regression problem (predicting a continuous
quantity of items sold) and problem 2 as a classification problem
(hacked or not).
23. For each of the following examples, which type of learning would you
make use of?
1. Given email labeled as spam/not spam, learn a spam filter.
2. Given a set of news articles found on the web, group them into sets of
articles about the same story.
3. Given a database of customer data, automatically discover market
segments and group customers into different market segments.
4. Given a dataset of patients diagnosed as either having diabetes or
not, learn to classify new patients as having diabetes or not.
Ans 1: Supervised Learning - Classification
Ans 2: Unsupervised Learning - Clustering
Ans 3: Unsupervised Learning - Clustering
Ans 4: Supervised Learning - Classification
24. Different Classifiers (Algorithms)
• Logistic Regression
• Decision Tree Classifier
• Support Vector Machines
• K-Nearest Neighbors
• Linear Discriminant Analysis
• Gaussian Naive Bayes
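As a hedged sketch, all of these classifiers share the same fit/predict interface in scikit-learn, so they can be compared in a loop (the 50/50 split and default hyperparameters are arbitrary choices):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB

x, y = datasets.load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=0)

classifiers = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(),
    SVC(),
    KNeighborsClassifier(),
    LinearDiscriminantAnalysis(),
    GaussianNB(),
]
for clf in classifiers:
    clf.fit(x_train, y_train)  # same interface for every classifier
    print(type(clf).__name__, clf.score(x_test, y_test))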
25. Decision Tree
• The decision tree algorithm falls under the category of
supervised learning.
• They can be used to solve both regression and classification
problems.
• Learned functions are represented as decision trees (or if-
then-else rules)
27. Decision Trees Expressivity
Decision trees represent a disjunction of conjunctions
(sum of products, SOP) of constraints on the values of attributes:
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak) ⇒ Yes (tennis will be played)
28. Python Code for Classification using Decision Tree
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn.metrics import accuracy_score

# load the dataset
iris = datasets.load_iris()
x = iris.data    # predictors (features)
y = iris.target  # output labels

# divide the dataset into a training set and a testing set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5)

# use a decision tree classifier
classifier = tree.DecisionTreeClassifier()
classifier.fit(x_train, y_train)

# make predictions on the test data and print the accuracy score
predictions = classifier.predict(x_test)
print(accuracy_score(y_test, predictions))
29. When to use Decision Trees
Problem characteristics:
Instances can be described by attribute-value pairs
A disjunctive hypothesis may be required
Possibly noisy training data samples
Robust to errors in training data
Missing attribute values
Different classification problems:
Equipment or medical diagnosis
Credit risk analysis
30. Top-down induction of Decision Trees
The construction of the tree is top-down. The algorithm is
greedy.
The fundamental question is “which attribute should be tested
next? Which question gives us more information?”
Select the best attribute
A descendant node is then created for each possible value of this
attribute, and examples are partitioned according to this value
The process is repeated for each successor node until all the
examples are classified correctly or there are no attributes left
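The greedy top-down procedure can be sketched compactly. This is an illustrative ID3-style skeleton, not code from the slides; it assumes examples are represented as dicts mapping attribute names to values, with the class stored under a "label" key, and it uses the information-gain criterion defined on the following slides:

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(examples, attr):
    labels = [e["label"] for e in examples]
    remainder = 0.0
    for value in set(e[attr] for e in examples):
        subset = [e["label"] for e in examples if e[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, attributes):
    labels = [e["label"] for e in examples]
    if len(set(labels)) == 1:            # all examples classified correctly
        return labels[0]
    if not attributes:                   # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    # greedy choice: select the best attribute by information gain
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {best: {}}
    for value in set(e[best] for e in examples):   # one descendant per value
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best])
    return tree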
31. Which attribute is the best classifier?
A statistical property called information gain measures how well a
given attribute separates the training examples
Information gain uses the notion of entropy, commonly used in
information theory
Information gain = expected reduction of entropy
32. Decision Tree Induction
(Recursively) partition examples according to the best
attribute.
Key concepts:
• entropy: the impurity of a set of examples (entropy = 0 if perfectly
homogeneous); the number of bits needed to encode the class of an
arbitrary example
• information gain: the expected reduction in entropy caused by partitioning
33. Entropy
• In the machine learning sense, and especially in this case, entropy is a
measure of the impurity (lack of homogeneity) in the data.
• For binary classification, its value ranges from 0 to 1.
• Its value is close to 0 if all the examples belong to the same class, and
close to 1 if there is an almost equal split of the data into different classes.
34. Entropy
Entropy controls how a decision tree decides to split the data. It actually
affects how a decision tree draws its boundaries.
35. Entropy in binary classification
Entropy measures the impurity of a collection of examples.
It depends on the distribution of the random variable p.
S is a collection of training examples
p+ is the proportion of positive examples in S
p– is the proportion of negative examples in S
Entropy(S) = – p+ log2(p+) – p– log2(p–)
Entropy([14+, 0–]) = – (14/14) log2(14/14) – 0 log2(0) = 0
Entropy([9+, 5–]) = – (9/14) log2(9/14) – (5/14) log2(5/14) = 0.94
Entropy([7+, 7–]) = – (7/14) log2(7/14) – (7/14) log2(7/14) = 1/2 + 1/2 = 1
Note: the log of a number < 1 is negative; since 0 ≤ p ≤ 1, we have
0 ≤ entropy ≤ 1 (with the convention 0 log2 0 = 0).
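The entropy values above can be reproduced with a few lines of Python (a minimal sketch working directly from positive/negative counts):

import math

def entropy(pos, neg):
    result = 0.0
    for count in (pos, neg):
        if count:  # convention: 0 * log2(0) = 0
            p = count / (pos + neg)
            result -= p * math.log2(p)
    return result

print(entropy(14, 0))            # 0.0
print(round(entropy(9, 5), 2))   # 0.94
print(entropy(7, 7))             # 1.0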
36. Information Gain as Entropy Reduction
Information gain is the expected reduction in entropy
caused by partitioning the examples on an attribute.
The higher the information gain, the more effective the
attribute is in classifying the training data.
Expected reduction in entropy from knowing A:
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)
where Values(A) is the set of possible values for A,
and Sv is the subset of S for which A has value v.
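The same formula in code, reproducing the Gain(S, Outlook) = 0.246 value used on the next slides (the per-value counts are taken from the classic PlayTennis dataset these slides follow):

import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

def gain(parent, partitions):
    # parent: (pos, neg) counts for S; partitions: (pos, neg) counts of each Sv
    total = sum(parent)
    return entropy(*parent) - sum(
        (pos + neg) / total * entropy(pos, neg) for pos, neg in partitions
    )

# Outlook splits the 14 examples [9+, 5-] into Sunny [2+, 3-],
# Overcast [4+, 0-] and Rain [3+, 2-]
print(round(gain((9, 5), [(2, 3), (4, 0), (3, 2)]), 3))  # 0.246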
41. First step: which attribute to test at the root?
Which attribute should be tested at the root?
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
Outlook provides the best prediction for the target
Let's grow the tree:
add to the tree a successor for each possible value of Outlook
partition the training samples according to the value of Outlook
43. Second step
Working on Outlook=Sunny node:
Gain(SSunny, Humidity) = 0.970 − (3/5)·0.0 − (2/5)·0.0 = 0.970
Gain(SSunny, Wind) = 0.970 − (2/5)·1.0 − (3/5)·0.918 = 0.019
Gain(SSunny, Temp.) = 0.970 − (2/5)·0.0 − (2/5)·1.0 − (1/5)·0.0 = 0.570
Humidity provides the best prediction for the target
Let's grow the tree:
add to the tree a successor for each possible value of Humidity
partition the training samples according to the value of
Humidity
44. Second and third steps
[Figure: the grown tree. Under Outlook = Sunny, Humidity = High → No
{D1, D2, D8} and Humidity = Normal → Yes {D9, D11}; under Outlook = Rain,
Wind = Weak → Yes {D4, D5, D10} and Wind = Strong → No {D6, D14}.]
46. Overfitting in decision trees
Outlook=Sunny, Temp=Hot, Humidity=Normal, Wind=Strong, PlayTennis=No
New noisy example causes splitting of second leaf node.
47. Overfitting in decision tree learning
Building trees that “adapt too much” to the training
examples may lead to “overfitting”.
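Overfitting is easy to observe on real data. A hedged sketch on Iris (the split ratio and depth limit are arbitrary choices): an unconstrained tree typically scores perfectly on its training set, while a depth-limited tree trades training accuracy for generalization; exact numbers depend on the random split.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = datasets.load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(x_train, y_train)

print("full tree:    train", full.score(x_train, y_train), "test", full.score(x_test, y_test))
print("depth-2 tree: train", shallow.score(x_train, y_train), "test", shallow.score(x_test, y_test))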
48. Avoid overfitting in Decision Trees
Two strategies:
1. Stop growing the tree earlier, before perfect classification.
2. Allow the tree to overfit the data, and then post-prune it.
Pruning:
Each node is a candidate for pruning.
Pruning consists of removing the subtree rooted at a node: the node
becomes a leaf and is assigned the most common classification.
Nodes are pruned iteratively: at each iteration, the node whose removal
most increases accuracy on the validation set is pruned.
Pruning stops when no further pruning increases accuracy.
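The slides describe reduced-error pruning against a validation set. scikit-learn implements a different but related technique, cost-complexity pruning; as a sketch, we can pick the ccp_alpha that maximizes validation accuracy (split sizes are arbitrary):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = datasets.load_iris(return_X_y=True)
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.3, random_state=0)

# candidate alphas along the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(x_train, y_train)

# choose the alpha whose pruned tree scores best on the validation set
best = max(
    path.ccp_alphas,
    key=lambda a: DecisionTreeClassifier(ccp_alpha=a, random_state=0)
    .fit(x_train, y_train)
    .score(x_val, y_val),
)
pruned = DecisionTreeClassifier(ccp_alpha=best, random_state=0).fit(x_train, y_train)
print("chosen ccp_alpha:", best, "validation accuracy:", pruned.score(x_val, y_val))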