2. What is Machine Learning (ML)
Machine Intelligence Landscape
Python Libraries for ML
ML Algorithms
Agenda
3. Machine learning is a branch of artificial intelligence
concerned with the construction and study of systems
that can learn from data.
What is machine learning?
4. Related Fields
Machine learning is primarily concerned with the
accuracy and effectiveness of the computer system.
psychological models
data mining
cognitive science
decision theory
information theory
databases
machine learning
neuroscience
statistics
evolutionary models
control theory
7. Machine Learning Workflow
A machine learning project has a number of well
known steps:
Define Problem
Acquire Data
Prepare Data
Choose Algorithm: consider speed, interpretability,
accuracy, good memory management, and implementability.
Fit Your Model.
Choose Validation Method and validate
Predict using your model.
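The steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not a recommended pipeline: the choice of a k-nearest-neighbours classifier, the 30% hold-out size, and the sample measurements in `new_flower` are all picked here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Acquire data: a small dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Prepare data: hold out 30% of the rows for validation (assumed split size).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Choose an algorithm and fit the model.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Validate on the held-out rows.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"hold-out accuracy: {accuracy:.2f}")

# Predict for a new flower (setosa-like measurements in cm, chosen for illustration).
new_flower = [[5.1, 3.5, 1.4, 0.2]]
predicted_class = model.predict(new_flower)[0]
print("predicted class index:", predicted_class)
```

Any other scikit-learn estimator could be swapped in for `KNeighborsClassifier` without changing the rest of the sketch, since they share the `fit` / `predict` interface.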
8. Why ML Is Hard
The Curse Of Dimensionality
• To generalize locally, you need representative examples from all
relevant variations (and there are an exponential number of them)!
• Classical Solution: Hope for a smooth enough target function,
or make it smooth by handcrafting good
(i). Space grows exponentially
(ii). Space is stretched, points
become equidistant
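Point (ii) can be checked numerically. The sketch below is only an illustration: the helper name `distance_spread`, the uniform sampling in the unit cube, and the sample size of 200 points are assumptions made here. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random


def distance_spread(dim, n_points=200, seed=0):
    """Relative gap between the farthest and nearest point from one query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


# As the dimension grows, the relative gap shrinks: points look almost equidistant.
for dim in (2, 10, 100, 1000):
    print(dim, round(distance_spread(dim), 2))
```

Point (i) follows from simple counting: splitting each axis into 10 bins gives 10^d cells in d dimensions, so covering every cell with at least one example quickly becomes infeasible.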
9. Training, Validation & Testing
Training set (observed)
Universal set (unobserved)
Testing set (unobserved)
Data acquisition
Practical usage
10. Training is the process of making the system able to
learn.
Training and Testing
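A three-way split into training, validation and testing sets might be sketched as follows. The function name `three_way_split` and the 60/20/20 proportions are assumptions chosen for illustration.

```python
import random


def three_way_split(records, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the records, then cut them into training, validation and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    training = shuffled[n_test + n_val:]
    return training, validation, test


training, validation, test = three_way_split(range(100))
print(len(training), len(validation), len(test))  # 60 20 20
```

The model is fitted on the training part only; the validation part guides choices such as hyperparameters, and the test part is kept aside for a final estimate of performance on unobserved data.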
11. There are several factors affecting the performance:
Types of training provided
The form and extent of any initial background knowledge
The type of feedback provided
The learning algorithms used
Two important factors:
Modeling
Optimization
Performance
12. Supervised learning
Prediction
Classification (discrete labels), Regression (real values)
Unsupervised learning
Clustering
Probability distribution estimation
Finding association (in features)
Dimension reduction
Semi-supervised learning
Reinforcement learning
Decision making (robot, chess machine)
Types of ML Algorithms
13. Types of ML Algorithms
Supervised learning
Unsupervised learning
Semi-supervised learning
16. Python Libraries for DS/ML
Many popular Python toolboxes/libraries:
NumPy
SciPy
Pandas
SciKit-Learn
Visualization libraries
matplotlib
Seaborn
and many more …
17. Python Libraries for Data
Science
SciPy:
collection of algorithms for linear algebra,
differential equations, numerical integration,
optimization, statistics and more
built on NumPy
Link: https://www.scipy.org/scipylib/
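As a small illustration of two of the areas listed above (numerical integration and optimization), the sketch below uses `scipy.integrate.quad` and `scipy.optimize.minimize_scalar`. The two toy functions are chosen here only because their exact answers are easy to check by hand.

```python
from scipy import integrate, optimize

# Numerical integration: the integral of x^2 from 0 to 3 is exactly 9.
area, error_estimate = integrate.quad(lambda x: x ** 2, 0.0, 3.0)

# Optimization: (x - 2)^2 has its minimum at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

print("integral:", area)
print("minimizer:", result.x)
```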
18. Python Libraries for Data
Science
Pandas:
adds data structures and tools designed to
work with table-like data
provides tools for data manipulation:
reshaping, merging, sorting, slicing,
aggregation etc.
allows handling missing data
Link: http://pandas.pydata.org/
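The table-like data structure, missing-data handling and aggregation mentioned above might look like this in practice. The column names and values are made up for illustration, and filling gaps with the column mean is only one of several possible strategies.

```python
import numpy as np
import pandas as pd

# A small table with one missing measurement (values are illustrative only).
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "petal_length": [1.4, np.nan, 5.1, 5.9],
})

# Handle missing data: replace the gap with the column mean.
df["petal_length"] = df["petal_length"].fillna(df["petal_length"].mean())

# Aggregation: average petal length per species.
mean_by_species = df.groupby("species")["petal_length"].mean()
print(mean_by_species)
```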
19. matplotlib:
Python 2D plotting library which produces
publication quality figures in a variety of
hardcopy formats
a set of functionalities similar to those of
MATLAB
line plots, scatter plots, bar-charts,
histograms, pie charts etc.
Link: https://matplotlib.org/
Python Libraries for Data
Science
20. Seaborn:
based on matplotlib
provides high level interface for drawing
attractive statistical graphics
Link: https://seaborn.pydata.org/
Python Libraries for Data
Science
21. Link: http://scikit-learn.org/
Python Libraries for Data
Science
SciKit-Learn:
provides machine learning algorithms:
classification, regression, clustering, model
validation etc.
built on NumPy, SciPy and matplotlib
22. Create a Google Colaboratory
1. Open Google Colab at
https://colab.research.google.com/notebooks/welcome.ipynb
2. Click on ‘New Notebook’ and select Python 2 notebook
or Python 3 notebook.
OR
1. Open Google Drive.
2. Create a new folder for the project.
3. Click on ‘New’ > ‘More’ > ‘Colaboratory’.
23. Hello World of Machine Learning
The best small project to start with on a
new tool is the classification of iris flowers
(using the iris dataset).
The code is in my Google Colab notebook.
24. Iris Dataset
A multi-class classification problem
4 attributes and 150 rows
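The shape of the dataset can be confirmed directly. The sketch assumes the copy of the iris dataset that ships with scikit-learn (`sklearn.datasets.load_iris`), which may differ from the file used in the original notebook.

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)           # (rows, attributes)
print(iris.feature_names)        # names of the attributes
print(list(iris.target_names))   # names of the classes
```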
27. Boston Housing Dataset
The Boston Housing Dataset contains the
prices of houses in various areas of
Boston. Alongside the price, the dataset
also provides information such as the per capita
crime rate (CRIM), the proportion of non-retail
business acres in the town (INDUS), the proportion
of owner-occupied units built prior to 1940 (AGE),
and many other attributes.
28. Boston Housing Dataset
Attribute Information:
1. CRIM per capita crime rate by town
2. ZN proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS proportion of non-retail business acres per town
4. CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5. NOX nitric oxides concentration (parts per 10 million)
6. RM average number of rooms per dwelling
7. AGE proportion of owner-occupied units built prior to 1940
8. DIS weighted distances to five Boston employment centres
9. RAD index of accessibility to radial highways
10. TAX full-value property-tax rate per $10,000
11. PTRATIO pupil-teacher ratio by town
12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT % lower status of the population
14. MEDV Median value of owner-occupied homes in $1000's
29. Data Set Information:
The dataset contains cases from a study that was conducted
between 1958 and 1970 at the University of Chicago's Billings
Hospital on the survival of patients who had undergone
surgery for breast cancer.
Attribute Information:
1. Age of patient at time of operation (numerical)
2. Patient's year of operation (year - 1900, numerical)
3. Number of positive axillary nodes detected (numerical)
4. Survival status (class attribute)
-- 1 = the patient survived 5 years or longer
-- 2 = the patient died within 5 years
Other Datasets - https://archive.ics.uci.edu/ml/datasets.html
Haberman's Survival Data Set
31. ML Algorithms 1 by 1
Linear Regression
Logistic Regression
Decision Tree
SVM
Naive Bayes
kNN
K-Means
Random Forest
32. Linear Regression
Used to estimate real values (cost of
houses, number of calls, total sales etc.)
based on continuous variable(s).
Here, we establish a relationship between
the independent and dependent variables by
fitting a best-fit line.
This best-fit line is known as the regression
line and is represented by the linear equation
Y = a*X + b, where a is the slope and b is the intercept.
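For a single input variable, the best-fit line can be computed in closed form with ordinary least squares. The sketch below is a minimal illustration; the function name `fit_line` and the data points are made up, with the points placed exactly on Y = 2*X + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope a and intercept b for Y = a*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    a = covariance / variance
    b = mean_y - a * mean_x
    return a, b


# Illustrative data lying exactly on Y = 2*X + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(a, b)  # 2.0 1.0
```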
34. Logistic Regression is a mathematical model to
estimate the probability of an event occurring
having been given some previous data.
Logistic Regression works with binary data, where
either the event happens (1) or the event does not
happen (0).
So given some feature x it tries to find out whether
some event y happens or not. In the case where
the event happens, y is given the value 1. If the
event does not happen, then y is given the value
of 0.
For example, if y represents whether a sports
team wins a match, then y will be 1 if they win the
match or y will be 0 if they do not.
Logistic Regression
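A one-feature logistic regression can be sketched in plain Python. This is an illustration under stated assumptions, not the method used in the original notebook: the model is fitted by simple gradient descent, the learning rate and epoch count are arbitrary, and the data (goals scored versus match won) is invented.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data: x = goals scored, y = 1 if the team won the match, else 0.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(xs, ys)

p_win_0_goals = sigmoid(w * 0 + b)
p_win_5_goals = sigmoid(w * 5 + b)
print(round(p_win_0_goals, 2), round(p_win_5_goals, 2))
```

The model outputs a probability; predicting y = 1 whenever that probability exceeds 0.5 turns it into a binary classifier.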
36. Decision Trees (DTs) are a non-parametric
supervised learning method
used for classification and regression.
The goal is to create a model that predicts
the value of a target variable by learning
simple decision rules inferred from the
data features.
Decision Tree
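The "simple decision rules" can be made visible by printing a fitted tree. The sketch below uses scikit-learn's `DecisionTreeClassifier` and `export_text` on the iris dataset; the depth limit of 2 is an assumption made here to keep the printed rules short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree (depth limited for readability) learns a few threshold rules.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned rules as nested if/else conditions on the features.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
print("training accuracy:", tree.score(iris.data, iris.target))
```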
38. A Support Vector Machine (SVM) is a supervised
machine learning algorithm that can be employed for
both classification and regression purposes.
SVMs are based on the idea of finding a hyperplane that
best divides a dataset into two classes. A hyperplane is a
line or a surface that linearly separates and classifies a
set of data.
Support vectors are the data points nearest to the
hyperplane. These are points of a data set that, if
removed, would alter the position of the dividing
hyperplane. Because of this, they can be considered the
critical elements of a data set.
The distance between the hyperplane and the nearest
data point from either set is known as the margin. The
goal is to choose a hyperplane with the greatest possible
margin between the hyperplane and any point within the
training set, giving a greater chance of new data being
classified correctly.
SVM
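The hyperplane and its support vectors can be inspected on a small two-class problem. The sketch below is an illustration only: it assumes scikit-learn's `SVC` with a linear kernel, and it keeps just two iris classes and the two petal features so that the hyperplane is a line in the plane.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Keep two classes (setosa, versicolor) and two features (petal length, petal width).
mask = y < 2
X_two, y_two = X[mask][:, 2:4], y[mask]

svm = SVC(kernel="linear", C=1.0)
svm.fit(X_two, y_two)

# Only the points nearest to the hyperplane are kept as support vectors.
print("number of support vectors:", len(svm.support_vectors_))
print("hyperplane coefficients:", svm.coef_[0], "intercept:", svm.intercept_[0])
print("training accuracy:", svm.score(X_two, y_two))
```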
39. Naive Bayes methods are a set of
supervised learning algorithms based on
applying Bayes’ theorem with the “naive”
assumption of conditional independence
between every pair of features given the
value of the class variable.
Naive Bayes
40. Bayes Theorem
P(H|E) = (P(E|H) * P(H)) / P(E)
where
•P(H|E) is the probability of hypothesis H given the event E,
a posterior probability.
•P(E|H) is the probability of event E
given that the hypothesis H is true.
•P(H) is the probability of hypothesis H being true
(regardless of any related event), or prior probability of H.
•P(E) is the probability of the event occurring
(regardless of the hypothesis).
This is the Bayes Theorem.
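A worked numeric example of the formula: the sketch below computes P(H|E), expanding P(E) with the law of total probability. The spam-filter scenario and all three probabilities are hypothetical values chosen for illustration.

```python
def posterior(prior_h, likelihood_e_given_h, likelihood_e_given_not_h):
    """P(H|E) from Bayes' theorem, with P(E) expanded by total probability."""
    evidence = (likelihood_e_given_h * prior_h
                + likelihood_e_given_not_h * (1.0 - prior_h))
    return likelihood_e_given_h * prior_h / evidence


# Hypothetical numbers: H = "message is spam", E = "message contains 'offer'".
p_spam_given_offer = posterior(
    prior_h=0.2,                    # P(H): 20% of all messages are spam
    likelihood_e_given_h=0.6,       # P(E|H): 60% of spam contains the word
    likelihood_e_given_not_h=0.05,  # P(E|not H): 5% of normal mail contains it
)
print(round(p_spam_given_offer, 3))  # 0.75
```

Here P(E) = 0.6 * 0.2 + 0.05 * 0.8 = 0.16, so P(H|E) = 0.12 / 0.16 = 0.75: seeing the word raises the probability of spam from 20% to 75%.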
41. K Nearest Neighbor (KNN) is a simple, easy-to-understand
and versatile algorithm, and one of the top machine
learning algorithms.
KNN is used in a variety of applications such as
finance, healthcare, political science, handwriting
detection, image recognition and video recognition. In
credit rating, financial institutions predict the credit
rating of customers. In loan disbursement, banks
predict whether a loan is safe or risky.
In political science, potential voters are classified into two
classes: will vote or won’t vote.
The KNN algorithm is used for both classification and
regression problems.
It is based on a feature similarity approach.
K-NN
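The feature-similarity idea fits in a few lines of plain Python: find the k training points closest to the query and take a majority vote over their labels. The function name `knn_predict` and the loan data are hypothetical, chosen to echo the loan example above.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical loan data: (income in $1000s, existing debt in $1000s).
points = [(30, 20), (35, 25), (40, 30), (80, 5), (90, 10), (100, 8)]
labels = ["risky", "risky", "risky", "safe", "safe", "safe"]

print(knn_predict(points, labels, query=(85, 7)))  # safe
```

For regression, the vote would be replaced by the average of the neighbours' target values.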
43. K-means clustering is one of the most widely used
unsupervised machine learning algorithms that forms clusters
of data based on the similarity between data instances. For
this particular algorithm to work, the number of clusters has to
be defined beforehand. The K in the K-means refers to the
number of clusters.
The K-means algorithm starts by randomly choosing a
centroid value for each cluster. After that the algorithm
iteratively performs three steps: (i) Find the Euclidean
distance between each data instance and centroids of all the
clusters; (ii) Assign the data instances to the cluster of the
centroid with nearest distance; (iii) Calculate new centroid
values based on the mean values of the coordinates of all the
data instances from the corresponding cluster.
K-Means
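The three steps can be written almost literally with NumPy. This is a minimal sketch under assumptions made here: the initial centroids are k distinct data points picked at random, the loop runs a fixed number of iterations instead of checking for convergence, and the six sample points are invented.

```python
import numpy as np


def k_means(points, k, n_iterations=10, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iterations):
        # (i) Euclidean distance between every point and every centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        # (ii) Assign each point to the cluster of its nearest centroid.
        assignments = distances.argmin(axis=1)
        # (iii) Move each centroid to the mean of the points assigned to it.
        for cluster in range(k):
            members = points[assignments == cluster]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[cluster] = members.mean(axis=0)
    return centroids, assignments


# Two well-separated groups of illustrative points.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                   [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
centroids, assignments = k_means(points, k=2)
print(assignments)
print(centroids)
```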
44. Random forest is a type of supervised machine
learning algorithm based on ensemble learning.
Ensemble learning is a type of learning where you
join different types of algorithms, or the same algorithm
multiple times, to form a more powerful prediction
model.
The random forest algorithm combines multiple
algorithms of the same type, i.e. multiple decision
trees, resulting in a forest of trees, hence the
name "Random Forest". The random forest
algorithm can be used for both regression and
classification tasks.
Random Forest
45. Step 1: Pick N random records from the dataset.
Step 2: Build a decision tree based on these N records.
Step 3: Choose the number of trees you want in your
algorithm and repeat steps 1 and 2.
Step 4: In case of a regression problem, for a new record,
each tree in the forest predicts a value for Y
(output). The final value can be calculated by
taking the average of all the values predicted by all
the trees in the forest.
Or, in case of a classification problem, each tree in
the forest predicts the category to which the new
record belongs. Finally, the new record is assigned
to the category that wins the majority vote.
How the Random Forest Algorithm Works
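The procedure above can be sketched by hand with scikit-learn decision trees. Several details are assumptions made for this illustration: records are sampled with replacement, N equals the dataset size, 25 trees are built, and each split considers a random subset of features (`max_features="sqrt"`), which is a common ingredient of random forests that the steps above do not mention.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n_trees = 25  # number of trees, chosen arbitrarily

forest = []
for _ in range(n_trees):
    # Pick N random records (here: sampled with replacement).
    sample = rng.integers(0, len(X), size=len(X))
    # Build a decision tree based on these records.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    forest.append(tree.fit(X[sample], y[sample]))


def predict_majority(record):
    """Each tree votes for a class; the most common vote wins."""
    votes = [int(tree.predict([record])[0]) for tree in forest]
    return Counter(votes).most_common(1)[0][0]


print(predict_majority(X[0]))
```

In practice `sklearn.ensemble.RandomForestClassifier` and `RandomForestRegressor` package these steps behind the usual `fit` / `predict` interface.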
46. Neural Networks are a machine learning
framework that attempts to mimic the learning
pattern of natural biological neural networks.
Biological neural networks have
interconnected neurons with dendrites that
receive inputs, then based on these inputs
they produce an output signal through an
axon to another neuron. We will try to mimic
this process through the use of Artificial
Neural Networks (ANN)
The process of creating a neural network
begins with the most basic form, a single
perceptron.
Neural Networks
48. y = f\left(b + \sum_{i=1}^{n-1} w_i x_i\right)
(Diagram: inputs x1, x2, x3 with weights w1, w2, w3, bias b, and output y.)
What is an Artificial Neuron?
An Artificial Neuron (AN) is a non-linear
parameterized function with restricted
output range
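A single artificial neuron is a weighted sum of the inputs plus a bias, passed through an activation function f. The sketch below assumes a sigmoid for f, which restricts the output to the range (0, 1); the function name `neuron` and the input and weight values are illustrative.

```python
import math


def neuron(inputs, weights, bias):
    """Compute y = f(b + sum of w_i * x_i) with a sigmoid activation f."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


# Three inputs x1..x3 with weights w1..w3 and bias b (illustrative values).
y = neuron(inputs=[1.0, 2.0, 3.0], weights=[0.5, -0.25, 0.0], bias=0.0)
print(y)  # 0.5, because the weighted sum is exactly 0
```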
50. Deep Feed Forward Neural Nets
So what then is learning?
(Diagram: forward propagation maps a training example (x(i), y(i)) to the hypothesis hθ(x(i)).)
Learning is the adjustment of the weights wi,j such that
the cost function J(θ) is minimized.
Simple learning procedure: Back Propagation (of the error signal)
52. Applications
Recognizing patterns:
Facial identities or facial expressions
Handwritten or spoken words
Medical images
Generating patterns:
Generating images or motion sequences
Recognizing anomalies:
Unusual sequences of credit card transactions
Unusual patterns of sensor readings in a nuclear
power plant or unusual sound in your car engine.
Prediction:
Future stock prices or currency exchange rates
53. Applications
Spam filtering, fraud detection:
Recommendation systems:
Information retrieval:
Find documents or images with similar content.
Data Visualization:
Display a huge database in a revealing way
Facial recognition for Face ID, Facebook automatic tagging,
etc. (CNN)
Scene and image description for people with low vision. (CNN,
LSTM)
Traffic sign classification for self driving cars. (CNN)
Sentiment analysis to detect hateful speech on
Twitter/Instagram. (LSTM)
Automated game playing to… play games. (Deep Q-Learning)
Image style transfer for prismAI, image colorization for old
photographs. (CNN)
61. When Would We Use Machine
Learning?
When patterns exist in our data
Especially when we don’t know what they are
We cannot pin down the functional relationships mathematically
Else we would just code up the algorithm
When we have lots of (unlabeled) data
Labeled training sets are harder to come by
Data is high-dimensional
High dimension “features”
For example, sensor data
Want to “discover” lower-dimension representations
Dimension reduction
Aside: Machine Learning is heavily focused on implementability
Frequently using well-known numerical optimization techniques
Lots of open source code available
See e.g., libsvm (Support Vector Machines): http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Most of my code is in Python: http://scikit-learn.org/stable/ (many others)
Languages (e.g., octave: https://www.gnu.org/software/octave/)
62. Python Machine Learning by Example, Yuxi
Hayden Liu
Applied Machine Learning, Lecture 10:
Introduction to unsupervised and semi-supervised
learning, Richard Johnson
Building Machine Learning Systems with Python,
Luis Pedro Coelho
deeplearning.ai
https://www.coursera.org/learn/machine-learning#syllabus
https://chrisalbon.com/#machine_learning
https://medium.com/machine-learning-for-humans/how-to-learn-machine-learning-24d53bb64aa1
Further Learning Resources
63. We had a simple overview of some
techniques and algorithms in machine
learning. There are many more techniques
that apply machine learning as a solution.
Machine learning is the new electricity.
Conclusion