Machine Learning
Nimrita Koul
Assistant Professor
School of Computing & IT
REVA University
Bangalore
 What is Machine Learning ( ML )
 Machine Intelligence Landscape
 Python Libraries for ML
 ML Algorithms
Agenda
 Machine learning is a branch of artificial intelligence
concerned with the construction and study of systems
that can learn from data.
What is machine learning?
Related Fields
Machine learning is primarily concerned with the
accuracy and effectiveness of the computer system.
[Figure: machine learning at the centre, surrounded by related fields: psychological models, data mining, cognitive science, decision theory, information theory, databases, neuroscience, statistics, evolutionary models, control theory]
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
Machine Learning Workflow
A machine learning project has a number of well
known steps:
 Define Problem
 Acquire Data
 Prepare Data
 Choose Algorithm: weigh speed, interpretability,
accuracy, memory use, and ease of implementation.
 Fit Your Model.
 Choose Validation Method and validate
 Predict using your model.
Why ML Is Hard
The Curse Of Dimensionality
• To generalize locally,
you need
representative
examples from all
relevant variations (and
there are an
exponential number of
them)!
• Classical Solution:
Hope for a smooth
enough target function,
or make it smooth by
handcrafting good
(i). Space grows exponentially
(ii). Space is stretched, points
become equidistant
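A rough numerical illustration of point (i), assuming data spread uniformly over a unit hypercube (an assumption made only for this sketch; the function name is hypothetical). It computes how long the edge of a sub-cube must be to hold a fixed fraction of the data as the number of dimensions grows.

```python
def edge_length_for_fraction(fraction, dims):
    """Edge length of a sub-cube holding `fraction` of a unit hypercube's volume."""
    return fraction ** (1.0 / dims)

# To capture 10% of uniformly spread data, the "local" neighbourhood
# has to cover more and more of every axis as dimensions are added.
for dims in (1, 2, 10, 100):
    edge = edge_length_for_fraction(0.10, dims)
    print(f"{dims:>3} dimensions: edge length {edge:.3f}")
```

In high dimensions a neighbourhood that holds even a small share of the data is no longer local, which is why representative examples become so expensive to collect.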
Training, Validation & Testing
[Figure: data acquisition draws the training set (observed) from the universal set (unobserved); in practical usage the model meets the testing set (unobserved)]
 Training is the process of making the system able to
learn.
Training and Testing
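A minimal sketch of holding back part of the observed data for testing, written in plain Python. The helper name `split_train_test`, the 25% test fraction, and the integer stand-in data are assumptions for illustration; scikit-learn offers a ready-made `train_test_split` in `sklearn.model_selection`.

```python
import random

def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))                      # stand-in for 20 observed examples
train, test = split_train_test(data)
print(len(train), len(test))                # sizes of the two subsets
```

The model is fitted on `train` only; `test` plays the role of unobserved data when measuring performance.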
 There are several factors affecting the performance:
 Types of training provided
 The form and extent of any initial background knowledge
 The type of feedback provided
 The learning algorithms used
 Two important factors:
 Modeling
 Optimization
Performance
 Supervised learning
 Prediction
 Classification (discrete labels), Regression (real values)
 Unsupervised learning
 Clustering
 Probability distribution estimation
 Finding association (in features)
 Dimension reduction
 Semi-supervised learning
 Reinforcement learning
 Decision making (robot, chess machine)
Types of ML Algorithms
Types of ML Algorithms
[Figure: supervised learning, unsupervised learning, semi-supervised learning]
 Supervised learning
Machine learning structure
 Unsupervised learning
Machine learning structure
Python Libraries for DS/ML
Many popular Python toolboxes/libraries:
 NumPy
 SciPy
 Pandas
 SciKit-Learn
Visualization libraries
 matplotlib
 Seaborn
and many more …
Python Libraries for Data
Science
SciPy:
 collection of algorithms for linear algebra,
differential equations, numerical integration,
optimization, statistics and more
 built on NumPy
Link: https://www.scipy.org/scipylib/
Python Libraries for Data
Science
Pandas:
 adds data structures and tools designed to
work with table-like data
 provides tools for data manipulation:
reshaping, merging, sorting, slicing,
aggregation etc.
 allows handling missing data
Link: http://pandas.pydata.org/
matplotlib:
 python 2D plotting library which produces
publication quality figures in a variety of
hardcopy formats
 a set of functionalities similar to those of
MATLAB
 line plots, scatter plots, bar-charts,
histograms, pie charts etc.
Link: https://matplotlib.org/
Python Libraries for Data
Science
Seaborn:
 based on matplotlib
 provides high level interface for drawing
attractive statistical graphics
Link: https://seaborn.pydata.org/
Python Libraries for Data
Science
SciKit-Learn:
 provides machine learning algorithms:
classification, regression, clustering, model
validation etc.
 built on NumPy, SciPy and matplotlib
Link: http://scikit-learn.org/
Create a Google Colaboratory
1. Open Google Colab at
https://colab.research.google.com/notebooks/welcome.ipynb
2. Click on ‘New Notebook’ and select Python 2 notebook
or Python 3 notebook.
OR
1. Open Google Drive.
2. Create a new folder for the project.
3. Click on ‘New’ > ‘More’ > ‘Colaboratory’.
Hello World of Machine Learning
 The best small project to start with on a
new tool is the classification of iris flowers
(e.g. the iris dataset).
 Code in my Google colab notebook
Iris Dataset
 A multi-class classification problem
 4 attributes and 150 rows.
Diabetes Data Set
Boston Housing Dataset
 The Boston Housing Dataset consists of
house prices in various parts of Boston.
Alongside price, the dataset also provides
information such as the crime rate (CRIM),
the proportion of non-retail business acres
in the town (INDUS), the proportion of
owner-occupied units built before 1940
(AGE), and many other attributes.
Boston Housing Dataset
Attribute Information:
 1. CRIM per capita crime rate by town
 2. ZN proportion of residential land zoned for lots over 25,000 sq.ft.
 3. INDUS proportion of non-retail business acres per town
 4. CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
 5. NOX nitric oxides concentration (parts per 10 million)
 6. RM average number of rooms per dwelling
 7. AGE proportion of owner-occupied units built prior to 1940
 8. DIS weighted distances to five Boston employment centres
 9. RAD index of accessibility to radial highways
 10. TAX full-value property-tax rate per $10,000
 11. PTRATIO pupil-teacher ratio by town
 12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
 13. LSTAT % lower status of the population
 14. MEDV Median value of owner-occupied homes in $1000's
 Data Set Information:
 The dataset contains cases from a study that was conducted
between 1958 and 1970 at the University of Chicago's Billings
Hospital on the survival of patients who had undergone
surgery for breast cancer.
Attribute Information:
1. Age of patient at time of operation (numerical)
2. Patient's year of operation (year - 1900, numerical)
3. Number of positive axillary nodes detected (numerical)
4. Survival status (class attribute)
-- 1 = the patient survived 5 years or longer
-- 2 = the patient died within 5 years
 Other Datasets - https://archive.ics.uci.edu/ml/datasets.html
Haberman's Survival Data Set
ML Algorithms 1 by 1
 Linear Regression
 Logistic Regression
 Decision Tree
 SVM
 Naive Bayes
 kNN
 K-Means
 Random Forest
Linear Regression
 Used to estimate real values (cost of
houses, number of calls, total sales etc.)
based on continuous variable(s).
 Here, we establish the relationship between
the independent and dependent variables by
fitting a best-fit line.
 This best-fit line is known as the regression
line and is represented by the linear equation
Y = a*X + b, where a is the slope and b is the intercept.
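A small sketch of fitting the line Y = a*X + b by ordinary least squares, in plain Python. The function name `fit_line` and the data points are made up for illustration; in practice scikit-learn's `LinearRegression` would typically be used instead.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x      # the fitted line passes through the means
    return a, b

# Made-up data lying near y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]
a, b = fit_line(xs, ys)
print(f"slope a = {a:.3f}, intercept b = {b:.3f}")
```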
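Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where $Y_i$ is the dependent variable, $X_i$ is the independent variable,
$\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the random error.
$\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component.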
 Logistic Regression is a mathematical model to
estimate the probability of an event occurring
having been given some previous data.
 Logistic Regression works with binary data, where
either the event happens (1) or the event does not
happen (0).
 So given some feature x it tries to find out whether
some event y happens or not. In the case where
the event happens, y is given the value 1. If the
event does not happen, then y is given the value
of 0.
 For example, if y represents whether a sports
team wins a match, then y will be 1 if they win the
match or y will be 0 if they do not.
Logistic Regression
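A minimal sketch of the idea for a single feature, assuming the usual sigmoid link and plain gradient descent on the log loss. The data, learning rate, step count, and function names are all illustrative assumptions, not taken from the slides.

```python
import math

def sigmoid(z):
    """Maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Estimated probability that the event happens (y = 1) given feature x."""
    return sigmoid(w * x + b)

def fit(xs, ys, learning_rate=0.1, steps=2000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict_proba(x, w, b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

# Made-up data: hours of practice -> whether the team won (1) or not (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
won = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit(hours, won)
for x in (1.0, 4.0):
    print(f"P(win | {x} hours) = {predict_proba(x, w, b):.2f}")
```

A probability above 0.5 is read as y = 1 (the event happens) and below 0.5 as y = 0.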
 Decision Trees (DTs) are a non-
parametric supervised learning method
used for classification and regression.
 The goal is to create a model that predicts
the value of a target variable by learning
simple decision rules inferred from the
data features.
Decision Tree
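To make "simple decision rules inferred from the data features" concrete, here is a sketch of how a single rule (one node of a tree) might be chosen. It assumes Gini impurity as the split criterion, which is one common choice; the feature values and helper names are made up.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Threshold on one feature that minimises the weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    points = sorted(set(values))
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2            # candidate midway between neighbours
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]    # made-up values
species = ["setosa"] * 3 + ["versicolor"] * 3
threshold, impurity = best_split(petal_length, species)
print(f"rule: petal_length <= {threshold:.2f}, impurity = {impurity:.2f}")
```

A full tree repeats this search on each resulting group until the groups are pure or a depth limit is reached.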
 A Support Vector Machine (SVM) is a supervised
machine learning algorithm that can be employed for
both classification and regression purposes.
 SVMs are based on the idea of finding a hyperplane that
best divides a dataset into two classes. A hyperplane is a
line or a surface that linearly separates and classifies a
set of data.
 Support vectors are the data points nearest to the
hyperplane. These are points of a data set that, if
removed, would alter the position of the dividing
hyperplane. Because of this, they can be considered the
critical elements of a data set.
 The distance between the hyperplane and the nearest
data point from either set is known as the margin. The
goal is to choose a hyperplane with the greatest possible
margin between the hyperplane and any point within the
training set, giving a greater chance of new data being
SVM
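The sketch below does not train an SVM. It only computes the quantities described above (distance to the hyperplane, margin, support vectors) for a hand-picked separating line in 2-D. The points and the line are made-up assumptions.

```python
import math

def distance_to_hyperplane(point, w, b):
    """Perpendicular distance from `point` to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

# Two made-up classes in 2-D and a hand-picked separating line x1 + x2 - 5 = 0
points = [(1, 1), (2, 1), (1, 2), (4, 3), (3, 4), (5, 5)]
w, b = (1.0, 1.0), -5.0

distances = [distance_to_hyperplane(p, w, b) for p in points]
margin = min(distances)                       # distance to the nearest point
support_vectors = [p for p, d in zip(points, distances) if math.isclose(d, margin)]
print(f"margin = {margin:.3f}, support vectors = {support_vectors}")
```

Training an SVM amounts to searching over w and b for the hyperplane that makes this margin as large as possible.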
 Naive Bayes methods are a set of
supervised learning algorithms based on
applying Bayes’ theorem with the “naive”
assumption of conditional independence
between every pair of features given the
value of the class variable.
Naive Bayes
Bayes Theorem
P(H|E) = (P(E|H) * P(H)) / P(E)
where
• P(H|E) is the probability of hypothesis H given the event E,
a posterior probability.
• P(E|H) is the probability of event E
given that the hypothesis H is true.
• P(H) is the probability of hypothesis H being true
(regardless of any related event), or prior probability of H.
• P(E) is the probability of the event occurring
(regardless of the hypothesis).
This is the Bayes Theorem.
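A worked example of the formula with made-up numbers. Here H is "the email is spam" and E is "the email contains the word 'offer'"; P(E) is expanded over H and not-H by the law of total probability. The function name and all probabilities are illustrative assumptions.

```python
def posterior(p_e_given_h, p_h, p_e_given_not_h):
    """P(H|E) via Bayes' theorem, with P(E) expanded over H and not-H."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)
    return p_e_given_h * p_h / p_e

# Assumed inputs: P(E|H) = 0.60, P(H) = 0.20, P(E|not H) = 0.05
p = posterior(p_e_given_h=0.60, p_h=0.20, p_e_given_not_h=0.05)
print(f"P(spam | 'offer') = {p:.2f}")
```

With these inputs P(E) = 0.60 * 0.20 + 0.05 * 0.80 = 0.16, so the posterior is 0.12 / 0.16 = 0.75.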
 K Nearest Neighbors (KNN) is a simple, easy to
understand, and versatile machine learning
algorithm.
 KNN is used in a variety of applications such as
finance, healthcare, political science, handwriting
detection, image recognition and video recognition. In
credit rating, financial institutions predict the credit
rating of customers. In loan disbursement, banks
predict whether a loan is safe or risky.
In political science, potential voters are classified into two
classes: will vote or won’t vote.
 The KNN algorithm is used for both classification and
regression problems.
 It is based on a feature similarity approach.
K - NN
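A minimal sketch of KNN classification on the loan example, assuming Euclidean distance as the similarity measure and k = 3. The applicant data and the function name are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Majority label among the k training points closest to `query`."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),   # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up applicants: (income, debt) in arbitrary units -> loan outcome
points = [(2, 8), (3, 7), (2, 6), (8, 2), (9, 3), (7, 1)]
labels = ["risky", "risky", "risky", "safe", "safe", "safe"]
print(knn_predict(points, labels, query=(8, 3)))
```

For regression the same neighbours would be used, but their target values would be averaged instead of voted on.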
 K-means clustering is one of the most widely used
unsupervised machine learning algorithms that forms clusters
of data based on the similarity between data instances. For
this particular algorithm to work, the number of clusters has to
be defined beforehand. The K in the K-means refers to the
number of clusters.
 The K-means algorithm starts by randomly choosing a
centroid value for each cluster. After that the algorithm
iteratively performs three steps: (i) Find the Euclidean
distance between each data instance and centroids of all the
clusters; (ii) Assign the data instances to the cluster of the
centroid with nearest distance; (iii) Calculate new centroid
values based on the mean values of the coordinates of all the
data instances from the corresponding cluster.
K-Means
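The three steps above can be sketched in plain Python as follows. One deliberate difference from the description: the starting centroids are passed in explicitly rather than chosen at random, so that the run is reproducible. The points and the fixed iteration count are assumptions.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Steps (i) and (ii): distance to every centroid, keep the nearest
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step (iii): new centroid = mean of the coordinates in each cluster
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(centroids)
```

Here K = 2 because two starting centroids are supplied; an empty cluster simply keeps its previous centroid.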
 Random forest is a type of supervised machine
learning algorithm based on ensemble learning.
 Ensemble learning is a type of learning where you
join different types of algorithms or same algorithm
multiple times to form a more powerful prediction
model.
 The random forest algorithm combines multiple
algorithms of the same type, i.e. multiple decision
trees, resulting in a forest of trees, hence the
name "Random Forest". The random forest
algorithm can be used for both regression and
classification tasks.
Random Forest
1. Pick N random records from the dataset.
2. Build a decision tree based on these N records.
3. Choose the number of trees you want in your
algorithm and repeat steps 1 and 2.
4. In case of a regression problem, for a new record,
each tree in the forest predicts a value for Y
(output). The final value can be calculated by
taking the average of all the values predicted by all
the trees in the forest.
5. Or, in case of a classification problem, each tree in
the forest predicts the category to which the new
record belongs. Finally, the new record is assigned
to the category that wins the majority vote.
How the Random Forest Algorithm Works
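A sketch of only the sampling and aggregation parts of the procedure: picking N random records, then combining the trees' outputs by averaging or by majority vote. Tree building is left out, and the per-tree predictions are hand-written stand-ins. Sampling with replacement is an assumption here, as are all names and numbers.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(records, n, rng):
    """Pick n random records (with replacement, an assumption of this sketch)."""
    return [rng.choice(records) for _ in range(n)]

def aggregate_regression(tree_predictions):
    """Regression: average the values predicted by all trees."""
    return mean(tree_predictions)

def aggregate_classification(tree_predictions):
    """Classification: the category with the most votes wins."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
sample = bootstrap_sample(list(range(100)), n=10, rng=rng)
print(len(sample))

# Hand-written outputs standing in for five already-trained trees
print(aggregate_regression([21.0, 23.5, 22.0, 24.5, 24.0]))
print(aggregate_classification(["setosa", "virginica", "setosa", "setosa", "versicolor"]))
```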
 Neural Networks are a machine learning
framework that attempts to mimic the learning
pattern of natural biological neural networks.
Biological neural networks have
interconnected neurons with dendrites that
receive inputs, then based on these inputs
they produce an output signal through an
axon to another neuron. We will try to mimic
this process through the use of Artificial
Neural Networks (ANN)
 The process of creating a neural network
begins with the most basic form, a single
perceptron.
Neural Networks
Perceptron – An Artificial
Neuron
$y = f\left(b + \sum_{i=1}^{n-1} w_i x_i\right)$
[Figure: a neuron with inputs x1, x2, x3, weights w1, w2, w3, bias b, and output y]
What is an Artificial Neuron?
 An Artificial Neuron (AN) is a non-linear
parameterized function with restricted
output range
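A minimal sketch of one artificial neuron: a weighted sum of the inputs plus a bias, passed through a non-linear function f. The sigmoid is assumed as f here because it restricts the output to the range (0, 1); the inputs, weights, and bias are made up.

```python
import math

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """y = f(b + sum of w_i * x_i)"""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(z)

# Made-up inputs, weights and bias for a three-input neuron
y = neuron(inputs=[1.0, 0.5, -1.0], weights=[0.4, -0.2, 0.1], bias=0.1)
print(f"y = {y:.3f}")
```

The weights and the bias are the parameters that learning adjusts.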
Neural Network
Deep Feed Forward Neural Nets
So what then is learning?
[Figure: forward propagation takes a training example $(x^{(i)}, y^{(i)})$ and computes the hypothesis $h_\theta(x^{(i)})$]
Learning is the adjusting of the weights $w_{i,j}$ such that
the cost function $J(\theta)$ is minimized.
Simple learning procedure: Back Propagation (of the error signal)
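A sketch of the weight-adjustment idea on the smallest possible case: one parameter, a squared-error cost, and plain gradient descent. Backpropagation applies the same kind of update to every weight of a multi-layer network, using the chain rule to obtain each gradient. The hypothesis h(x) = theta * x, the data, and the learning rate are assumptions for illustration.

```python
def cost(theta, xs, ys):
    """Mean squared error J(theta) of the hypothesis h(x) = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

def gradient(theta, xs, ys):
    """Derivative dJ/dtheta of the cost above."""
    return sum((theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]    # made-up data with y = 2x
theta, learning_rate = 0.0, 0.1
for step in range(100):
    # Move the weight a small step against the gradient of the cost
    theta -= learning_rate * gradient(theta, xs, ys)

print(f"theta = {theta:.4f}, J(theta) = {cost(theta, xs, ys):.6f}")
```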
Applications
 Recognizing patterns:
 Facial identities or facial expressions
 Handwritten or spoken words
 Medical images
 Generating patterns:
 Generating images or motion sequences
 Recognizing anomalies:
 Unusual sequences of credit card transactions
 Unusual patterns of sensor readings in a nuclear
power plant or unusual sound in your car engine.
 Prediction:
 Future stock prices or currency exchange rates
Applications
 Spam filtering, fraud detection:
 Recommendation systems:
 Information retrieval:
 Find documents or images with similar content.
 Data Visualization:
 Display a huge database in a revealing way
 Facial recognition for Face ID, Facebook automatic tagging,
etc. (CNN)
 Scene and image description for low-sighted people. (CNN,
LSTM)
 Traffic sign classification for self driving cars. (CNN)
 Sentiment analysis to detect hateful speech on
Twitter/Instagram. (LSTM)
 Automated game playing to… play games. (Deep Q-Learning)
 Image style transfer for prismAI, image colorization for old
photographs. (CNN)
Hand Written Digit Recognition
Face Detection
Video Object Detection
Object Tracking in Video
Identifying Book Covers
Displaying the structure of a set of documents
using a deep neural network
When Would We Use Machine
Learning?
 When patterns exist in our data
 Especially when we don’t know what they are
 We cannot pin down the functional relationships mathematically
 Else we would just code up the algorithm
 When we have lots of (unlabeled) data
 Labeled training sets harder to come by
 Data is of high-dimension
 High dimension “features”
 For example, sensor data
 Want to “discover” lower-dimension representations
 Dimension reduction
 Aside: Machine Learning is heavily focused on implementability
 Frequently using well-known numerical optimization techniques
 Lots of open source code available
 See e.g., libsvm (Support Vector Machines): http://www.csie.ntu.edu.tw/~cjlin/libsvm/
 Most of my code in python: http://scikit-learn.org/stable/ (many others)
 Languages (e.g., octave: https://www.gnu.org/software/octave/)
 Python Machine Learning by Example, Yuxi
Hayden Liu
 Applied Machine Learning, Lecture 10:
Introduction to unsupervised and semi-supervised
learning, Richard Johnson
 Building Machine Learning Systems with Python,
Luis Pedro Coelho
 deeplearning.ai
 https://www.coursera.org/learn/machine-learning#syllabus
 https://chrisalbon.com/#machine_learning
 https://medium.com/machine-learning-for-humans/how-to-learn-machine-learning-24d53bb64aa1
Further Learning Resources
We had a simple overview of some
techniques and algorithms in machine
learning. There are many more techniques
that apply machine learning as a solution.
Machine Learning is New ELECTRICITY.
Conclusion
Q&A
THANK YOU

 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 

Dernier (20)

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 

Nimrita Koul: Machine Learning

  • 1. Machine Learning Nimrita Koul Assistant Professor School of Computing & IT REVA University Bangalore
  • 2. Agenda: What is Machine Learning (ML); Machine Intelligence Landscape; Python Libraries for ML; ML Algorithms
  • 3. What is machine learning? Machine learning is a branch of artificial intelligence concerned with the construction and study of systems that can learn from data.
  • 4. Related Fields: psychological models, data mining, cognitive science, decision theory, information theory, databases, neuroscience, statistics, evolutionary models, control theory. Machine learning is primarily concerned with the accuracy and effectiveness of the computer system.
  • 5.
  • 7. Machine Learning Workflow: A machine learning project has a number of well-known steps: (1) Define the problem. (2) Acquire data. (3) Prepare data. (4) Choose an algorithm, weighing speed, interpretability, accuracy, memory use, and ease of implementation. (5) Fit your model. (6) Choose a validation method and validate. (7) Predict using your model.
  • 8. Why ML Is Hard: The Curse of Dimensionality. To generalize locally, you need representative examples from all relevant variations, and there are exponentially many of them. Classical solution: hope for a smooth enough target function, or make it smooth by handcrafting good … As the number of dimensions increases, (i) the space grows exponentially and (ii) the space is stretched, so points become nearly equidistant.
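The "points become equidistant" effect on slide 8 can be checked numerically. The sketch below is only an illustration under assumed settings (uniform random points in a unit cube, 200 points, the hypothetical helper name `distance_contrast`); it measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Return (max_dist - min_dist) / min_dist from one query point to
    n_points uniform random points in the unit cube of dimension dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)

low = distance_contrast(2)     # low-dimensional space
high = distance_contrast(500)  # high-dimensional space
print("contrast in 2 dimensions:  ", low)
print("contrast in 500 dimensions:", high)
```

A small contrast means the nearest and farthest neighbours are almost the same distance away, which is what makes purely local generalization difficult in high dimensions.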
  • 9. Training, Validation & Testing: The training set (observed) is drawn from the universal set (unobserved) by data acquisition; the testing set (unobserved) is what the model meets in practical usage.
  • 10. Training and Testing: Training is the process of making the system able to learn.
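As a rough illustration of the training/testing idea, the observed data is usually split so that part of it is held back and never used for fitting. The sketch below uses only the standard library; the function name `train_test_split`, the 25% test fraction, and the toy records are assumptions made for this example, not something taken from the slides.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows, then split it into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(20))          # stand-in for 20 observed examples
train, test = train_test_split(records)
print(len(train), "training rows,", len(test), "testing rows")
```

The model is fitted on `train` only; `test` plays the role of the unobserved data it will meet in practice.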
  • 11. Performance: Several factors affect performance: the types of training provided; the form and extent of any initial background knowledge; the type of feedback provided; the learning algorithms used. Two important factors: modeling and optimization.
  • 12. Types of ML Algorithms: Supervised learning: prediction, i.e. classification (discrete labels) and regression (real values). Unsupervised learning: clustering, probability distribution estimation, finding associations (in features), dimension reduction. Semi-supervised learning. Reinforcement learning: decision making (robot, chess machine).
  • 13. Types of ML Algorithms: supervised learning, unsupervised learning, semi-supervised learning
  • 14. Machine learning structure: supervised learning
  • 15. Machine learning structure: unsupervised learning
  • 16. Python Libraries for DS/ML: Many popular Python toolboxes/libraries: NumPy, SciPy, Pandas, SciKit-Learn. Visualization libraries: matplotlib, Seaborn, and many more.
  • 17. Python Libraries for Data Science: SciPy: a collection of algorithms for linear algebra, differential equations, numerical integration, optimization, statistics and more; built on NumPy. Link: https://www.scipy.org/scipylib/
  • 18. Python Libraries for Data Science: Pandas: adds data structures and tools designed to work with table-like data; provides tools for data manipulation (reshaping, merging, sorting, slicing, aggregation, etc.); allows handling of missing data. Link: http://pandas.pydata.org/
  • 19. Python Libraries for Data Science: matplotlib: a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats; offers a set of functionalities similar to those of MATLAB; supports line plots, scatter plots, bar charts, histograms, pie charts, etc. Link: https://matplotlib.org/
  • 20. Python Libraries for Data Science: Seaborn: based on matplotlib; provides a high-level interface for drawing attractive statistical graphics. Link: https://seaborn.pydata.org/
  • 21. Python Libraries for Data Science: SciKit-Learn: provides machine learning algorithms for classification, regression, clustering, model validation, etc.; built on NumPy, SciPy and matplotlib. Link: http://scikit-learn.org/
  • 22. Create a Google Colaboratory notebook: 1. Open Google Colab at https://colab.research.google.com/notebooks/welcome.ipynb 2. Click on ‘New Notebook’ and select Python 2 notebook or Python 3 notebook. OR: 1. Open Google Drive. 2. Create a new folder for the project. 3. Click on ‘New’ > ‘More’ > ‘Colaboratory’.
  • 23. Hello World of Machine Learning: The best small project to start with on a new tool is the classification of iris flowers (the iris dataset). The code is in my Google Colab notebook.
  • 24. Iris Dataset: a multi-class classification problem with 4 attributes and 150 rows.
  • 25.
  • 27. Boston Housing Dataset: The Boston Housing Dataset records house prices in various areas of Boston. Alongside price, the dataset provides information such as the per capita crime rate (CRIM), the proportion of non-retail business acres in the town (INDUS), the proportion of owner-occupied units built prior to 1940 (AGE), and many other attributes.
  • 28. Boston Housing Dataset, Attribute Information: 1. CRIM: per capita crime rate by town; 2. ZN: proportion of residential land zoned for lots over 25,000 sq. ft.; 3. INDUS: proportion of non-retail business acres per town; 4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise); 5. NOX: nitric oxides concentration (parts per 10 million); 6. RM: average number of rooms per dwelling; 7. AGE: proportion of owner-occupied units built prior to 1940; 8. DIS: weighted distances to five Boston employment centres; 9. RAD: index of accessibility to radial highways; 10. TAX: full-value property-tax rate per $10,000; 11. PTRATIO: pupil-teacher ratio by town; 12. B: 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town; 13. LSTAT: % lower status of the population; 14. MEDV: median value of owner-occupied homes in $1000's
  • 29. Haberman's Survival Data Set: Data Set Information: The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer. Attribute Information: 1. Age of patient at time of operation (numerical); 2. Patient's year of operation (year - 1900, numerical); 3. Number of positive axillary nodes detected (numerical); 4. Survival status (class attribute): 1 = the patient survived 5 years or longer, 2 = the patient died within 5 years. Other datasets: https://archive.ics.uci.edu/ml/datasets.html
  • 30.
  • 31. ML Algorithms 1 by 1: Linear Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, kNN, K-Means, Random Forest
  • 32. Linear Regression: Used to estimate real values (cost of houses, number of calls, total sales, etc.) based on continuous variable(s). We establish the relationship between the independent and dependent variables by fitting a best-fit line. This line is known as the regression line and is represented by the linear equation Y = a*X + b, where a is the slope and b is the intercept.
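For a single input variable, the best-fit line Y = a*X + b can be computed in closed form by ordinary least squares. The sketch below is a minimal illustration; the function name `fit_line` and the toy data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one variable: returns slope a and intercept b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x        # the line passes through the point of means
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]          # toy data generated from y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)                        # prints 2.0 1.0
```

In practice a library routine such as scikit-learn's `LinearRegression` would normally be used, which also handles several input variables.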
  • 34. Logistic Regression: Logistic Regression is a mathematical model that estimates the probability of an event occurring, given some previous data. It works with binary data, where either the event happens (1) or the event does not happen (0). Given some feature x, it tries to find out whether some event y happens or not: y is given the value 1 if the event happens and 0 if it does not. For example, if y represents whether a sports team wins a match, then y will be 1 if they win the match and 0 if they do not.
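A minimal sketch of the idea for one feature: the model passes w*x + b through the sigmoid function to obtain a probability, and w and b are adjusted by gradient descent on the log-loss. The training data (hours of practice versus match won), the learning rate, and the number of epochs are all invented for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.3, epochs=20000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every example under the current parameters
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]   # hypothetical feature x
won = [0, 0, 0, 0, 1, 1, 1, 1]                      # y = 1 if the team won
w, b = fit_logistic(hours, won)
p_low = sigmoid(w * 1.0 + b)    # estimated win probability after 1 hour
p_high = sigmoid(w * 4.0 + b)   # estimated win probability after 4 hours
print(p_low, p_high)
```

Thresholding the probability at 0.5 turns the estimate into the binary prediction described on the slide.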
  • 35.
  • 36. Decision Tree: Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
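A "simple decision rule" is typically a threshold test on one feature. The sketch below searches for the single threshold that best separates two classes, scoring candidates by Gini impurity (one common criterion, assumed here; the slides do not name one). The feature values and labels are made up for illustration, and a full tree would apply this search recursively to each side of the split.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best rule 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue                                  # no gap to split in
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

petal_lengths = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]        # illustrative values
species = ["setosa"] * 3 + ["versicolor"] * 3
threshold, impurity = best_split(petal_lengths, species)
print(threshold, impurity)                            # prints 3.0 0.0
```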
  • 37.
  • 38. SVM: A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes. A hyperplane is a line or a surface that linearly separates and classifies a set of data. Support vectors are the data points nearest to the hyperplane. These are the points of a data set that, if removed, would alter the position of the dividing hyperplane. Because of this, they can be considered the critical elements of a data set. The distance between the hyperplane and the nearest data point from either set is known as the margin. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being …
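The margin described on this slide can be computed directly for a given hyperplane w·x + b = 0. The sketch below does not train an SVM; it only compares the margins of two hand-picked candidate hyperplanes on made-up 2D points, to show why the one with the larger margin is preferred.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b, points):
    """Distance from the hyperplane to the nearest point of either class."""
    return min(abs(signed_distance(w, b, p)) for p in points)

points = [(0, 0), (1, 1), (3, 0), (4, 2)]   # two classes: x < 2 and x > 2
wide = margin((1, 0), -2.0, points)         # candidate: vertical line x = 2
narrow = margin((1, 0), -1.5, points)       # candidate: vertical line x = 1.5
print(wide, narrow)                         # prints 1.0 0.5
```

Both lines separate the two groups, but x = 2 leaves twice as much room on either side. The points at distance 1.0 from it, (1, 1) and (3, 0), play the role of support vectors.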
  • 39. Naive Bayes: Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.
  • 40. Bayes Theorem: P(H|E) = (P(E|H) * P(H)) / P(E), where P(H|E) is the probability of hypothesis H given the event E (the posterior probability); P(E|H) is the probability of event E given that the hypothesis H is true; P(H) is the probability of hypothesis H being true regardless of any related event (the prior probability of H); and P(E) is the probability of the event occurring regardless of the hypothesis.
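A small worked example of the formula, with invented numbers: suppose 20% of emails are spam (prior), a certain word appears in 60% of spam emails and in 5% of non-spam emails. P(E) is obtained by summing over both hypotheses.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(H|E) via Bayes' theorem, expanding P(E) over H and not-H."""
    evidence = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / evidence

# Hypothetical numbers: H = "email is spam", E = "email contains the word"
p_spam_given_word = posterior(prior_h=0.2, p_e_given_h=0.6, p_e_given_not_h=0.05)
print(round(p_spam_given_word, 2))   # prints 0.75
```

Here P(E) = 0.6 * 0.2 + 0.05 * 0.8 = 0.16, so P(H|E) = 0.12 / 0.16 = 0.75: seeing the word raises the spam probability from 20% to 75%.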
  • 41. K-NN: K Nearest Neighbor (KNN) is a simple, easy-to-understand, versatile, and widely used machine learning algorithm. KNN is used in a variety of applications such as finance, healthcare, political science, handwriting detection, image recognition and video recognition. In credit rating, financial institutions predict the credit rating of customers. In loan disbursement, banks predict whether a loan is safe or risky. In political science, potential voters are classified into two classes: will vote or won’t vote. The KNN algorithm is used for both classification and regression problems. It is based on a feature-similarity approach.
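The feature-similarity idea can be sketched in a few lines: find the k training points closest to the query and let them vote. The loan data below is entirely made up, and Euclidean distance with k = 3 is an assumed choice.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),   # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical loans described by two already-scaled features
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["safe", "safe", "safe", "risky", "risky", "risky"]
print(knn_predict(points, labels, (2, 2)))   # prints safe
print(knn_predict(points, labels, (6, 5)))   # prints risky
```

For regression, the vote would be replaced by the average of the neighbours' target values.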
  • 42.
  • 43. K-Means: K-means clustering is one of the most widely used unsupervised machine learning algorithms; it forms clusters of data based on the similarity between data instances. For this algorithm to work, the number of clusters has to be defined beforehand. The K in K-means refers to the number of clusters. The K-means algorithm starts by randomly choosing a centroid value for each cluster. After that the algorithm iteratively performs three steps: (i) find the Euclidean distance between each data instance and the centroids of all the clusters; (ii) assign each data instance to the cluster whose centroid is nearest; (iii) calculate new centroid values as the mean of the coordinates of all the data instances in the corresponding cluster.
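The three steps above can be sketched as follows. This is a bare-bones illustration, not a production implementation: initial centroids are drawn from the data points with a fixed seed, the loop stops when assignments no longer change, and the toy points are invented.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initial centroids
    assignment = None
    for _ in range(max_iter):
        # Steps (i) and (ii): assign each point to its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:       # converged
            break
        assignment = new_assignment
        # Step (iii): move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centroids, assignment = k_means(points, k=2)
print(sorted(centroids))
```

On this data the two centroids settle near (1.17, 1.17) and (8.33, 8.83), the means of the two visible groups.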
  • 44. Random Forest: Random forest is a type of supervised machine learning algorithm based on ensemble learning. Ensemble learning is a type of learning where you join different types of algorithms, or the same algorithm multiple times, to form a more powerful prediction model. The random forest algorithm combines multiple algorithms of the same type, i.e. multiple decision trees, resulting in a forest of trees, hence the name "Random Forest". The random forest algorithm can be used for both regression and classification tasks.
  • 45. How the Random Forest Algorithm Works: 1. Pick N random records from the dataset. 2. Build a decision tree based on these N records. 3. Choose the number of trees you want in your algorithm and repeat steps 1 and 2. 4. In case of a regression problem, for a new record, each tree in the forest predicts a value for Y (output). The final value is calculated by taking the average of all the values predicted by all the trees in the forest. 5. In case of a classification problem, each tree in the forest predicts the category to which the new record belongs. The new record is then assigned to the category that wins the majority vote.
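The sampling and aggregation steps can be sketched on their own, leaving out the tree building. The code below assumes that "pick N random records" means sampling with replacement (a bootstrap sample, the usual choice), and the per-tree predictions are hypothetical values standing in for real trees.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(records, rng):
    """Draw len(records) records at random, with replacement."""
    return [rng.choice(records) for _ in records]

def aggregate_classification(tree_votes):
    """The category predicted by the most trees wins."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_values):
    """The forest's output is the average of the trees' predictions."""
    return mean(tree_values)

rng = random.Random(0)
records = list(range(10))                       # stand-in for dataset rows
sample = bootstrap_sample(records, rng)         # training rows for one tree
print(sample)

# Hypothetical predictions from three trees for one new record
label = aggregate_classification(["safe", "risky", "safe"])
value = aggregate_regression([200.0, 220.0, 210.0])
print(label, value)                             # prints safe 210.0
```

Because each tree sees a different random sample, the trees make different errors, and voting or averaging tends to cancel some of those errors out.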
  • 46. Neural Networks: Neural networks are a machine learning framework that attempts to mimic the learning pattern of natural biological neural networks. Biological neural networks have interconnected neurons with dendrites that receive inputs; based on these inputs, they produce an output signal through an axon to another neuron. We will try to mimic this process through the use of Artificial Neural Networks (ANN). The process of creating a neural network begins with the most basic form, a single perceptron.
  • 47. Perceptron – An Artificial Neuron
  • 48. What is an Artificial Neuron? An Artificial Neuron (AN) is a non-linear parameterized function with restricted output range: $y = f\left(b + \sum_{i=1}^{n-1} w_i x_i\right)$. The diagram shows inputs $x_1, x_2, x_3$ with weights $w_1, w_2, w_3$, a bias $b$, and the output $y$.
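The neuron formula translates almost directly into code: a weighted sum of the inputs plus a bias, passed through an activation function f. The sigmoid is used here as one possible f with a restricted output range (an assumption; the slide does not fix f), and the weights are arbitrary illustrative values.

```python
import math

def sigmoid(z):
    """Activation with output restricted to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Compute y = f(b + sum_i w_i * x_i) for a single artificial neuron."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(z)

# Three inputs x1..x3 with hypothetical weights w1..w3 and bias b
y = neuron(inputs=[1.0, 0.0, 1.0], weights=[0.5, -0.5, 0.5], bias=-1.0)
print(y)   # prints 0.5, since the weighted sum plus bias is exactly 0
```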
  • 50. Deep Feed Forward Neural Nets: So what then is learning? Forward propagation computes the hypothesis $h_\theta(x^{(i)})$ for each training example $(x^{(i)}, y^{(i)})$. Learning is the adjusting of the weights $w_{i,j}$ such that the cost function $J(\theta)$ is minimized (a form of Hebbian learning). Simple learning procedure: back propagation (of the error signal).
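Minimizing a cost function J(θ) by repeatedly adjusting weights can be shown on the smallest possible case: one weight, a linear hypothesis h_θ(x) = θ·x, and a squared-error cost. This is plain gradient descent, a simplified stand-in; in a multi-layer network, back propagation is the procedure that computes the corresponding gradient for every weight. The data and learning rate are invented.

```python
def cost(theta, xs, ys):
    """J(theta) = 1/(2m) * sum of squared errors of h(x) = theta * x."""
    m = len(xs)
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient(theta, xs, ys):
    """dJ/dtheta for the squared-error cost above."""
    m = len(xs)
    return sum((theta * x - y) * x for x, y in zip(xs, ys)) / m

xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]             # toy data generated from y = 3x
theta = 0.0                      # initial weight
for _ in range(100):
    theta -= 0.1 * gradient(theta, xs, ys)   # step against the gradient

print(theta, cost(theta, xs, ys))
```

After the loop, `theta` is very close to 3 and the cost is very close to 0, which is the sense in which the weight has been "learned" from the examples.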
  • 52. Applications: Recognizing patterns: facial identities or facial expressions; handwritten or spoken words; medical images. Generating patterns: generating images or motion sequences. Recognizing anomalies: unusual sequences of credit card transactions; unusual patterns of sensor readings in a nuclear power plant or unusual sound in your car engine. Prediction: future stock prices or currency exchange rates.
  • 53. Applications: Spam filtering, fraud detection. Recommendation systems. Information retrieval: find documents or images with similar content. Data visualization: display a huge database in a revealing way. Facial recognition for Face ID, Facebook automatic tagging, etc. (CNN). Scene and image description for low-sighted people (CNN, LSTM). Traffic sign classification for self-driving cars (CNN). Sentiment analysis to detect hateful speech on Twitter/Instagram (LSTM). Automated game playing to… play games (Deep Q-Learning). Image style transfer for prismAI, image colorization for old photographs (CNN).
  • 54. Hand Written Digit Recognition
  • 59. Displaying the structure of a set of documents using a deep neural network
  • 60.
  • 61. When Would We Use Machine Learning? When patterns exist in our data, especially when we don’t know what they are. When we cannot pin down the functional relationships mathematically (otherwise we would just code up the algorithm). When we have lots of (unlabeled) data; labeled training sets are harder to come by. When the data is high-dimensional (high-dimensional “features”, for example sensor data) and we want to “discover” lower-dimensional representations (dimension reduction). Aside: machine learning is heavily focused on implementability: it frequently uses well-known numerical optimization techniques, and lots of open source code is available. See e.g. libsvm (Support Vector Machines): http://www.csie.ntu.edu.tw/~cjlin/libsvm/ Most of my code is in Python: http://scikit-learn.org/stable/ (many others). Languages (e.g., Octave: https://www.gnu.org/software/octave/)
  • 62. Further Learning Resources: Python Machine Learning by Example, Yuxi Hayden Liu; Applied Machine Learning, Lecture 10: Introduction to unsupervised and semi-supervised learning, Richard Johnson; Building Machine Learning Systems with Python, Luis Pedro Coelho; deeplearning.ai; https://www.coursera.org/learn/machine-learning#syllabus ; https://chrisalbon.com/#machine_learning ; https://medium.com/machine-learning-for-humans/how-to-learn-machine-learning-24d53bb64aa1
  • 63. Conclusion: We had a simple overview of some techniques and algorithms in machine learning. There are many more techniques that apply machine learning as a solution. Machine learning is the new electricity.