This document summarizes a talk on machine learning given by Subrat Panda at the SACON conference in Pune, India on May 18-19, 2018. The talk covered the classical definitions of machine learning, the main types of machine learning (supervised, unsupervised, and reinforcement learning), and specific algorithms such as linear regression, logistic regression, and support vector machines.
6. SACON
Gartner Says By 2020, Artificial Intelligence Will Create More Jobs Than It Eliminates
7. SACON
What this talk can motivate people to do
§ STUDENTS:
§ Motivates them to participate in data science competitions
§ Encourages further learning and adding the expertise to their résumés
§ Final-year and fun projects
§ PROFESSIONALS:
§ Find interesting data in your current project and apply machine learning
§ Motivates further learning and a profession change. Data scientists / machine
learning engineers are highly paid professionals :)
§ TEACHERS:
§ Motivates teachers to spread knowledge in their universities
§ Conduct hackathons
18. SACON
Linear Regression
• In supervised learning, our goal is, given a training set, to learn a function h : X
→ Y so that h(x) is a “good” predictor for the corresponding value of y.
Living Area (sq. ft.) | Year Built | Price ($1000s)
--------------------- | ---------- | --------------
2104                  | 2012       | 400
1600                  | 2013       | 300
2400                  | 2014       | 369
1416                  | 2013       | 232
3000                  | 2015       | 540
...                   | ...        | ...
• Let's consider the housing data above. Each x is a two-dimensional vector
(living area, year built) and y is the price of the house.
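A minimal scikit-learn sketch of learning such an h on this data (only the five rows from the table above are used; the query point is illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [living area (sq. ft.), year built]; target: price in $1000s
X = np.array([[2104, 2012], [1600, 2013], [2400, 2014],
              [1416, 2013], [3000, 2015]])
y = np.array([400, 300, 369, 232, 540])

h = LinearRegression().fit(X, y)   # learn h : X -> Y
print(h.predict([[2000, 2014]]))   # predicted price for an unseen house
```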
20. SACON
Cost Function I
• Let's approximate y as a linear function of x. The hypothesis function is then
given by:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

• θ's are the parameters (also called weights) parameterizing the space of linear
functions mapping from X to Y.
• How do we pick, or learn, the parameters θ? One reasonable method is to make
h(x) close to y, at least for the training examples we have. The cost function is
given by:

$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

• This is the least-squares cost function that gives rise to the ordinary least
squares regression model.
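A direct NumPy transcription of J(θ) (a sketch: it assumes the design matrix X already contains an intercept column of ones):

```python
import numpy as np

def cost(theta, X, y):
    """Least-squares cost J(theta) = 1/2 * sum((h_theta(x) - y)^2)."""
    residual = X @ theta - y   # h_theta(x^(i)) - y^(i) for all examples
    return 0.5 * residual @ residual
```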
22. SACON
Gradient Descent
§ We should use a search algorithm that starts with some “initial guess” for θ, and that
repeatedly changes θ to make J(θ) smaller, until we converge to a value of θ that
minimizes J(θ).
§ The algorithm we choose is the Gradient Descent Algorithm, which starts with some
initial θ and repeatedly performs the following update:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

§ If we calculate the partial derivative, we get the following update for a single
training example:

$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$

α = learning rate.
If α is too small: slow convergence.
If α is too large: J(θ) may not decrease on every iteration and thus may
not converge.
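A minimal NumPy sketch of batch gradient descent on the least-squares cost (averaging the gradient over m and the default step size are my own choices; features should be scaled for a stable step size):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent on the least-squares cost J(theta)."""
    X = np.c_[np.ones(len(X)), X]    # prepend x0 = 1 for the intercept term
    theta = np.zeros(X.shape[1])     # some "initial guess" for theta
    for _ in range(iters):
        error = X @ theta - y        # h_theta(x^(i)) - y^(i) for all examples
        theta -= alpha * (X.T @ error) / len(y)   # simultaneous update of all theta_j
    return theta
```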
27. SACON
Normal Equation
§ Given a training set with m examples and n features, define the
design matrix X to be the m-by-n matrix whose rows are the training inputs:

$X = \begin{bmatrix} (x^{(1)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$

§ Let y be the m-dimensional vector containing all the target values
from the training set:

$y = \left( y^{(1)}, \ldots, y^{(m)} \right)^T$

§ Then the value of θ that minimizes J(θ) is given in closed form by:

$\theta = (X^T X)^{-1} X^T y$
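The closed form in a few lines of NumPy (a sketch; np.linalg.solve is used instead of forming the inverse explicitly, which is numerically safer):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^{-1} X^T y."""
    X = np.c_[np.ones(len(X)), X]             # intercept column
    return np.linalg.solve(X.T @ X, X.T @ y)  # solve instead of inverting
```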
52. SACON
Nonlinear SVM - Overview
§ An SVM locates a separating hyperplane in the feature space and classifies points in
that space.
§ It does not need to represent the feature space explicitly; it only needs a kernel
function defined on it.
§ The kernel function plays the role of the dot product in the feature space.
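A minimal scikit-learn sketch of a kernelized SVM (the toy dataset and parameter values are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Non-linearly separable toy data; the RBF (Gaussian) kernel stands in for
# the dot product, so the feature space is never represented explicitly.
X, y = make_moons(noise=0.1, random_state=0)
clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print(clf.score(X, y))   # training accuracy
```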
60. SACON
Some Issues
§ Choice of kernel
§ Gaussian or polynomial kernel is default
§ If ineffective, more elaborate kernels are needed
§ Domain experts can give assistance in formulating appropriate
similarity measures
§ Choice of kernel parameters
§ e.g. σ in Gaussian kernel
§ σ is the distance between closest points with different classifications
§ In the absence of reliable criteria, applications rely on the use of a
validation set or cross-validation to set such parameters.
§ Optimization criterion – hard margin vs. soft margin
§ typically settled by a lengthy series of experiments in which various
parameters are tested
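A sketch of the usual cross-validation recipe for setting such parameters (in scikit-learn's RBF kernel, gamma plays the role of 1/(2σ²); the grid values and dataset are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(noise=0.2, random_state=0)
# Try every (gamma, C) pair with 5-fold cross-validation and keep the best.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"gamma": [0.01, 0.1, 1, 10], "C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```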
62. SACON
k-Nearest Neighbor Classification
(kNN)
§ Unlike all the previous learning methods, kNN does not build a model
from the training data.
§ To classify a test instance d, define the k-neighborhood P as the k nearest
neighbors of d.
§ Count the number n of training instances in P that belong to class cj.
§ Estimate Pr(cj|d) as n/k.
§ No training is needed, but classification time is linear in the training set size
for each test case.
63. SACON
kNN Algorithm
§ k is usually chosen empirically via a validation set
or cross-validation by trying a range of k values.
§ The distance function is crucial, but depends on
the application.
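A minimal sketch of choosing k by cross-validation (the candidate values and dataset are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Try a range of k values and keep the one with the best CV accuracy.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=5).mean() for k in (1, 3, 5, 7, 9)}
print(max(scores, key=scores.get), scores)
```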
65. SACON
Discussions
§ kNN can deal with complex and arbitrary decision boundaries.
§ Despite its simplicity, researchers have shown that the classification
accuracy of kNN can be quite strong and, in many cases, as accurate as that
of more elaborate methods.
§ kNN is slow at the classification time
§ kNN does not produce an understandable model
68. SACON
TYPES OF CLUSTERING
§ Hierarchical algorithms: these find successive clusters using previously
established clusters.
§ Agglomerative ("bottom-up"): Agglomerative algorithms begin with each element as
a separate cluster and merge them into successively larger clusters.
§ Divisive ("top-down"): Divisive algorithms begin with the whole set and proceed to
divide it into successively smaller clusters.
[Figure: cluster dendrogram]
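A minimal SciPy sketch of agglomerative ("bottom-up") clustering (the random toy data is illustrative; Ward linkage is one common merge criterion), plotting a dendrogram like the one referenced above:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

X = np.random.default_rng(0).normal(size=(20, 2))
Z = linkage(X, method="ward")   # successively merge the closest clusters
dendrogram(Z)                   # the merge tree drawn as a dendrogram
plt.show()
```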
71. SACON
§ The maximum norm is given by: $d(x, y) = \max_i |x_i - y_i|$
§ The Mahalanobis distance corrects data for different scales and
correlations in the variables.
§ Inner product space: The angle between two vectors can be used as a
distance measure when clustering high dimensional data
§ Hamming distance (sometimes edit distance) measures the minimum
number of substitutions required to change one member into another.
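These distance measures are all available in scipy.spatial.distance; a short sketch (the vectors and the covariance estimate are illustrative; note that SciPy's hamming returns the fraction, not the count, of differing positions):

```python
import numpy as np
from scipy.spatial.distance import chebyshev, cosine, hamming, mahalanobis

x, y = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 1.0])
print(chebyshev(x, y))                 # maximum norm: max_i |x_i - y_i|
print(cosine(x, y))                    # 1 - cos(angle between the vectors)
print(hamming([1, 0, 1], [0, 0, 1]))   # fraction of positions that differ

data = np.random.default_rng(0).normal(size=(100, 3))
VI = np.linalg.inv(np.cov(data, rowvar=False))   # inverse covariance
print(mahalanobis(x, y, VI))           # corrects for scale and correlation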
73. SACON
§ An algorithm for partitioning (or clustering) N data points into K disjoint
subsets S_j so as to minimize the sum-of-squares criterion

$J = \sum_{j=1}^{K} \sum_{n \in S_j} \left\| x_n - \mu_j \right\|^2$

where $x_n$ is a vector representing the nth data point and $\mu_j$ is the
geometric centroid of the data points in $S_j$.
§ Simply speaking, k-means clustering is an algorithm to categorize or group
objects into K groups based on their attributes/features.
§ K is a positive integer.
§ The grouping is done by minimizing the sum of squared distances
between the data points and the corresponding cluster centroid.
74. SACON
HOW K-MEANS CLUSTERING WORKS?
§ Step 1: Begin with a decision on the value of k = number of
clusters.
§ Step 2: Put any initial partition that classifies the data into k
clusters. You may assign the training samples randomly, or
systematically as follows:
§ Take the first k training samples as single-element
clusters.
§ Assign each of the remaining (N − k) training samples to the
cluster with the nearest centroid. After each assignment,
recompute the centroid of the gaining cluster.
§ Step 3: Take each sample in sequence and compute its distance
from the centroid of each of the clusters. If a sample is not
currently in the cluster with the closest centroid, switch the
sample to that cluster and update the centroids of the cluster
gaining the new sample and the cluster losing the sample.
§ Step 4: Repeat Step 3 until convergence is achieved, that is, until a
pass through the training samples causes no new assignments (see
the sketch below).
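A compact NumPy sketch of these steps (it implements the batch variant, reassigning all points before recomputing centroids, and does not handle the corner case of a cluster becoming empty):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Batch k-means: assign each point to its nearest centroid, recompute."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # Steps 1-2
    for _ in range(iters):
        # Step 3: distance of every point to every centroid, then reassign
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):   # Step 4: stop when nothing moves
            break
        centroids = new
    return labels, centroids
```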
81. SACON
Definitions
• Overfitting: too much reliance on the training data
• Underfitting: a failure to learn the relationships in the training data
• High Variance: model changes significantly based on training data
• High Bias: assumptions about model lead to ignoring training data
• Overfitting and underfitting cause poor generalization on the test set
• A validation set for model tuning can help prevent underfitting and overfitting
82. SACON
Ways to Deal with
Overfitting and Underfitting
§ Underfitting:
§ Easier to resolve
§ Try different machine learning models
§ Try stronger models with higher capacity (hyperparameter
tuning)
§ Try more features
§ Overfitting
§ Use a resampling technique like K-fold cross validation
§ Improve the feature quality or remove some features
§ Training with more data
§ Early stopping
§ Regularization
§ Ensembling
[Figure: early stopping]
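Early stopping is built into several scikit-learn estimators; a minimal sketch (the dataset and stopping parameters are illustrative): training halts once the score on a held-out validation fraction stops improving.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, random_state=0)
# Hold out 20% as a validation set; stop after 5 epochs with no improvement.
clf = SGDClassifier(early_stopping=True, validation_fraction=0.2,
                    n_iter_no_change=5, random_state=0).fit(X, y)
print(clf.n_iter_)   # epochs actually run before stopping
```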
84. SACON
L1 and L2 Regularization
§ In L2 regularization, we have:

$J(\theta) = \text{Loss} + \lambda \sum_j \theta_j^2$

§ Here, lambda (λ) is the regularization parameter: a hyperparameter whose
value is tuned for better results. L2 regularization is also known as weight
decay, as it forces the weights to decay towards zero (but not exactly zero).
§ In L1 regularization, we have:

$J(\theta) = \text{Loss} + \lambda \sum_j |\theta_j|$

§ Here we penalize the absolute value of the weights. Unlike L2, the weights may
be reduced exactly to zero.
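The two penalties side by side in scikit-learn (a sketch; alpha plays the role of λ, and the synthetic dataset is illustrative). Note how only the L1 model zeroes weights out:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       random_state=0)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks weights towards zero
lasso = Lasso(alpha=1.0).fit(X, y)   # L1: drives some weights exactly to zero
print((ridge.coef_ == 0).sum(), (lasso.coef_ == 0).sum())
```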
86. SACON
Artificial Neural Networks
§ A Single Neuron: The basic unit of computation in a neural network is
the neuron, often called a node or unit.
§ The function f is non-linear and is called the Activation Function.
§ The idea of ANNs is based on the belief that the working of the human brain,
which makes the right connections between neurons, can be imitated using
silicon and wires in place of living neurons and dendrites.
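A single neuron in a few lines of NumPy (a sketch; the sigmoid is just one common choice of the activation function f):

```python
import numpy as np

def neuron(x, w, b):
    """A single neuron: weighted sum of inputs passed through a
    non-linear activation function f (here, the sigmoid)."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))   # f = sigmoid activation

print(neuron(np.array([1.0, 2.0]), np.array([0.5, -0.3]), 0.1))
```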
90. SACON
Back-Propagation
§ The back-propagation (BP) algorithm works by
determining the loss (or error) at the output and then
propagating it back into the network.
§ The weights are updated to minimize the error
resulting from each neuron.
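A minimal NumPy sketch of back-propagation for a two-layer network on a toy task (the architecture, learning rate, and task are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)[:, None]   # XOR-like toy labels
W1, W2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 1))

for _ in range(2000):
    # Forward pass: compute the network output
    h = np.tanh(X @ W1)
    p = 1 / (1 + np.exp(-(h @ W2)))
    # Backward pass: determine the error at the output, propagate it back
    d_out = p - y                          # gradient of the logistic loss
    d_h = (d_out @ W2.T) * (1 - h ** 2)    # chain rule through tanh
    W2 -= 0.1 * h.T @ d_out / len(X)       # weight updates reduce the error
    W1 -= 0.1 * X.T @ d_h / len(X)

print(((p > 0.5) == y).mean())             # training accuracy
```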
96. SACON
Methods of splitting: Information gain
Which node can be described easily?
§ Information theory gives a measure of this degree of disorganization in a
system, known as entropy. For a binary node it is

$\text{Entropy} = -p \log_2 p - q \log_2 q$

where p and q are the probabilities of success and failure, respectively, in that
node.
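The formula as a small Python function (a sketch; the 0·log 0 := 0 convention is handled explicitly):

```python
import numpy as np

def entropy(p):
    """Binary entropy: -p*log2(p) - q*log2(q), with q = 1 - p."""
    q = 1.0 - p
    terms = [x * np.log2(x) for x in (p, q) if x > 0]  # 0*log(0) := 0
    return -sum(terms)

print(entropy(0.5))   # 1.0 bit: maximally disorganized node
print(entropy(0.9))   # ~0.469: purer node, easier to describe
```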
97. SACON
Other Tree based methods
§ Ensemble methods combine a group of predictive models to
achieve better accuracy and model stability, managing the
trade-off between bias and variance errors.
§ Bagging is a simple ensembling technique in which we
build many independent predictors/models/learners and
combine them using some model-averaging technique.
§ Random Forest: multiple trees instead of a
single tree; it is a bagging method.
§ To classify a new object based on its
attributes, each tree gives a classification,
and we say the tree “votes” for that class.
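A minimal scikit-learn sketch of the voting idea (the dataset and number of trees are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
# 100 independently trained (bagged) trees vote on the class of each sample.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:3]))
```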
98. SACON
Other Tree based methods
§ Gradient Boosting is a tree ensemble technique that creates a strong classifier
from a number of weak classifiers.
§ It builds an additive model out of weak learners, with each new learner
fitted to correct the errors of the ensemble built so far.
§ Boosting is an ensemble technique in which the predictors are not built
independently, but sequentially.
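A minimal scikit-learn sketch (the dataset and hyperparameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
# Shallow trees (weak learners) are added sequentially; each one corrects
# the mistakes of the ensemble built so far.
gbm = GradientBoostingClassifier(n_estimators=100, max_depth=2,
                                 learning_rate=0.1).fit(X, y)
print(gbm.score(X, y))
```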
99. SACON
Iris Dataset
§ Three species of Iris (Iris setosa, Iris virginica and Iris versicolor).
§ Four features were measured from each sample: the length and the width of
the sepals and petals, in centimeters.
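The dataset ships with scikit-learn, so it is easy to inspect (a minimal sketch):

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)   # sepal/petal length and width, in centimeters
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']
print(iris.data.shape)      # (150, 4): 150 samples, 4 features
```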
100. SACON
References
• Andrew Ng’s Coursera Course
• Scikit Learn Training example on Google
• Nvidia
• Sebastian Ruder’s blog
• HBR
• MIT Tech Review
• Lots of Others
• AI community in general
• IDLI Community