SlideShare une entreprise Scribd logo
1  sur  14
Télécharger pour lire hors ligne
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 1/14
A Tour of The Top 10 Algorithms for
Machine Learning Newbies
In machine learning, there’s something called the “No Free Lunch”
theorem. In a nutshell, it states that no one algorithm works best for
every problem, and it’s especially relevant for supervised learning (i.e.
predictive modeling).
For example, you can’t say that neural networks are always better than
decision trees or vice-versa. There are many factors at play, such as the
size and structure of your dataset.
As a result, you should try many di erent algorithms for your problem,
while using a hold-out “test set” of data to evaluate performance and
select the winner.
James Le Follow
Jan 20, 2018 · 11 min read
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 2/14
Of course, the algorithms you try must be appropriate for your
problem, which is where picking the right machine learning task comes
in. As an analogy, if you need to clean your house, you might use a
vacuum, a broom, or a mop, but you wouldn’t bust out a shovel and
start digging.
The Big Principle
However, there is a common principle that underlies all supervised
machine learning algorithms for predictive modeling.
Machine learning algorithms are described as learning a target function
(f) that best maps input variables (X) to an output variable (Y): Y = f(X)
This is a general learning task where we would like to make predictions
in the future (Y) given new examples of input variables (X). We don’t
know what the function (f) looks like or its form. If we did, we would
use it directly and we would not need to learn it from data using
machine learning algorithms.
The most common type of machine learning is to learn the mapping Y
= f(X) to make predictions of Y for new X. This is called predictive
modeling or predictive analytics and our goal is to make the most
accurate predictions possible.
For machine learning newbies who are eager to understand the basic of
machine learning, here is a quick tour on the top 10 machine learning
algorithms used by data scientists.
1—Linear Regression
Linear regression is perhaps one of the most well-known and well-
understood algorithms in statistics and machine learning.
Predictive modeling is primarily concerned with minimizing the error
of a model or making the most accurate predictions possible, at the
expense of explainability. We will borrow, reuse and steal algorithms
from many di erent elds, including statistics and use them towards
these ends.
The representation of linear regression is an equation that describes a
line that best ts the relationship between the input variables (x) and
the output variables (y), by nding speci c weightings for the input
variables called coe cients (B).
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 3/14
For example: y = B0 + B1 * x
We will predict y given the input x and the goal of the linear regression
learning algorithm is to nd the values for the coe cients B0 and B1.
Di erent techniques can be used to learn the linear regression model
from data, such as a linear algebra solution for ordinary least squares
and gradient descent optimization.
Linear regression has been around for more than 200 years and has
been extensively studied. Some good rules of thumb when using this
technique are to remove variables that are very similar (correlated) and
to remove noise from your data, if possible. It is a fast and simple
technique and good rst algorithm to try.
2—Logistic Regression
Logistic regression is another technique borrowed by machine learning
from the eld of statistics. It is the go-to method for binary
classi cation problems (problems with two class values).
Logistic regression is like linear regression in that the goal is to nd the
values for the coe cients that weight each input variable. Unlike linear
regression, the prediction for the output is transformed using a non-
linear function called the logistic function.
The logistic function looks like a big S and will transform any value into
the range 0 to 1. This is useful because we can apply a rule to the
output of the logistic function to snap values to 0 and 1 (e.g. IF less
than 0.5 then output 1) and predict a class value.
Linear Regression
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 4/14
Because of the way that the model is learned, the predictions made by
logistic regression can also be used as the probability of a given data
instance belonging to class 0 or class 1. This can be useful for problems
where you need to give more rationale for a prediction.
Like linear regression, logistic regression does work better when you
remove attributes that are unrelated to the output variable as well as
attributes that are very similar (correlated) to each other. It’s a fast
model to learn and e ective on binary classi cation problems.
3—Linear Discriminant Analysis
Logistic Regression is a classi cation algorithm traditionally limited to
only two-class classi cation problems. If you have more than two
classes then the Linear Discriminant Analysis algorithm is the preferred
linear classi cation technique.
The representation of LDA is pretty straight forward. It consists of
statistical properties of your data, calculated for each class. For a single
input variable this includes:
The mean value for each class.
The variance calculated across all classes.
1.
2.
Logistic Regression
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 5/14
Predictions are made by calculating a discriminate value for each class
and making a prediction for the class with the largest value. The
technique assumes that the data has a Gaussian distribution (bell
curve), so it is a good idea to remove outliers from your data before
hand. It’s a simple and powerful method for classi cation predictive
modeling problems.
4—Classi cation and Regression Trees
Decision Trees are an important type of algorithm for predictive
modeling machinelearning.
The representation of the decision tree model is a binary tree. This is
your binary tree from algorithms and data structures, nothing too
fancy. Each node represents a single input variable (x) and a split point
on that variable (assuming the variable is numeric).
Linear Discriminant Analysis
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 6/14
The leaf nodes of the tree contain an output variable (y) which is used
to make a prediction. Predictions are made by walking the splits of the
tree until arriving at a leaf node and output the class value at that leaf
node.
Trees are fast to learn and very fast for making predictions. They are
also often accurate for a broad range of problems and do not require
any special preparation for your data.
5—Naive Bayes
Naive Bayes is a simple but surprisingly powerful algorithm for
predictive modeling.
The model is comprised of two types of probabilities that can be
calculated directly from your training data: 1) The probability of each
class; and 2) The conditional probability for each class given each x
value. Once calculated, the probability model can be used to make
predictions for new data using Bayes Theorem. When your data is real-
valued it is common to assume a Gaussian distribution (bell curve) so
that you can easily estimate these probabilities.
Decision Tree
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 7/14
Naive Bayes is called naive because it assumes that each input variable
is independent. This is a strong assumption and unrealistic for real
data, nevertheless, the technique is very e ective on a large range of
complex problems.
6—K-Nearest Neighbors
The KNN algorithm is very simple and very e ective. The model
representation for KNN is the entire training dataset. Simple right?
Predictions are made for a new data point by searching through the
entire training set for the K most similar instances (the neighbors) and
summarizing the output variable for those K instances. For regression
problems, this might be the mean output variable, for classi cation
problems this might be the mode (or most common) class value.
The trick is in how to determine the similarity between the data
instances. The simplest technique if your attributes are all of the same
scale (all in inches for example) is to use the Euclidean distance, a
number you can calculate directly based on the di erences between
each input variable.
Bayes Theorem
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 8/14
KNN can require a lot of memory or space to store all of the data, but
only performs a calculation (or learn) when a prediction is needed, just
in time. You can also update and curate your training instances over
time to keep predictions accurate.
The idea of distance or closeness can break down in very high
dimensions (lots of input variables) which can negatively a ect the
performance of the algorithm on your problem. This is called the curse
of dimensionality. It suggests you only use those input variables that
are most relevant to predicting the output variable.
7—Learning Vector Quantization
A downside of K-Nearest Neighbors is that you need to hang on to your
entire training dataset. The Learning Vector Quantization algorithm (or
LVQ for short) is an arti cial neural network algorithm that allows you
to choose how many training instances to hang onto and learns exactly
what those instances should look like.
K-Nearest Neighbors
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 9/14
The representation for LVQ is a collection of codebook vectors. These
are selected randomly in the beginning and adapted to best summarize
the training dataset over a number of iterations of the learning
algorithm. After learned, the codebook vectors can be used to make
predictions just like K-Nearest Neighbors. The most similar neighbor
(best matching codebook vector) is found by calculating the distance
between each codebook vector and the new data instance. The class
value or (real value in the case of regression) for the best matching unit
is then returned as the prediction. Best results are achieved if you
rescale your data to have the same range, such as between 0 and 1.
If you discover that KNN gives good results on your dataset try using
LVQ to reduce the memory requirements of storing the entire training
dataset.
8—Support Vector Machines
Support Vector Machines are perhaps one of the most popular and
talked about machine learning algorithms.
A hyperplane is a line that splits the input variable space. In SVM, a
hyperplane is selected to best separate the points in the input variable
space by their class, either class 0 or class 1. In two-dimensions, you can
visualize this as a line and let’s assume that all of our input points can
be completely separated by this line. The SVM learning algorithm nds
the coe cients that results in the best separation of the classes by the
hyperplane.
Learning Vector Quantization
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 10/14
The distance between the hyperplane and the closest data points is
referred to as the margin. The best or optimal hyperplane that can
separate the two classes is the line that has the largest margin. Only
these points are relevant in de ning the hyperplane and in the
construction of the classi er. These points are called the support
vectors. They support or de ne the hyperplane. In practice, an
optimization algorithm is used to nd the values for the coe cients
that maximizes the margin.
SVM might be one of the most powerful out-of-the-box classi ers and
worth trying on your dataset.
9—Bagging and Random Forest
Random Forest is one of the most popular and most powerful machine
learning algorithms. It is a type of ensemble machine learning
algorithm called Bootstrap Aggregation or bagging.
The bootstrap is a powerful statistical method for estimating a quantity
from a data sample. Such as a mean. You take lots of samples of your
data, calculate the mean, then average all of your mean values to give
you a better estimation of the true mean value.
In bagging, the same approach is used, but instead for estimating entire
statistical models, most commonly decision trees. Multiple samples of
your training data are taken then models are constructed for each data
sample. When you need to make a prediction for new data, each model
makes a prediction and the predictions are averaged to give a better
estimate of the true output value.
Support Vector Machine
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 11/14
Random forest is a tweak on this approach where decision trees are
created so that rather than selecting optimal split points, suboptimal
splits are made by introducing randomness.
The models created for each sample of the data are therefore more
di erent than they otherwise would be, but still accurate in their
unique and di erent ways. Combining their predictions results in a
better estimate of the true underlying output value.
If you get good results with an algorithm with high variance (like
decision trees), you can often get better results by bagging that
algorithm.
10—Boosting and AdaBoost
Boosting is an ensemble technique that attempts to create a strong
classi er from a number of weak classi ers. This is done by building a
model from the training data, then creating a second model that
attempts to correct the errors from the rst model. Models are added
until the training set is predicted perfectly or a maximum number of
models are added.
AdaBoost was the rst really successful boosting algorithm developed
for binary classi cation. It is the best starting point for understanding
boosting. Modern boosting methods build on AdaBoost, most notably
stochastic gradient boosting machines.
Random Forest
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 12/14
AdaBoost is used with short decision trees. After the rst tree is created,
the performance of the tree on each training instance is used to weight
how much attention the next tree that is created should pay attention
to each training instance. Training data that is hard to predict is given
more weight, whereas easy to predict instances are given less weight.
Models are created sequentially one after the other, each updating the
weights on the training instances that a ect the learning performed by
the next tree in the sequence. After all the trees are built, predictions
are made for new data, and the performance of each tree is weighted
by how accurate it was on training data.
Because so much attention is put on correcting mistakes by the
algorithm it is important that you have clean data with outliers
removed.
Last Takeaway
A typical question asked by a beginner, when facing a wide variety of
machine learning algorithms, is “which algorithm should I use?” The
answer to the question varies depending on many factors, including:
(1) The size, quality, and nature of data; (2) The available
computational time; (3) The urgency of the task; and (4) What you
want to do with the data.
Even an experienced data scientist cannot tell which algorithm will
perform the best before trying di erent algorithms. Although there are
many other Machine Learning algorithms, these are the most popular
ones. If you’re a newbie to Machine Learning, these would be a good
starting point to learn.
— —
AdaBoost
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 13/14
If you enjoyed this piece, I’d love it if you hit the clap button 👏 so others
might stumble upon it. You can nd my own code on GitHub, and more of
my writing and projects at https://jameskle.com/. You can also follow me
on Twitter, email me directly or nd me on LinkedIn. Sign up for my
newsletter to receive my latest thoughts on data science, machine learning,
and arti cial intelligence right at your inbox!
1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 14/14

Contenu connexe

Tendances

Nimrita koul Machine Learning
Nimrita koul  Machine LearningNimrita koul  Machine Learning
Nimrita koul Machine LearningNimrita Koul
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringArshad Farhad
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONijaia
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATIONA MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATIONijaia
 
Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Usama Fayyaz
 
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB An Application of Assignment Problem in Laptop Selection Problem Using MATLAB
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB mathsjournal
 
Machine learning algorithms
Machine learning algorithmsMachine learning algorithms
Machine learning algorithmsShalitha Suranga
 
Predire il futuro con Machine Learning & Big Data
Predire il futuro con Machine Learning & Big DataPredire il futuro con Machine Learning & Big Data
Predire il futuro con Machine Learning & Big DataData Driven Innovation
 
How to understand and implement regression analysis
How to understand and implement regression analysisHow to understand and implement regression analysis
How to understand and implement regression analysisClaireWhittaker5
 
Linear Regression, Machine learning term
Linear Regression, Machine learning termLinear Regression, Machine learning term
Linear Regression, Machine learning termS Rulez
 
Machine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By ExamplesMachine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By ExamplesMario Cartia
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET Journal
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 
Gans - Generative Adversarial Nets
Gans - Generative Adversarial NetsGans - Generative Adversarial Nets
Gans - Generative Adversarial NetsSajalRastogi8
 
Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)Galit Shmueli
 
A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...Aboul Ella Hassanien
 
15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learningAnil Yadav
 

Tendances (18)

Nimrita koul Machine Learning
Nimrita koul  Machine LearningNimrita koul  Machine Learning
Nimrita koul Machine Learning
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATIONA MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
 
Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning
 
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB An Application of Assignment Problem in Laptop Selection Problem Using MATLAB
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB
 
Machine learning algorithms
Machine learning algorithmsMachine learning algorithms
Machine learning algorithms
 
Predire il futuro con Machine Learning & Big Data
Predire il futuro con Machine Learning & Big DataPredire il futuro con Machine Learning & Big Data
Predire il futuro con Machine Learning & Big Data
 
How to understand and implement regression analysis
How to understand and implement regression analysisHow to understand and implement regression analysis
How to understand and implement regression analysis
 
Linear Regression, Machine learning term
Linear Regression, Machine learning termLinear Regression, Machine learning term
Linear Regression, Machine learning term
 
Machine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By ExamplesMachine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By Examples
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Gans - Generative Adversarial Nets
Gans - Generative Adversarial NetsGans - Generative Adversarial Nets
Gans - Generative Adversarial Nets
 
Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)
 
A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...
 
15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning
 

Similaire à A tour of the top 10 algorithms for machine learning newbies

2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2uetian12
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithmsArunangsu Sahu
 
Data Science as a Career and Intro to R
Data Science as a Career and Intro to RData Science as a Career and Intro to R
Data Science as a Career and Intro to RAnshik Bansal
 
Regression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms ExcelRegression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms ExcelDr. Abdul Ahad Abro
 
Sample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdfSample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdfAaryanArora10
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
 
Machine Learning Algorithm for Business Strategy.pdf
Machine Learning Algorithm for Business Strategy.pdfMachine Learning Algorithm for Business Strategy.pdf
Machine Learning Algorithm for Business Strategy.pdfPhD Assistance
 
Machine learning interview questions and answers
Machine learning interview questions and answersMachine learning interview questions and answers
Machine learning interview questions and answerskavinilavuG
 
IRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification AlgorithmsIRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification AlgorithmsIRJET Journal
 
IRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification AlgorithmsIRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification AlgorithmsIRJET Journal
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .pptbutest
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..butest
 
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERIJCSEA Journal
 
AMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTAMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTIRJET Journal
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdfgadissaassefa
 
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018Codemotion
 

Similaire à A tour of the top 10 algorithms for machine learning newbies (20)

Machine Learning.pptx
Machine Learning.pptxMachine Learning.pptx
Machine Learning.pptx
 
2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
 
Data Science as a Career and Intro to R
Data Science as a Career and Intro to RData Science as a Career and Intro to R
Data Science as a Career and Intro to R
 
Regression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms ExcelRegression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms Excel
 
Sample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdfSample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdf
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
Machine Learning Algorithm for Business Strategy.pdf
Machine Learning Algorithm for Business Strategy.pdfMachine Learning Algorithm for Business Strategy.pdf
Machine Learning Algorithm for Business Strategy.pdf
 
Machine learning interview questions and answers
Machine learning interview questions and answersMachine learning interview questions and answers
Machine learning interview questions and answers
 
IRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification AlgorithmsIRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification Algorithms
 
IRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification AlgorithmsIRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification Algorithms
 
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdfTop Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..
 
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
 
ML.pdf
ML.pdfML.pdf
ML.pdf
 
fINAL ML PPT.pptx
fINAL ML PPT.pptxfINAL ML PPT.pptx
fINAL ML PPT.pptx
 
AMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTAMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLT
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdf
 
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
 

Dernier

Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfSayantanBiswas37
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...kumargunjan9515
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 

Dernier (20)

Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 

A tour of the top 10 algorithms for machine learning newbies

  • 1. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 1/14 A Tour of The Top 10 Algorithms for Machine Learning Newbies In machine learning, there’s something called the “No Free Lunch” theorem. In a nutshell, it states that no one algorithm works best for every problem, and it’s especially relevant for supervised learning (i.e. predictive modeling). For example, you can’t say that neural networks are always better than decision trees or vice-versa. There are many factors at play, such as the size and structure of your dataset. As a result, you should try many di erent algorithms for your problem, while using a hold-out “test set” of data to evaluate performance and select the winner. James Le Follow Jan 20, 2018 · 11 min read
  • 2. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 2/14 Of course, the algorithms you try must be appropriate for your problem, which is where picking the right machine learning task comes in. As an analogy, if you need to clean your house, you might use a vacuum, a broom, or a mop, but you wouldn’t bust out a shovel and start digging. The Big Principle However, there is a common principle that underlies all supervised machine learning algorithms for predictive modeling. Machine learning algorithms are described as learning a target function (f) that best maps input variables (X) to an output variable (Y): Y = f(X) This is a general learning task where we would like to make predictions in the future (Y) given new examples of input variables (X). We don’t know what the function (f) looks like or its form. If we did, we would use it directly and we would not need to learn it from data using machine learning algorithms. The most common type of machine learning is to learn the mapping Y = f(X) to make predictions of Y for new X. This is called predictive modeling or predictive analytics and our goal is to make the most accurate predictions possible. For machine learning newbies who are eager to understand the basic of machine learning, here is a quick tour on the top 10 machine learning algorithms used by data scientists. 1—Linear Regression Linear regression is perhaps one of the most well-known and well- understood algorithms in statistics and machine learning. Predictive modeling is primarily concerned with minimizing the error of a model or making the most accurate predictions possible, at the expense of explainability. We will borrow, reuse and steal algorithms from many di erent elds, including statistics and use them towards these ends. The representation of linear regression is an equation that describes a line that best ts the relationship between the input variables (x) and the output variables (y), by nding speci c weightings for the input variables called coe cients (B).
  • 3. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 3/14 For example: y = B0 + B1 * x We will predict y given the input x and the goal of the linear regression learning algorithm is to nd the values for the coe cients B0 and B1. Di erent techniques can be used to learn the linear regression model from data, such as a linear algebra solution for ordinary least squares and gradient descent optimization. Linear regression has been around for more than 200 years and has been extensively studied. Some good rules of thumb when using this technique are to remove variables that are very similar (correlated) and to remove noise from your data, if possible. It is a fast and simple technique and good rst algorithm to try. 2—Logistic Regression Logistic regression is another technique borrowed by machine learning from the eld of statistics. It is the go-to method for binary classi cation problems (problems with two class values). Logistic regression is like linear regression in that the goal is to nd the values for the coe cients that weight each input variable. Unlike linear regression, the prediction for the output is transformed using a non- linear function called the logistic function. The logistic function looks like a big S and will transform any value into the range 0 to 1. This is useful because we can apply a rule to the output of the logistic function to snap values to 0 and 1 (e.g. IF less than 0.5 then output 1) and predict a class value. Linear Regression
  • 4. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 4/14 Because of the way that the model is learned, the predictions made by logistic regression can also be used as the probability of a given data instance belonging to class 0 or class 1. This can be useful for problems where you need to give more rationale for a prediction. Like linear regression, logistic regression does work better when you remove attributes that are unrelated to the output variable as well as attributes that are very similar (correlated) to each other. It’s a fast model to learn and e ective on binary classi cation problems. 3—Linear Discriminant Analysis Logistic Regression is a classi cation algorithm traditionally limited to only two-class classi cation problems. If you have more than two classes then the Linear Discriminant Analysis algorithm is the preferred linear classi cation technique. The representation of LDA is pretty straight forward. It consists of statistical properties of your data, calculated for each class. For a single input variable this includes: The mean value for each class. The variance calculated across all classes. 1. 2. Logistic Regression
  • 5. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 5/14 Predictions are made by calculating a discriminate value for each class and making a prediction for the class with the largest value. The technique assumes that the data has a Gaussian distribution (bell curve), so it is a good idea to remove outliers from your data before hand. It’s a simple and powerful method for classi cation predictive modeling problems. 4—Classi cation and Regression Trees Decision Trees are an important type of algorithm for predictive modeling machinelearning. The representation of the decision tree model is a binary tree. This is your binary tree from algorithms and data structures, nothing too fancy. Each node represents a single input variable (x) and a split point on that variable (assuming the variable is numeric). Linear Discriminant Analysis
  • 6. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 6/14 The leaf nodes of the tree contain an output variable (y) which is used to make a prediction. Predictions are made by walking the splits of the tree until arriving at a leaf node and output the class value at that leaf node. Trees are fast to learn and very fast for making predictions. They are also often accurate for a broad range of problems and do not require any special preparation for your data. 5—Naive Bayes Naive Bayes is a simple but surprisingly powerful algorithm for predictive modeling. The model is comprised of two types of probabilities that can be calculated directly from your training data: 1) The probability of each class; and 2) The conditional probability for each class given each x value. Once calculated, the probability model can be used to make predictions for new data using Bayes Theorem. When your data is real- valued it is common to assume a Gaussian distribution (bell curve) so that you can easily estimate these probabilities. Decision Tree
  • 7. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 7/14 Naive Bayes is called naive because it assumes that each input variable is independent. This is a strong assumption and unrealistic for real data, nevertheless, the technique is very e ective on a large range of complex problems. 6—K-Nearest Neighbors The KNN algorithm is very simple and very e ective. The model representation for KNN is the entire training dataset. Simple right? Predictions are made for a new data point by searching through the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those K instances. For regression problems, this might be the mean output variable, for classi cation problems this might be the mode (or most common) class value. The trick is in how to determine the similarity between the data instances. The simplest technique if your attributes are all of the same scale (all in inches for example) is to use the Euclidean distance, a number you can calculate directly based on the di erences between each input variable. Bayes Theorem
  • 8. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 8/14 KNN can require a lot of memory or space to store all of the data, but only performs a calculation (or learn) when a prediction is needed, just in time. You can also update and curate your training instances over time to keep predictions accurate. The idea of distance or closeness can break down in very high dimensions (lots of input variables) which can negatively a ect the performance of the algorithm on your problem. This is called the curse of dimensionality. It suggests you only use those input variables that are most relevant to predicting the output variable. 7—Learning Vector Quantization A downside of K-Nearest Neighbors is that you need to hang on to your entire training dataset. The Learning Vector Quantization algorithm (or LVQ for short) is an arti cial neural network algorithm that allows you to choose how many training instances to hang onto and learns exactly what those instances should look like. K-Nearest Neighbors
  • 9. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 9/14 The representation for LVQ is a collection of codebook vectors. These are selected randomly in the beginning and adapted to best summarize the training dataset over a number of iterations of the learning algorithm. After learned, the codebook vectors can be used to make predictions just like K-Nearest Neighbors. The most similar neighbor (best matching codebook vector) is found by calculating the distance between each codebook vector and the new data instance. The class value or (real value in the case of regression) for the best matching unit is then returned as the prediction. Best results are achieved if you rescale your data to have the same range, such as between 0 and 1. If you discover that KNN gives good results on your dataset try using LVQ to reduce the memory requirements of storing the entire training dataset. 8—Support Vector Machines Support Vector Machines are perhaps one of the most popular and talked about machine learning algorithms. A hyperplane is a line that splits the input variable space. In SVM, a hyperplane is selected to best separate the points in the input variable space by their class, either class 0 or class 1. In two-dimensions, you can visualize this as a line and let’s assume that all of our input points can be completely separated by this line. The SVM learning algorithm nds the coe cients that results in the best separation of the classes by the hyperplane. Learning Vector Quantization
  • 10. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 10/14 The distance between the hyperplane and the closest data points is referred to as the margin. The best or optimal hyperplane that can separate the two classes is the line that has the largest margin. Only these points are relevant in de ning the hyperplane and in the construction of the classi er. These points are called the support vectors. They support or de ne the hyperplane. In practice, an optimization algorithm is used to nd the values for the coe cients that maximizes the margin. SVM might be one of the most powerful out-of-the-box classi ers and worth trying on your dataset. 9—Bagging and Random Forest Random Forest is one of the most popular and most powerful machine learning algorithms. It is a type of ensemble machine learning algorithm called Bootstrap Aggregation or bagging. The bootstrap is a powerful statistical method for estimating a quantity from a data sample. Such as a mean. You take lots of samples of your data, calculate the mean, then average all of your mean values to give you a better estimation of the true mean value. In bagging, the same approach is used, but instead for estimating entire statistical models, most commonly decision trees. Multiple samples of your training data are taken then models are constructed for each data sample. When you need to make a prediction for new data, each model makes a prediction and the predictions are averaged to give a better estimate of the true output value. Support Vector Machine
  • 11. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 11/14 Random forest is a tweak on this approach where decision trees are created so that rather than selecting optimal split points, suboptimal splits are made by introducing randomness. The models created for each sample of the data are therefore more di erent than they otherwise would be, but still accurate in their unique and di erent ways. Combining their predictions results in a better estimate of the true underlying output value. If you get good results with an algorithm with high variance (like decision trees), you can often get better results by bagging that algorithm. 10—Boosting and AdaBoost Boosting is an ensemble technique that attempts to create a strong classi er from a number of weak classi ers. This is done by building a model from the training data, then creating a second model that attempts to correct the errors from the rst model. Models are added until the training set is predicted perfectly or a maximum number of models are added. AdaBoost was the rst really successful boosting algorithm developed for binary classi cation. It is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines. Random Forest
  • 12. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 12/14 AdaBoost is used with short decision trees. After the rst tree is created, the performance of the tree on each training instance is used to weight how much attention the next tree that is created should pay attention to each training instance. Training data that is hard to predict is given more weight, whereas easy to predict instances are given less weight. Models are created sequentially one after the other, each updating the weights on the training instances that a ect the learning performed by the next tree in the sequence. After all the trees are built, predictions are made for new data, and the performance of each tree is weighted by how accurate it was on training data. Because so much attention is put on correcting mistakes by the algorithm it is important that you have clean data with outliers removed. Last Takeaway A typical question asked by a beginner, when facing a wide variety of machine learning algorithms, is “which algorithm should I use?” The answer to the question varies depending on many factors, including: (1) The size, quality, and nature of data; (2) The available computational time; (3) The urgency of the task; and (4) What you want to do with the data. Even an experienced data scientist cannot tell which algorithm will perform the best before trying di erent algorithms. Although there are many other Machine Learning algorithms, these are the most popular ones. If you’re a newbie to Machine Learning, these would be a good starting point to learn. — — AdaBoost
  • 13. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 13/14 If you enjoyed this piece, I’d love it if you hit the clap button 👏 so others might stumble upon it. You can nd my own code on GitHub, and more of my writing and projects at https://jameskle.com/. You can also follow me on Twitter, email me directly or nd me on LinkedIn. Sign up for my newsletter to receive my latest thoughts on data science, machine learning, and arti cial intelligence right at your inbox!
  • 14. 1/29/2019 A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 14/14