17. Decision Tree
(If/then/else rules. Handles noncontiguous data. Can also be used for regression.)
Used in pattern recognition.
Used in option pricing in finance & in identifying disease & risk trends.
Robust to errors.
Can handle missing values nicely.
Can handle both categorical and numerical variables.
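The bullets above can be made concrete with a short, hedged sketch using scikit-learn's DecisionTreeClassifier (the toy data below is hypothetical):

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two numeric features, binary label.
# The label follows the first feature, so the tree only needs a
# single if/then/else split to separate the classes.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 0, 1]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
preds = clf.predict([[1, 1], [0, 0]])
```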
Random Forest
(Finds the best split among random subsets of features. Can also be used for regression.)
Used in industrial applications.
Used for both classification and regression analysis tasks.
Builds a forest from a number of decision trees, each randomized by sampling the data & features.
Runs efficiently on large databases.
Has high classification accuracy.
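As a hedged sketch of the ensemble idea, scikit-learn's RandomForestClassifier on a synthetic dataset (all sizes and parameters below are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic data standing in for a "large database".
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# 100 randomized trees vote; each tree sees a bootstrap sample of the
# rows and a random subset of the features at every split.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)
train_acc = rf.score(X, y)  # accuracy on the training data
```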
Support Vector Machine
(Maximum-margin classifier. A fundamental data science algorithm.)
Used in business applications – such as comparing the relative performance of stocks over
a period of time.
Classifies data sets into different classes through a hyperplane.
Separates the classes & maximizes the margin between them.
Requires clean, well-prepared data for accurate & efficient results.
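A minimal sketch of the maximum-margin idea with scikit-learn's SVC (hypothetical 1-D toy data; the "hyperplane" here is just a point on the number line):

```python
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated classes on a line.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]

# A linear-kernel SVM places the separating boundary so that the
# margin to the nearest points of each class is maximized.
svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)
preds = svm.predict([[-3.0], [3.0]])
```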
18. Principal Component Analysis (PCA)
(Distils the feature space into the components that describe the greatest variance.)
Used in gene expression analysis & stock market predictions.
Used in pattern classification tasks that ignore class labels.
It is a dimensionality reduction algorithm.
Used for speeding up model training by reducing the number of features.
Used for making compelling visualizations of complex datasets.
It identifies patterns in data by exploiting correlations between variables.
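A hedged sketch of the variance-distilling idea with scikit-learn's PCA (the correlated 3-D data below is hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical correlated 3-D data: the second column is almost a
# multiple of the first, so most variance lies along one direction.
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               2.0 * base + 0.05 * rng.normal(size=(200, 1)),
               0.05 * rng.normal(size=(200, 1))])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                # reduced 2-D representation
ratio = pca.explained_variance_ratio_   # variance captured per component
```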
KMeans Clustering
(Groups similar data points around centroids.)
Used in grouping images into different categories & detecting different activity types in
motion sensors.
Used for monitoring whether tracked data points change between different groups over
time.
It works by categorizing unstructured data into a number of different groups.
Each data point contains a collection of features.
The algorithm classifies unstructured data & categorizes it based on specific features.
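A minimal sketch of centroid-based grouping with scikit-learn's KMeans (the unlabeled points below are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled points forming two obvious groups.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])

# K-Means assigns each point to the nearest of k centroids,
# then moves the centroids to the mean of their assigned points.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)
```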
Naive Bayes Classifier
(Updates knowledge step by step with new information.)
Based on Bayes' theorem of probability.
Used in document classification, spam filters, sentiment analysis, etc.
The algorithm is used for ranking pages, indexing relevancy scores & classifying data
categorically.
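A hedged spam-filter sketch with scikit-learn's MultinomialNB (the mini corpus below is made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical mini spam-filter corpus.
docs = ["win free money now", "free prize win",
        "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

# Word counts become features; Bayes' theorem combines the per-word
# evidence into a class probability for each document.
vec = CountVectorizer()
X = vec.fit_transform(docs)
nb = MultinomialNB().fit(X, labels)
pred = nb.predict(vec.transform(["free money prize"]))[0]
```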
20. NAME | DESCRIPTION | ADVANTAGES | DISADVANTAGES

LINEAR REGRESSION
The "best fit" line through all data points. Predictions are numerical.
Advantages: Easy to understand – we clearly see what the biggest drivers of the model are.
Disadvantages:
● Sometimes too simple to capture complex relationships between variables.
● Tendency for the model to "overfit".

LOGISTIC REGRESSION
The adaptation of linear regression to problems of classification (e.g., yes/no questions, groups, etc.).
Advantages: Also easy to understand.
Disadvantages:
● Sometimes too simple to capture complex relationships between variables.
● Tendency for the model to "overfit".

DECISION TREE
A graph that uses a branching method to match all possible outcomes of a decision.
Advantages: Easy to understand and implement.
Disadvantages:
● Not often used on its own for prediction because it's also often too simple and not powerful enough for complex data.
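To illustrate the first two rows, a hedged sketch contrasting the numeric output of linear regression with the yes/no output of logistic regression (toy data below is hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.arange(10).reshape(-1, 1)
y_numeric = 3.0 * X.ravel() + 2.0         # regression target: a number
y_classes = (X.ravel() >= 5).astype(int)  # classification target: yes/no

lin = LinearRegression().fit(X, y_numeric)    # fits the "best fit" line
log = LogisticRegression().fit(X, y_classes)  # fits a yes/no boundary

slope = lin.coef_[0]         # recovered driver of the linear model
cls = log.predict([[9]])[0]  # predicted class for a new point
```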
21. NAME | DESCRIPTION | ADVANTAGES | DISADVANTAGES

RANDOM FOREST
Takes the average of many decision trees, each of which is made with a sample of the data. Each tree is weaker than a full decision tree, but by combining them we get better overall performance.
Advantages: A sort of "wisdom of the crowd". Tends to result in very high quality models. Fast to train.
Disadvantages:
● Can be slow to output predictions relative to other algorithms.
● Not easy to understand predictions.

GRADIENT BOOSTING
Uses even weaker decision trees that are increasingly focused on "hard" examples.
Advantages: High-performing.
Disadvantages:
● A small change in the feature set or training set can create radical changes in the model.
● Not easy to understand predictions.

NEURAL NETWORKS
Mimics the behavior of the brain. Neural networks are interconnected neurons that pass messages to each other. Deep learning uses several layers of neural networks put one after the other.
Advantages: Can handle extremely complex tasks – no other algorithm comes close in image recognition.
Disadvantages:
● Very, very slow to train, because they have so many layers. Require a lot of power.
● Almost impossible to understand predictions.
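A hedged sketch of the boosting row with scikit-learn's GradientBoostingClassifier (synthetic data and parameters are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical synthetic data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Each shallow tree is fit to the errors of the ensemble so far,
# so later trees concentrate on the "hard" examples.
gb = GradientBoostingClassifier(n_estimators=100, random_state=0)
gb.fit(X, y)
train_acc = gb.score(X, y)  # accuracy on the training data
```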
34. CNN & RNN
• CNNs are feedforward neural networks which take in fixed-size inputs & give fixed-size outputs.
• For example, image feature classification & video processing.
• RNNs use internal memory.
• RNNs are versatile.
• They use time-series information for giving outputs.
• For example, language processing tasks & text and speech analysis.
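The fixed-input vs. internal-memory contrast can be sketched in plain NumPy (the filter and recurrent weights below are hypothetical):

```python
import numpy as np

# Feedforward, CNN-style: a fixed-size filter slides over a fixed-size
# input; each output depends only on a local window, with no memory.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
kernel = np.array([0.25, 0.5, 0.25])
conv_out = np.convolve(x, kernel, mode="valid")

# Recurrent, RNN-style: a hidden state h carries information from one
# time step to the next, so the order of the inputs matters.
h = 0.0
states = []
for x_t in x:
    h = np.tanh(0.5 * h + 0.3 * x_t)  # hypothetical weights
    states.append(h)
```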
Deep Belief Networks
• Used in the fields of image recognition, video sequence recognition & motion-capture
data.
• Composed of multiple layers of graphical models having directed & undirected
edges.
• A DBN does not use any labels.
• DBNs are generative models.
• A DBN fine-tunes the entire input in sequence as the model is trained.
35. Boltzmann Machine
• These are two-layer neural networks which make stochastic decisions.
• A Boltzmann machine does not discriminate between neurons.
• It learns the distribution of the data from the input & makes inferences on unseen data.
• It is a generative model – it does not expect input, it rather creates it.
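The restricted (two-layer) variant of the Boltzmann machine is available in scikit-learn as BernoulliRBM; a hedged sketch on hypothetical binary data:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(50, 6)).astype(float)  # hypothetical binary data

# Visible layer (6 units) and hidden layer (3 units); training makes
# stochastic updates that learn the distribution of the input data.
rbm = BernoulliRBM(n_components=3, learning_rate=0.05, n_iter=20,
                   random_state=0)
rbm.fit(X)
H = rbm.transform(X)  # hidden-unit activation probabilities
```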
Generative Adversarial Networks
• GANs are used for generating new data.
• A GAN comprises two parts, a discriminator and a generator.
• The generator is like a reverse CNN: it takes a small amount of data & upscales it to
generate new samples.
• The discriminator takes this output & predicts whether it belongs to the real dataset.
• GANs have been used to generate paintings.
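A minimal, hedged sketch of the generator/discriminator game on 1-D data, with hand-derived gradient ascent steps (the data distribution, weights and learning rate are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, theta):
    # Affine map from noise to a sample; theta = (scale, shift).
    return theta[0] * z + theta[1]

def discriminator(x, w):
    # Logistic unit returning P(x is real); w = (weight, bias).
    return 1.0 / (1.0 + np.exp(-(w[0] * x + w[1])))

theta = np.array([1.0, 0.0])  # generator parameters
w = np.array([0.5, 0.0])      # discriminator parameters
lr = 0.05

for _ in range(500):
    z = rng.normal(size=64)
    real = rng.normal(4.0, 1.0, size=64)  # "real" data ~ N(4, 1)
    fake = generator(z, theta)

    # Discriminator ascends log D(real) + log(1 - D(fake)).
    d_real, d_fake = discriminator(real, w), discriminator(fake, w)
    grad_w0 = np.mean((1 - d_real) * real) + np.mean(-d_fake * fake)
    grad_w1 = np.mean(1 - d_real) + np.mean(-d_fake)
    w = w + lr * np.array([grad_w0, grad_w1])

    # Generator ascends log D(fake): the "non-saturating" objective.
    d_fake = discriminator(generator(z, theta), w)
    grad_scale = np.mean((1 - d_fake) * w[0] * z)
    grad_shift = np.mean((1 - d_fake) * w[0])
    theta = theta + lr * np.array([grad_scale, grad_shift])
```

After training, the generator's shift parameter has moved toward the real data's mean, showing the adversarial pressure in action.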