A background on machine learning before we dive into the source code and examples of machine learning with Spark MLlib.
This is an excerpt for the Apache Spark with Scala training available at https://www.supergloo.com/fieldnotes/portfolio/apache-spark-scala/
2. Background
• What is machine learning?
• Simply put, machine learning is creating and using
models that are learned from data.
• In other contexts it might be called predictive
modeling or data mining.
3. More Background
• Examples
• Predicting whether an email message is spam or not
• Predicting whether a credit card transaction is
fraudulent
• Predicting which advertisement a shopper is most
likely to click on
• Predicting which product a shopper is likely to
purchase; recommendation engines
5. Types of Machine Learning Models
• Supervised
• Supervised models contains a set of data
labeled with the correct answers to learn
from
• Unsupervised
• Unsupervised models do not contain such
labels.
6. Supervised Machine Learning Models Examples
• k-Nearest Neighbors
• Imagine trying to predict how a person will vote in the
next presidential election. If you know nothing about
the person and you are trying to predict their vote,
one sensible approach is to look at how their
neighbors are planning to vote (if you have the data)
7. Supervised Machine Learning Models Examples
• Naive Bayes
• A common use of the Naive Bayes machine learning
model is spam detection.
• A key to Naive Bayes is making the (big) assumption
that the presences (or absences) of a word are
independent of one another, conditional on a
message being spam or not
8. Supervised Machine Learning Models Examples
• Regression Models
• Simple Linear Regression - used to prove correlation
between two variables
• Multiple Regression - used to prove correlation with
more than two variables
9. Supervised Machine Learning Models Examples
• Decision Trees
• A decision tree uses a structure to represent a number
of possible decision paths and an outcome for each
path.
• Decision trees are often divided into classification
trees (which produce categorical outputs) and
regression trees (which produce numeric outputs).
10. Supervised Machine Learning Models Examples
• Neural Networks
• Solve problems like handwriting recognition and face
detection
11. Unsupervised Machine Learning Background
• Clustering is an example of unsupervised learning, in
which we work with completely unlabeled data.
Clustering contrasts with supervised learning which uses
labeled data as basis for making predictions
• Clusters won’t label themselves. The labels may be
determined via unsupervised learning approaches which
look at the data underlying each label.
• For example, a data set showing where millionaires live
probably has clusters in places like Beverly Hills and
Manhattan
12. Unsupervised Machine Learning Examples
• k-means
• Number of clusters k is chosen in advance
• Example: Clustering may be used when determine or
limiting a photograph to only five colors. For example,
you may have a photograph which you want to use to
print stickers. However, the printer only supports up
to 5 colors per sticker.
13. Unsupervised Machine Learning Examples
• Latent Dirichlet Analysis (LDA)
• Natural Language Processing
• Commonly used to identify common topics in a set
of documents