This document provides an introduction to machine learning in R. It explains that machine learning trains machines to solve problems without being explicitly programmed, and discusses R's suitability for machine learning due to being open-source, written by statisticians, and having a large community. The document outlines unsupervised, supervised, and reinforcement learning algorithms and provides examples of each type, including uses like customer analysis, prediction, and game playing. It also introduces the Iris dataset as a sample dataset for R programmers.
2. What is Machine Learning?
➔ Machine Learning is a subfield of AI (Artificial Intelligence).
➔ It is training a machine to solve a problem without being explicitly
programmed to do so
➔ The capability of a machine to imitate intelligent human behavior
3. Why choose R for Machine Learning?
➔ There are other programming languages that can be used for machine
learning.
➔ Some of them are Python, Matlab, C++.
➔ R is used because it is open-sourced
➔ It is a language written by statisticians.
➔ It is an extremely powerful language that can be used for manipulating
and analyzing data.
➔ It has a wide community of enthusiasts and lots of open-source projects
to work on.
4. Machine Learning Algorithms for Data
Science
Machine Learning algorithms used in Data Science work on
datasets to solve problems.
They can be divided into 3 groups
➔ Unsupervised Learning
➔ Supervised Learning
➔ Reinforcement Learning
5. Unsupervised Learning
These set of algorithms works on datasets where the variables are not
labeled. These algorithms find patterns and trends within the data. For
this, they use:
➔ Clustering: grouping data points together based on similar
characteristics
➔ Association: grouping data points together based on their
relationship
➔ Dimension reduction: Reducing a high dimensional dataset into a
lower dimensional one, so that it is more manageable.
6. Examples of Unsupervised Learning
Algorithms and their uses.
➔ K means clustering
➔ Hierarchical clustering
➔ Principal Component Analysis
Some uses of unsupervised machine learning
algorithms are:
➔ Businesses to do customer personality analysis to find out their
ideal customers.
➔ Market Research
➔ Genetic research, where genes are grouped according to their
characteristics and similarities.
7. Supervised Learning Algorithms and their
uses
➔ Classification: Used to classify data
Ex. Random Forest, K-nearest neighbor, SVM (Support Vector
Machines)
➔ Regression: Used to predict outcomes accurately
Ex. Linear, Logistic or polynomial regression
These algorithms are used by businesses to make predictions for
their sales or revenues.
8. Reinforcement learning and its uses
Where the agent learns by getting feedback from the
environment so that it can adjust its strategy to solve the
problem. It can also be called a feedback-based machine
learning technique.
This type of machine learning algorithm is not widely used in R,
but has been included here for completeness. It is used in
game-playing and robotics. Reinforcement learning algorithms
are frequently used in C++.
9. Example of a dataset: Iris
It is a built-in dataset in R and enables R programmers to work on it without
explicitly loading it in R.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa