Introduction to Machine Learning: an overview of the basics, covering linear regression, logistic regression, cost functions, gradient descent, sensitivity and specificity, and model selection.
3. Example problems
• Group similar news articles
• Group similar patients
• Predict stock price
• Predict life expectancy
• Recommendation systems
• Face recognition
• Spam email
• Predict defaulters
• Lie detection
• Diagnose cancer
4. Data → Decisions
5. Three Questions
• How many of you have heard of machine learning?
• How many of you know machine learning?
• How many of you practice machine learning?
7. What is Machine Learning?
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.
9. Supervised vs Unsupervised
• Supervised learning:
• Data and labels are both provided.
• The machine learns to predict the label from the data.
• Unsupervised learning:
• Only data is provided.
• The machine learns to group similar data points.
(Both settings are sketched in R after this list.)
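As a quick illustration (not from the slides), here is a minimal R sketch of both settings using the built-in iris data; the names sup and unsup are invented for this example:

# Supervised: features and a label are both given; the machine learns the mapping.
data(iris)
sup <- glm(I(Species == "versicolor") ~ Sepal.Length + Sepal.Width,
           data = iris, family = binomial)   # logistic regression on a known label
# Unsupervised: only features are given; k-means groups similar rows.
unsup <- kmeans(iris[, 1:4], centers = 3)
table(unsup$cluster, iris$Species)           # compare discovered groups to the species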
11. Example problems
• Group similar news articles
• Group similar patients
• Predict stock price
• Predict life expectancy
• Recommendation systems
• Face recognition
• Spam email
• Predict defaulters
• Lie detection
• Diagnose cancer
Each of these falls into one of three problem types: unsupervised learning, regression, or classification.
16. Let’s build a Model
library(randomForest)                           # random forest implementation
data(PimaIndiansDiabetes, package = "mlbench")  # diabetes data set from mlbench
Model <- randomForest(diabetes ~ ., data = PimaIndiansDiabetes)
Model
##
## Call:
## randomForest(formula = diabetes ~ ., data = PimaIndiansDiabetes)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 24.09%
## Confusion matrix:
## neg pos class.error
## neg 423 77 0.1540000
## pos 108 160 0.4029851
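As a follow-up (not on the slide), the fitted forest can now score observations. One detail worth knowing: calling predict() on a randomForest object without newdata returns the out-of-bag predictions, so the table below should agree with the OOB confusion matrix above:

pred <- predict(Model)                       # no newdata: out-of-bag class predictions
table(pred, PimaIndiansDiabetes$diabetes)    # compare with the OOB confusion matrix above
predict(Model, newdata = head(PimaIndiansDiabetes), type = "prob")  # class probabilities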
19. Cost Function
• A function that measures the model’s error on the training data.
• Denoted by J(𝜃).
• Example: for linear regression with hypothesis h𝜃(x), the squared-error cost is
J(𝜃) = (1 / 2m) Σᵢ (h𝜃(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
• Goal: minimize the cost function (sketched in R below).
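As an illustration (not from the slides), here is that squared-error cost in R; cost is a hypothetical helper name, and X is assumed to carry a leading column of 1s for the intercept:

cost <- function(theta, X, y) {
  m <- length(y)
  r <- X %*% theta - y   # residuals: predictions minus targets
  sum(r^2) / (2 * m)     # J(theta), the halved mean squared error
}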
20. Gradient Descent: The general idea
• We have k parameters 𝜃1, 𝜃2, …, 𝜃k we’d like to train for a model, with respect to some error/loss function J(𝜃1, …, 𝜃k) to be minimized.
• Gradient descent is one way to iteratively determine the optimal set of parameter values:
1. Initialize the parameters.
2. Keep changing the values to reduce J(𝜃1, …, 𝜃k).
• ∇J tells us which direction increases J the most.
• We therefore step in the opposite direction of ∇J.
(A sketch of this loop in R follows.)
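A minimal sketch of batch gradient descent for the squared-error cost above; gradient_descent and the toy data are invented for this example:

gradient_descent <- function(X, y, alpha = 0.5, iters = 5000) {
  m <- length(y)
  theta <- rep(0, ncol(X))                    # 1. initialize parameters
  for (i in seq_len(iters)) {                 # 2. keep reducing J
    grad <- t(X) %*% (X %*% theta - y) / m    # gradient of J: points uphill
    theta <- theta - alpha * grad             # step opposite the gradient
  }
  theta
}

# Toy usage: recover intercept 2 and slope 3 from noisy data.
set.seed(1)
x <- runif(100); X <- cbind(1, x)             # leading column of 1s for the intercept
y <- 2 + 3 * x + rnorm(100, sd = 0.1)
gradient_descent(X, y)                        # approximately (2, 3)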
31. Issues
• A convex objective function guarantees convergence to the global minimum.
• Non-convexity brings the possibility of getting stuck in a local minimum.
• Different, randomized starting values can fight this (see the sketch after this list).
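As an illustration (not from the slides), random restarts on a simple non-convex function; f and the restart count are invented for this example:

f <- function(x) sin(3 * x) + 0.1 * x^2       # non-convex: several local minima
set.seed(2)
starts <- runif(20, -5, 5)                    # 20 randomized starting values
fits <- sapply(starts, function(s) optim(s, f, method = "BFGS")$value)
min(fits)                                     # best local minimum found across restarts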
32–33. Initial Values and Convergence
[Figures not reproduced. Picture credit: Andrew Ng, Stanford University, Coursera Machine Learning, Lecture 2 Slides]
34. Issues cont.
• Convergence can be slow.
• A larger learning rate α can speed things up, but with too large an α, optima can be ‘jumped’ or skipped over, requiring more iterations (illustrated in the sketch after this list).
• Too small a step size keeps convergence slow.
• Gradient descent can be combined with a line search to find the optimal α on every iteration.
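Reusing gradient_descent() from the earlier sketch on the same toy data, the learning rate’s effect can be seen directly; the specific α values here are illustrative assumptions:

set.seed(1)
x <- runif(100); X <- cbind(1, x)
y <- 2 + 3 * x + rnorm(100, sd = 0.1)
gradient_descent(X, y, alpha = 0.05, iters = 100)  # too small: still far from (2, 3)
gradient_descent(X, y, alpha = 0.9,  iters = 100)  # reasonable: close to (2, 3)
gradient_descent(X, y, alpha = 2.5,  iters = 100)  # too large: the iterates diverge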
42. Variance - Bias
• Bias is error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
• Variance is error from sensitivity to small fluctuations in the training set. High variance can cause overfitting: modeling the random noise in the training data rather than the intended outputs. (A concrete sketch follows.)
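To make the trade-off concrete (an added example, not from the slides), fitting polynomials of different degrees to noisy data shows both failure modes; the data and degrees are invented for illustration:

set.seed(3)
x <- seq(0, 1, length.out = 30)
y <- sin(2 * pi * x) + rnorm(30, sd = 0.3)     # noisy samples of a smooth curve
underfit <- lm(y ~ x)                          # high bias: a line misses the curvature
overfit  <- lm(y ~ poly(x, 15))                # high variance: chases the noise
balanced <- lm(y ~ poly(x, 3))                 # a reasonable bias-variance balance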