Introduction to Machine Learning: an overview of the basics, covering linear regression, logistic regression, cost functions, gradient descent, sensitivity and specificity, and model selection.
3. Example problems
• Group similar news articles
• Group similar patients
• Predict stock price
• Predict life expectancy
• Recommendation systems
• Face recognition
• Spam email
• Predict defaulters
• Lie detection
• Diagnose cancer
4. Data → Decisions
5. Three Questions
• How many of you have heard of machine learning?
• How many of you know machine learning?
• How many of you practice machine learning?
7. What is Machine Learning?
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.
9. Supervised vs Unsupervised
• Supervised learning:
• Data and labels are both provided.
• The machine learns to predict the label from the data.
• Unsupervised learning:
• Only data is provided.
• The machine learns to group similar data points.
(Both settings are sketched in R after this list.)
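As a quick illustration (not from the slides), here is a minimal R sketch of both settings using the built-in iris data; the names sup and unsup are invented for this example:

# Supervised: features and a label are both given; the machine learns the mapping.
data(iris)
sup <- glm(I(Species == "versicolor") ~ Sepal.Length + Sepal.Width,
           data = iris, family = binomial)   # logistic regression on a known label
# Unsupervised: only features are given; k-means groups similar rows.
unsup <- kmeans(iris[, 1:4], centers = 3)
table(unsup$cluster, iris$Species)           # compare discovered groups to the species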
11. Example problems
• Group similar news articles
• Group similar patients
• Predict stock price
• Predict life expectancy
• Recommendation systems
• Face recognition
• Spam email
• Predict defaulters
• Lie detection
• Diagnose cancer
Each of these falls into one of three problem types: unsupervised learning, regression, or classification.
16. Let’s build a Model
library(randomForest)                           # random forest implementation
data(PimaIndiansDiabetes, package = "mlbench")  # diabetes data set from mlbench
Model <- randomForest(diabetes ~ ., data = PimaIndiansDiabetes)
Model
##
## Call:
## randomForest(formula = diabetes ~ ., data = PimaIndiansDiabetes)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 24.09%
## Confusion matrix:
## neg pos class.error
## neg 423 77 0.1540000
## pos 108 160 0.4029851
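As a follow-up (not on the slide), the fitted forest can now score observations. One detail worth knowing: calling predict() on a randomForest object without newdata returns the out-of-bag predictions, so the table below should agree with the OOB confusion matrix above:

pred <- predict(Model)                       # no newdata: out-of-bag class predictions
table(pred, PimaIndiansDiabetes$diabetes)    # compare with the OOB confusion matrix above
predict(Model, newdata = head(PimaIndiansDiabetes), type = "prob")  # class probabilities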
19. Cost Function
• A function that measures the model’s error on the training data.
• Denoted by J(𝜃).
• Example: for linear regression with hypothesis h𝜃(x), the squared-error cost is
J(𝜃) = (1 / 2m) Σᵢ (h𝜃(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
• Goal: minimize the cost function (sketched in R below).
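As an illustration (not from the slides), here is that squared-error cost in R; cost is a hypothetical helper name, and X is assumed to carry a leading column of 1s for the intercept:

cost <- function(theta, X, y) {
  m <- length(y)
  r <- X %*% theta - y   # residuals: predictions minus targets
  sum(r^2) / (2 * m)     # J(theta), the halved mean squared error
}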
20. Gradient Descent: The general idea
• We have k parameters 𝜃1, 𝜃2, …, 𝜃k we’d like to train for a model, with respect to some error/loss function J(𝜃1, …, 𝜃k) to be minimized.
• Gradient descent is one way to iteratively determine the optimal set of parameter values:
1. Initialize the parameters.
2. Keep changing the values to reduce J(𝜃1, …, 𝜃k).
• ∇J tells us which direction increases J the most.
• We therefore step in the opposite direction of ∇J.
(A sketch of this loop in R follows.)
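A minimal sketch of batch gradient descent for the squared-error cost above; gradient_descent and the toy data are invented for this example:

gradient_descent <- function(X, y, alpha = 0.5, iters = 5000) {
  m <- length(y)
  theta <- rep(0, ncol(X))                    # 1. initialize parameters
  for (i in seq_len(iters)) {                 # 2. keep reducing J
    grad <- t(X) %*% (X %*% theta - y) / m    # gradient of J: points uphill
    theta <- theta - alpha * grad             # step opposite the gradient
  }
  theta
}

# Toy usage: recover intercept 2 and slope 3 from noisy data.
set.seed(1)
x <- runif(100); X <- cbind(1, x)             # leading column of 1s for the intercept
y <- 2 + 3 * x + rnorm(100, sd = 0.1)
gradient_descent(X, y)                        # approximately (2, 3)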
31. Issues
• A convex objective function guarantees convergence to the global minimum.
• Non-convexity brings the possibility of getting stuck in a local minimum.
• Different, randomized starting values can fight this (see the sketch after this list).
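As an illustration (not from the slides), random restarts on a simple non-convex function; f and the restart count are invented for this example:

f <- function(x) sin(3 * x) + 0.1 * x^2       # non-convex: several local minima
set.seed(2)
starts <- runif(20, -5, 5)                    # 20 randomized starting values
fits <- sapply(starts, function(s) optim(s, f, method = "BFGS")$value)
min(fits)                                     # best local minimum found across restarts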
32–33. Initial Values and Convergence
[Figures not reproduced. Picture credit: Andrew Ng, Stanford University, Coursera Machine Learning, Lecture 2 Slides]
34. Issues cont.
• Convergence can be slow.
• A larger learning rate α can speed things up, but with too large an α, optima can be ‘jumped’ or skipped over, requiring more iterations (illustrated in the sketch after this list).
• Too small a step size keeps convergence slow.
• Gradient descent can be combined with a line search to find the optimal α on every iteration.
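Reusing gradient_descent() from the earlier sketch on the same toy data, the learning rate’s effect can be seen directly; the specific α values here are illustrative assumptions:

set.seed(1)
x <- runif(100); X <- cbind(1, x)
y <- 2 + 3 * x + rnorm(100, sd = 0.1)
gradient_descent(X, y, alpha = 0.05, iters = 100)  # too small: still far from (2, 3)
gradient_descent(X, y, alpha = 0.9,  iters = 100)  # reasonable: close to (2, 3)
gradient_descent(X, y, alpha = 2.5,  iters = 100)  # too large: the iterates diverge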
42. Variance - Bias
• Bias is error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
• Variance is error from sensitivity to small fluctuations in the training set. High variance can cause overfitting: modeling the random noise in the training data rather than the intended outputs. (A concrete sketch follows.)
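To make the trade-off concrete (an added example, not from the slides), fitting polynomials of different degrees to noisy data shows both failure modes; the data and degrees are invented for illustration:

set.seed(3)
x <- seq(0, 1, length.out = 30)
y <- sin(2 * pi * x) + rnorm(30, sd = 0.3)     # noisy samples of a smooth curve
underfit <- lm(y ~ x)                          # high bias: a line misses the curvature
overfit  <- lm(y ~ poly(x, 15))                # high variance: chases the noise
balanced <- lm(y ~ poly(x, 3))                 # a reasonable bias-variance balance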