Welcome to
Explore ML!
Day 2
Linear Regression
Fitting Linear Models
Premise
What are we trying to achieve?
We are trying to solve or
predict something, based on
what we already know.
This is a regression problem,
that is, we want to predict a
real-valued output.
What exactly is “linear
regression”?
Given the existing training
data, we try to find a “best
fit” line.
For now, “best fit” means
a line that seems to
match the data.
Best fit line
This is an example of
___________
This is an example of
Supervised Learning
Recall your high school
math classes.
y = mx + c
Model parameters : m (the slope) and c (the intercept)
Model Representation
Tweaking the value of the parameters
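The model on the previous slide can be sketched in a few lines of Python; the values of m and c below are illustrative, not fitted parameters:

```python
# A minimal sketch of the linear model y = m*x + c.

def predict(x, m, c):
    """Return the model's prediction for input x."""
    return m * x + c

# Tweaking m and c changes the line, and hence every prediction.
print(predict(2.0, m=1.5, c=0.5))  # 3.5
```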
Loss function
Formalizing the notion of best fit line
How exactly do you say one line fits better than
another?
Let’s look at what exactly loss and the loss function are.
Loss function
H(xᵢ) - yᵢ
Loss function
Oops, looks like the errors became bigger
Calculating the loss function
Add all the differences between the predicted values and our data points
Calculating the loss function
But some of these differences are positive and some are negative
Calculating the loss function
The square of each difference is always positive, though :)
The math
Calculating the loss function
In fact, this idea applies to all machine learning models.
The aim is to find the parameters for which the loss is minimum.
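A minimal Python sketch of this squared-error loss, using a tiny made-up dataset (the xs and ys below are illustrative values, not from the slides):

```python
# Mean squared error for a line H(x) = m*x + c.

def mse_loss(m, c, xs, ys):
    """Mean of the squared differences between predictions and targets."""
    n = len(xs)
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# The true line here is y = 2x, so m=2, c=0 gives zero loss.
print(mse_loss(2.0, 0.0, xs, ys))  # 0.0
print(mse_loss(1.0, 0.0, xs, ys))  # worse fit, larger loss
```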
This function is the reason why models can learn things: by
descending the gradient of the errors, the model step by step
minimizes the difference between the actual values and the
predicted values.
Optimization Algorithm
Gradient Descent
Gradient Descent : Intuition
Gradient Descent : Algorithm
Gradient Descent : Algorithm
Gradient Descent : Math
Gradient Descent : Learning Rate
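Gradient descent for the line y = m*x + c can be sketched as below; the dataset, learning rate, and iteration count are illustrative choices, not values from the slides:

```python
# One gradient-descent step: move the parameters against the
# gradient of the mean squared error, scaled by the learning rate.

def gradient_step(m, c, xs, ys, lr):
    n = len(xs)
    # Partial derivatives of the MSE with respect to m and c.
    dm = sum(2 * (m * x + c - y) * x for x, y in zip(xs, ys)) / n
    dc = sum(2 * (m * x + c - y) for x, y in zip(xs, ys)) / n
    return m - lr * dm, c - lr * dc

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # true line: y = 2x + 1

m, c = 0.0, 0.0
for _ in range(5000):
    m, c = gradient_step(m, c, xs, ys, lr=0.01)

print(round(m, 2), round(c, 2))  # approaches 2.0 and 1.0
```

Too large a learning rate makes the steps overshoot and diverge; too small a rate makes convergence painfully slow.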
Feature Scaling
Most of the real-life datasets that you will be dealing with will have
many features, each spanning a wide range of values.
If you were asked to predict the price of a house, you would be provided
with a dataset with multiple features like the number of bedrooms, the
square-foot area of the house, etc.
There’s a problem though.
The range of data in each feature will vary wildly.
For example, the number of bedrooms can vary from, say, 1 to 5 and
square feet area can range from 500 to 3000.
How is this a problem?
How do you solve this?
Feature Scaling
Feature Scaling is a data preprocessing step used to normalize the
features in the dataset to make sure that all the features lie in a similar
range.
It is one of the most critical steps during the pre-processing of data
before creating a machine learning model.
If a feature’s variance is orders of magnitude more than the variance of
other features, that particular feature might dominate other features in
the dataset, which is not something we want happening in our model.
Why?
Two important scaling techniques:
1. Normalisation
2. Standardisation
Normalisation
Normalization scales the range of values in a feature to
between 0 and 1.
This is referred to as Min-Max Scaling.
Min-Max Scaling
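Min-max scaling maps each value via (x - min) / (max - min). A small sketch, with illustrative square-footage values:

```python
# Scale a list of feature values into the range [0, 1].

def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

sqft = [500, 1200, 3000]
print(min_max_scale(sqft))  # [0.0, 0.28, 1.0]
```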
Standardisation
Standardisation is a scaling technique where the values are centered
around the mean with a unit standard deviation.
Standardisation is required when the features of the input dataset have
large differences between their ranges, or simply when they are
measured in different units, e.g. kWh, meters, miles and more.
Z-score is one of the most popular methods to standardise data, and
can be done by subtracting the mean and dividing by the standard
deviation for each value of each feature.
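A sketch of z-score standardisation as just described (the bedroom counts below are illustrative):

```python
# Subtract the mean and divide by the standard deviation so the
# feature is centred on 0 with unit standard deviation.

def standardise(values):
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the feature.
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

bedrooms = [1, 2, 3, 4, 5]
z = standardise(bedrooms)
print([round(v, 2) for v in z])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```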
Standardization assumes that your observations fit a Gaussian
distribution (bell curve) with a well-behaved mean and standard
deviation.
In conclusion,
Min-max normalization: guarantees all features will have the
exact same scale, but does not handle outliers well.
Z-score standardization: handles outliers better, but does not
produce data with the exact same scale.
Time to apply what you’ve learnt!
___________
Before we get started
Go to kaggle.com and register for a
new account.
Before we get started
Now go to
bit.ly/gdsc-linear-reg-kaggle and
click on ‘Copy and Edit’ button
(top-right corner of the page).
Time to code!
Time to eat!
Logistic Regression
Learning to say “Yes” or “No”
Need for Logistic Regression
Why can’t we use Linear Regression and fit a line?
Inaccurate Predictions
Out of Range Problem
For classification, y = 0 or y = 1
In Linear Regression, h(x) can be > 1 or < 0
But for Logistic Regression, 0 ≤ h(x) ≤ 1 must hold true
Hypothesis Representation
hθ(x) = θᵀx for linear regression.
But here we want 0 ≤ hθ(x) ≤ 1
Sigmoid Function
hθ(x) = g(θᵀx), where g is the sigmoid function:
g(z) = 1 / (1 + e^(-z))
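The sigmoid squashes any real number into the range (0, 1), which is exactly the constraint we wanted. A sketch:

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), mapping the reals into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5
print(sigmoid(6))    # close to 1
print(sigmoid(-6))   # close to 0
```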
Interpretation of hypothesis
hθ(x) = the probability that y = 1, given input x
For example, in a cancer detection problem:
y = 1 signifies that a person has tested positive for cancer
y = 0 signifies that a person has tested negative for cancer
What does hθ(x) = 0.7 mean for an example input x?
Decision Boundary
Predict y = 1 if hθ(x) ≥ 0.5 and y = 0 if hθ(x) < 0.5
Hence for y = 1:
⇒ hθ(x) ≥ 0.5
⇒ θᵀx ≥ 0
How does the model know when to
predict y =1 or y=0 ?
Say we find that θ₁ = -3, θ₂ = 1, θ₃ = 1
Hence, on substitution:
Predict y = 1 if -3 + x₁ + x₂ > 0, else predict y = 0
θᵀx = 0 is called the decision boundary,
i.e. -3 + x₁ + x₂ = 0 is the decision boundary
Loss Function
Recall from linear regression where we used the squared-error formula
for calculating the loss of our model.
It turns out that although this same method gives a metric for the loss
of the model, with the sigmoid it has a lot of local minima.
Loss function
Let’s consider the graph for -log(x) and -log(1-x)
Engineering a better loss function
-log(x)
-log(1 -x)
Let’s consider the case of a
data point whose y = 1.
If our model predicts
a 0, i.e. H(x) = 0 (the
wrong answer), we
get a really high loss.
But if our model predicts a 1, i.e.
H(x) = 1 (the right answer), we
get a low loss.
y = -log(x)
Now let’s consider the case of
a data point whose y = 0.
If our model predicts
a 1, i.e. H(x) = 1 (the
wrong answer), we
get a really high loss.
But if our model
predicts a 0, i.e. H(x) =
0 (the right answer),
we get a low loss.
y = -log(1-x)
Loss Function
Cool math trick!
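The trick is that the two curves combine into a single formula, -[y·log(h) + (1-y)·log(1-h)], which reduces to -log(h) when y = 1 and to -log(1-h) when y = 0. A sketch:

```python
import math

def logistic_loss(h, y):
    """Cross-entropy loss for a single prediction h in (0, 1)."""
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))

print(logistic_loss(0.99, 1))  # right answer: small loss
print(logistic_loss(0.01, 1))  # wrong answer: large loss
print(logistic_loss(0.01, 0))  # right answer: small loss
```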
Time to code again!
___________
Head over to bit.ly/gdsc-logistic-reg-kaggle and
click on ‘Copy and Edit’
Don’t forget to sign in!
K-Means Clustering
Finding Clusters in Data
K-Means Clustering : Theory
K-Means Clustering is an Unsupervised Machine
Learning algorithm: it identifies the similarities and
differences in the data and divides the data into
several groups called clusters. K is the number of
clusters, and we choose the K value according to
the dataset.
K-Means Clustering : Algorithm
Step 1 : Choose the number of clusters (the K value) according to the dataset.
K = 2 here.
K-Means Clustering : Algorithm
Step 2 : Select K points at random as the initial centroids.
Step 3 : Assign each data point to the closest centroid. That forms K clusters.
K-Means Clustering : Algorithm
Euclidean Distance : If (x₁, y₁) and (x₂, y₂) are two points, then the
distance between them is √((x₂ - x₁)² + (y₂ - y₁)²)
Step 4 : Compute and place the new centroid of each cluster
K-Means Clustering : Algorithm
Step 5 : Reassign each data point to the new closest centroid. This step
repeats until no reassignment takes place.
K-Means Clustering : Algorithm
Step 6 : The model is ready
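Steps 1-6 can be sketched in plain Python. The points and K = 2 below are illustrative, and the initial centroids are taken as the first K points (rather than chosen at random) for repeatability:

```python
def dist2(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iters=10):
    centroids = points[:k]  # Steps 1-2: pick K initial centroids
    for _ in range(iters):
        # Step 3: assign each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[i].append(p)
        # Steps 4-5: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            for c in clusters
        ]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 2), (8, 8), (9, 9), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

A fuller implementation would also handle empty clusters and stop early once assignments no longer change.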
K-Means Clustering : Choosing the correct number of clusters
K-Means Clustering : Elbow Method
Quick Recap!
Machine Learning
Roadmap!
We want to know how we did!
Please fill out the feedback form given below:
https://bit.ly/gdsc-ml-feedback
Registered participants who’ve filled the form will
be eligible for certificates.
We want to know how we did!
We request all of you to check your inbox for an
email from GDSC Event Platform. You will get it
soon.
Registered participants who’ve filled the form will
be eligible for certificates.
RESOURCES!
bit.ly/gdsc-explore-ml
Thank You!