SlideShare une entreprise Scribd logo
1  sur  24
History of Regression
• T H E T E R M R E G R E S S I O N WA S I N T R O D U C E D B Y F R A N C I S G A LTO N I N 1 8 8 6 I N H I S R E S E A R C H PA P E R “ FA M I LY
L I K E N E S S I N S TAT U R E ”
• H E O B S E R V E D T H AT TA L L PA R E N T S H AV E TA L L S O N S A N D S H O R T PA R E N T S H AV E S H O R T S O N S
• H E C A M E U P W I T H T H E V I E W T H AT T H E H E I G H T O F T H E S O N S R E G R E S S “ R E V E R T B A C K ” TO WA R D S T H E AV E R A G E
H E I G H T O F T H E P O P U L AT I O N
Dr. Deepak Mehta
Associate Professor
AIT CSE
Regression Introduction
• Lets take a simple example : Suppose your manager asked you to predict annual sales.
• There can be a hundred of factors (drivers) that affects sales
• In this case, sales is your dependent variable. Factors affecting sales are independent
variables.
• Regression analysis would help you to solve this problem.
• In simple words, regression analysis is used to model the relationship between a
dependent variable and one or more independent variables.
It helps us to answer the following questions –
• Which of the drivers have a significant impact on sales.
• Which is the most important driver of sales
• How do the drivers interact with each other
• What would be annual sale next year
Regressing Terminologies
Outliers
Suppose there is an observation in the dataset which is having a very high or very low
value as compared to the other observations in the data, i.e. it does not belong to the
population, such an observation is called an outlier. In simple words, it is extreme value.
An outlier is a problem because many times it hampers the results we get.
Multicollinearity
When the independent variables are highly correlated to each other then the variables
are said to be multicollinear. Many types of regression techniques assumes
multicollinearity should not be present in the dataset. It is because it causes problems in
ranking variables based on its importance. Or it makes job difficult in selecting the most
important independent variable (factor).
Terminologies
Heteroscedasticity
When dependent variable’s variability is not equal across values of an independent
variable, it is called heteroscedasticity. Example – As one’s income increases, the
variability of food consumption will increase. A poorer person will spend a rather constant
amount by always eating inexpensive food; a wealthier person may occasionally buy
inexpensive food and at other times eat expensive meals. Those with higher incomes
display a greater variability of food consumption.
Underfitting and Overfitting
When we use unnecessary explanatory variables it might lead to overfitting. Overfitting
means that our algorithm works well on the training set but is unable to perform better on
the test sets. It is also known as problem of high variance.
History of Regression
Galton’s law of universal regression was confirmed by his friend Karl Pearson, who collected data
from more than 1000 families.
He found that the average height of the sons of tall father is less than the their father’s height
and average height of the sons of short father is more than their fathers’ height.
Research paper of Karl Pearson
“On law of inheritance, biometrika 1903
Modern Regression
It investigate the dependence of one variable, conventionally called dependent variable, on one
or more variables called independent variable and provide an equation to be used for estimating
or predicting the average of the dependent variable from the known values of independent
variable.
Modern Regression
A variable whose variation we try to explain is dependent variable while the independent
variable is a variable that is used to explain the variation in the dependent variable.
When dependency of variable is studied upon one variable it is called simple linear regression or
two variable regression
While dependency upon two or more variable is studied in multiple regression
Deterministic vs Probabilistic
If there is a exact relationship b.w. variables s.t knowing value of one variable we can have
unique value of other, such relation is known as Deterministic relationship or mathematical
relation.
e.g. consider y=a + bx. Substituting value of x we can have unique value of y
Example of such relationship is F=32+(9/5)C, relationship b.w Fahrenheit and Celsius.
Area of circle pi*r*r
Deterministic vs Probabilistic
If relationship b.w. variable is not exct, we cannot have precisely value of one variable by putting
the value of the other, such relationship is known as probabilistic or stochastic or statistical
relationship
e.g. consider y=a + bX+e. substituting value of x we cannot have unique value of y, until we know
e. where e is unknown random error.
For ex. Consider relationship b.w. weight and height of person . All person with same height will
not have same weight.
In deterministic model all the point of order pairing of two variable (x,y) fall on the line of
relationship.
Dependent Variable
Whereas in probabilistic model all the points of order pairing of two variable (x,y) donot fall on
the line of relationship, there must be scattering around the line of relationship
Dependent variable is random variable also called
◦ Explained variable
◦ Predictand
◦ Regressand
◦ Response
◦ Endogenous
◦ Outcome
◦ Controlled variable
Independent variable
The independent variable is fixed variable also called
◦ Explanatory variable
◦ Predictor
◦ Regressor
◦ Stimulus
◦ Exogenous
◦ Covariate
◦ Control variable
Assumption of Linear Regression
Linear Relationship
Very low /no multicollinearity
No heteroscedastic
No autocorrelation b.w the errors
Normal distribution of errors
All observation are independent
Summary
LR is a ML algorithm based on Supervised Learning
Regression model is a target prediction model based on independent variables
Regression technique find out the relationship bw X(input) and Y(output) e.g. workexpSalary
Hypothesis function of LR: y=W0+W1X
When training this model : it fits the best line to predict the values of y for a given value of X
..
•It shows the significant relationship b.w. label( dependent variable) and the features
(independent variables)
•It indicate the strength of impact of multiple independent variables on a dependent variable
•Regression model is used to predict the continuous value
•LR establishes a relationship b.w. dependent variable (y) and one or more independent
variables(X) using best fit straight line (also known as regression line)
There must be a linear relation between independent and dependent variables
There should not be any outliers present
No heteroscedasticity
Sample observations should be independent
Error terms should be normally distributed with mean 0 and constant variance
Absence of multicollinearity and auto-correlation
Interpretation of regression
coefficients
Let us consider an example where the dependent variable is marks obtained by a
student and explanatory variables are number of hours studied and no. of classes
attended. Suppose on fitting linear regression we got the linear regression as:
Marks obtained = 5 + 2 (no. of hours studied) + 0.5(no. of classes attended)
Thus we can have the regression coefficients 2 and 0.5 which can interpreted as:
- If no. of hours studied and no. of classes are 0 then the student will obtain 5 marks.
- Keeping no. of classes attended constant, if student studies for one hour more then he
will score 2 more marks in the examination.
- Similarly keeping no. of hours studied constant, if student attends one more class then
he will attain 0.5 marks more.
Least Squares
As we know y=w0+w1X+error h(xi)+ei
Error=yi-h(xi)
Error is residual error in the observation
Main aim is to minimize the residual error
Squared error or cost function:
Linear Regression Assignment Advertisement Dataset
X = advertising['TV']
y = advertising['Sales’]
-------------------------------------------------------
from sklearn.model_selection import train_test_split
X_train, X_test, y_train,y_test=train_test_split(X, y, train_size = 0.7, test_size = 0.3, random_state = 100)
X_train.head()
y_train.head()

Contenu connexe

Similaire à ML4 Regression.pptx

Correlation and Regression.pdf
Correlation and Regression.pdfCorrelation and Regression.pdf
Correlation and Regression.pdf
AadarshSah1
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
curwenmichaela
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
jasoninnes20
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
richardnorman90310
 
Recep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managersRecep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managers
recepmaz
 
Recep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managersRecep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managers
recepmaz
 

Similaire à ML4 Regression.pptx (20)

Multicollinearity PPT
Multicollinearity PPTMulticollinearity PPT
Multicollinearity PPT
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Artificial Intelligence (Unit - 8).pdf
Artificial Intelligence   (Unit  -  8).pdfArtificial Intelligence   (Unit  -  8).pdf
Artificial Intelligence (Unit - 8).pdf
 
Data-Analysis.pptx
Data-Analysis.pptxData-Analysis.pptx
Data-Analysis.pptx
 
TYPESOFDATAANALYSIS research methodology .pdf
TYPESOFDATAANALYSIS research methodology .pdfTYPESOFDATAANALYSIS research methodology .pdf
TYPESOFDATAANALYSIS research methodology .pdf
 
Correlations and t scores (2)
Correlations and t scores (2)Correlations and t scores (2)
Correlations and t scores (2)
 
Correlation and Regression.pdf
Correlation and Regression.pdfCorrelation and Regression.pdf
Correlation and Regression.pdf
 
Quantitative analysis
Quantitative analysisQuantitative analysis
Quantitative analysis
 
Real Estate Data Set
Real Estate Data SetReal Estate Data Set
Real Estate Data Set
 
Research hypothesis....ppt
Research hypothesis....pptResearch hypothesis....ppt
Research hypothesis....ppt
 
Mr 4. quantitative research design and methods
Mr 4. quantitative research design and methodsMr 4. quantitative research design and methods
Mr 4. quantitative research design and methods
 
Data analysis test for association BY Prof Sachin Udepurkar
Data analysis   test for association BY Prof Sachin UdepurkarData analysis   test for association BY Prof Sachin Udepurkar
Data analysis test for association BY Prof Sachin Udepurkar
 
HYPOTHESES.pptx
HYPOTHESES.pptxHYPOTHESES.pptx
HYPOTHESES.pptx
 
Glm
GlmGlm
Glm
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
 
Recep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managersRecep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managers
 
Recep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managersRecep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managers
 
REG.pptx
REG.pptxREG.pptx
REG.pptx
 

Dernier

Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
dharasingh5698
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Dernier (20)

data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 

ML4 Regression.pptx

  • 1. History of Regression • T H E T E R M R E G R E S S I O N WA S I N T R O D U C E D B Y F R A N C I S G A LTO N I N 1 8 8 6 I N H I S R E S E A R C H PA P E R “ FA M I LY L I K E N E S S I N S TAT U R E ” • H E O B S E R V E D T H AT TA L L PA R E N T S H AV E TA L L S O N S A N D S H O R T PA R E N T S H AV E S H O R T S O N S • H E C A M E U P W I T H T H E V I E W T H AT T H E H E I G H T O F T H E S O N S R E G R E S S “ R E V E R T B A C K ” TO WA R D S T H E AV E R A G E H E I G H T O F T H E P O P U L AT I O N Dr. Deepak Mehta Associate Professor AIT CSE
  • 2. Regression Introduction • Lets take a simple example : Suppose your manager asked you to predict annual sales. • There can be a hundred of factors (drivers) that affects sales • In this case, sales is your dependent variable. Factors affecting sales are independent variables. • Regression analysis would help you to solve this problem. • In simple words, regression analysis is used to model the relationship between a dependent variable and one or more independent variables. It helps us to answer the following questions – • Which of the drivers have a significant impact on sales. • Which is the most important driver of sales • How do the drivers interact with each other • What would be annual sale next year
  • 3. Regressing Terminologies Outliers Suppose there is an observation in the dataset which is having a very high or very low value as compared to the other observations in the data, i.e. it does not belong to the population, such an observation is called an outlier. In simple words, it is extreme value. An outlier is a problem because many times it hampers the results we get. Multicollinearity When the independent variables are highly correlated to each other then the variables are said to be multicollinear. Many types of regression techniques assumes multicollinearity should not be present in the dataset. It is because it causes problems in ranking variables based on its importance. Or it makes job difficult in selecting the most important independent variable (factor).
  • 4. Terminologies Heteroscedasticity When dependent variable’s variability is not equal across values of an independent variable, it is called heteroscedasticity. Example – As one’s income increases, the variability of food consumption will increase. A poorer person will spend a rather constant amount by always eating inexpensive food; a wealthier person may occasionally buy inexpensive food and at other times eat expensive meals. Those with higher incomes display a greater variability of food consumption. Underfitting and Overfitting When we use unnecessary explanatory variables it might lead to overfitting. Overfitting means that our algorithm works well on the training set but is unable to perform better on the test sets. It is also known as problem of high variance.
  • 5. History of Regression Galton’s law of universal regression was confirmed by his friend Karl Pearson, who collected data from more than 1000 families. He found that the average height of the sons of tall father is less than the their father’s height and average height of the sons of short father is more than their fathers’ height. Research paper of Karl Pearson “On law of inheritance, biometrika 1903
  • 6. Modern Regression It investigate the dependence of one variable, conventionally called dependent variable, on one or more variables called independent variable and provide an equation to be used for estimating or predicting the average of the dependent variable from the known values of independent variable.
  • 7. Modern Regression A variable whose variation we try to explain is dependent variable while the independent variable is a variable that is used to explain the variation in the dependent variable. When dependency of variable is studied upon one variable it is called simple linear regression or two variable regression While dependency upon two or more variable is studied in multiple regression
  • 8. Deterministic vs Probabilistic If there is a exact relationship b.w. variables s.t knowing value of one variable we can have unique value of other, such relation is known as Deterministic relationship or mathematical relation. e.g. consider y=a + bx. Substituting value of x we can have unique value of y Example of such relationship is F=32+(9/5)C, relationship b.w Fahrenheit and Celsius. Area of circle pi*r*r
  • 9. Deterministic vs Probabilistic If relationship b.w. variable is not exct, we cannot have precisely value of one variable by putting the value of the other, such relationship is known as probabilistic or stochastic or statistical relationship e.g. consider y=a + bX+e. substituting value of x we cannot have unique value of y, until we know e. where e is unknown random error. For ex. Consider relationship b.w. weight and height of person . All person with same height will not have same weight. In deterministic model all the point of order pairing of two variable (x,y) fall on the line of relationship.
  • 10. Dependent Variable Whereas in probabilistic model all the points of order pairing of two variable (x,y) donot fall on the line of relationship, there must be scattering around the line of relationship Dependent variable is random variable also called ◦ Explained variable ◦ Predictand ◦ Regressand ◦ Response ◦ Endogenous ◦ Outcome ◦ Controlled variable
  • 11. Independent variable The independent variable is fixed variable also called ◦ Explanatory variable ◦ Predictor ◦ Regressor ◦ Stimulus ◦ Exogenous ◦ Covariate ◦ Control variable
  • 12.
  • 13. Assumption of Linear Regression Linear Relationship Very low /no multicollinearity No heteroscedastic No autocorrelation b.w the errors Normal distribution of errors All observation are independent
  • 14. Summary LR is a ML algorithm based on Supervised Learning Regression model is a target prediction model based on independent variables Regression technique find out the relationship bw X(input) and Y(output) e.g. workexpSalary Hypothesis function of LR: y=W0+W1X When training this model : it fits the best line to predict the values of y for a given value of X
  • 15. .. •It shows the significant relationship b.w. label( dependent variable) and the features (independent variables) •It indicate the strength of impact of multiple independent variables on a dependent variable •Regression model is used to predict the continuous value •LR establishes a relationship b.w. dependent variable (y) and one or more independent variables(X) using best fit straight line (also known as regression line)
  • 16. There must be a linear relation between independent and dependent variables There should not be any outliers present No heteroscedasticity Sample observations should be independent Error terms should be normally distributed with mean 0 and constant variance Absence of multicollinearity and auto-correlation
  • 17. Interpretation of regression coefficients Let us consider an example where the dependent variable is marks obtained by a student and explanatory variables are number of hours studied and no. of classes attended. Suppose on fitting linear regression we got the linear regression as: Marks obtained = 5 + 2 (no. of hours studied) + 0.5(no. of classes attended) Thus we can have the regression coefficients 2 and 0.5 which can interpreted as: - If no. of hours studied and no. of classes are 0 then the student will obtain 5 marks. - Keeping no. of classes attended constant, if student studies for one hour more then he will score 2 more marks in the examination. - Similarly keeping no. of hours studied constant, if student attends one more class then he will attain 0.5 marks more.
  • 18. Least Squares As we know y=w0+w1X+error h(xi)+ei Error=yi-h(xi) Error is residual error in the observation Main aim is to minimize the residual error Squared error or cost function:
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Linear Regression Assignment Advertisement Dataset X = advertising['TV'] y = advertising['Sales’] ------------------------------------------------------- from sklearn.model_selection import train_test_split X_train, X_test, y_train,y_test=train_test_split(X, y, train_size = 0.7, test_size = 0.3, random_state = 100) X_train.head() y_train.head()