Linear Regression
with R
Ed Goodwin
Houston R Users Group
Recap from the last meetup
• statistical learning vs. machine learning
• supervised vs. unsupervised learning
• categorical models vs. quantitative models
Linear Regression is…
• statistical learning
• supervised learning
• quantitative model
A Simple Dataset
What’s the best model for
this data?
…a straight line, aka a linear model…
What’s the best fit for the
line?
The line that minimizes the residual
error, the vertical distance from each point to the line.
It minimizes the sum of the squared residuals, which is why
we call it the least squares regression line
We find this line with a linear
regression, which estimates the y-intercept
and the slope that minimize
the sum of squared residuals
Use the lm function in R to create linear models
A regression on one variable is known as a simple
linear regression
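As a minimal sketch of the idea above, the following fits a simple linear regression with lm and checks the result against the closed-form least squares estimates. The data are simulated, and the variable names and true coefficients are assumptions chosen for illustration.

```r
# Simulated data: the true intercept (3) and slope (2) are illustrative assumptions.
set.seed(42)
x <- runif(50, min = 0, max = 10)
y <- 3 + 2 * x + rnorm(50, sd = 1)

# Fit a simple linear regression of y on x.
fit <- lm(y ~ x)

# Closed-form least squares estimates, for comparison with lm.
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

print(coef(fit))
print(c(intercept = intercept, slope = slope))
```

Both printouts should agree, since lm solves the same least squares problem.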
Assumptions of
Linear Models
• the relationship between the predictors and the
response is linear
• the variance of error terms is constant
(homoskedastic)
• minimal to no outliers in the data (unusually high
or low y for a given x)
• minimal to no leverage points in the data (unusually
high or low x relative to the rest of the data)
• no collinearity among predictor variables
• predictors contribute additively to the response
(no interaction effects)
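A few of these assumptions can be screened numerically. The sketch below uses simulated data (names and coefficients are assumptions), and the cutoffs shown are common rules of thumb rather than fixed standards.

```r
# Simulated data with two predictors; all names and values are illustrative.
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Outliers: standardized residuals far from zero (|value| > 3 as a rule of thumb).
outliers <- which(abs(rstandard(fit)) > 3)

# Leverage: hat values well above their average (2x the mean as a rule of thumb).
h <- hatvalues(fit)
high_leverage <- which(h > 2 * mean(h))

# Collinearity: correlation between the two predictors.
predictor_cor <- cor(x1, x2)

print(outliers)
print(high_leverage)
print(predictor_cor)
```

For linearity and constant variance, plot(fit) draws the standard diagnostic plots, including residuals against fitted values.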
Linear Model Fitting
Data Analysis
Bonds dataset from
“A Modern Approach to Regression with R” Sheather, Simon. 2009.
https://link.springer.com/book/10.1007%2F978-0-387-09608-7
Consider the following dataset of bond prices
bonds.dat = read.csv("http://www.stat.tamu.edu/~sheather/book/docs/datasets/bonds.txt", sep='\t')
Data Analysis
Data analysis is the art of asking
questions of the data and searching
for answers.
What questions should we ask?
• Is Bid Price a function of Coupon
Rate or vice versa?
• What type of relationship does Bid
Price appear to have with Coupon
Rate?
• Is there a formula we could use to
predict Bid Price based on Coupon
Rate?
Linear Model of Bond Prices
as a function of Coupon Rates
• Is this line a good or
bad fit with the data?
• Why or why not?
• Can we improve the
model?
• How?
Know your data and know
how your models work!
• Why would outliers skew
the linear regression
model?
• What should we do
about it?
• Why do these outliers
exist?
Outliers
What are the outliers?
A Flower bond is a U.S. Treasury bond redeemable
before maturity upon payment or fulfilling a condition, if
used to settle federal estate taxes. When flower bonds
are surrendered in payment of taxes, and accepted as
such, that constitutes payment of those taxes for statute
of limitations and statutory interest purposes.
Adjusted Bond Model
• After removing the
outliers the model looks
much better
• But how do you know
that it’s a better model?
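To illustrate how a few unusual points can pull the fitted line, here is a rough sketch on simulated data. It does not use the real bonds dataset: the column names CouponRate and BidPrice, the coefficients, and the three shifted points are all assumptions.

```r
# Simulated stand-in for the bonds data; every value here is an assumption.
set.seed(7)
bonds <- data.frame(CouponRate = runif(35, min = 3, max = 13))
bonds$BidPrice <- 75 + 3 * bonds$CouponRate + rnorm(35, sd = 1)

# Shift the three lowest-coupon bonds well above the trend to act as outliers.
odd <- order(bonds$CouponRate)[1:3]
bonds$BidPrice[odd] <- bonds$BidPrice[odd] + 12

# Fit once on all rows and once with the unusual rows removed.
fit_all <- lm(BidPrice ~ CouponRate, data = bonds)
fit_adj <- lm(BidPrice ~ CouponRate, data = bonds[-odd, ])

print(coef(fit_all))
print(coef(fit_adj))
print(c(all = summary(fit_all)$sigma, adjusted = summary(fit_adj)$sigma))
```

Removing points is only defensible when there is a reason outside the model to treat them differently, as with the flower bonds.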
Evaluating Models
Evaluating the two
bond models
Key measures are p-value,
Residual Std Error (RSE), and R²
Residual Sum of Squares (RSS)
“In statistics, the residual sum of squares (RSS), also known as the
sum of squared residuals (SSR) or the sum of squared errors of
prediction (SSE), is the sum of the squares of residuals
(deviations predicted from actual empirical values of data).”
Source: https://en.wikipedia.org/wiki/Residual_sum_of_squares
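These measures can be computed by hand and checked against what summary reports. A minimal sketch on simulated data (names and coefficients are assumptions):

```r
# Simulated data for illustration only.
set.seed(3)
x <- rnorm(40)
y <- 1 + 0.5 * x + rnorm(40)
fit <- lm(y ~ x)

rss <- sum(resid(fit)^2)              # residual sum of squares
tss <- sum((y - mean(y))^2)           # total sum of squares
rse <- sqrt(rss / df.residual(fit))   # residual standard error
r2  <- 1 - rss / tss                  # R-squared

s <- summary(fit)
slope_p <- coef(s)["x", "Pr(>|t|)"]   # p-value for the slope

print(c(RSS = rss, RSE = rse, R2 = r2, p = slope_p))
print(c(RSE = s$sigma, R2 = s$r.squared))
```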
Multiple Regression
Models
If Simple Regression Models
depict a linear relationship between
two variables, what do you think
Multiple Regression Models do?
Multiple Regression Models
describe the relationship between
a scalar variable and two or more
predictor variables
Why not just run
several simple linear
regressions?
Advertising Data Set
Sales based on multiple
types and levels of
advertising spend
(TV, Radio, Newspaper)
First, let’s look at the data
Are the model assumptions intact?
Multiple Regression uses the lm
function as well. Simply modify the
formula by adding more variables
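A sketch of what that formula might look like. The data frame below is simulated as a stand-in for the Advertising data; the column names TV, Radio, Newspaper, and Sales and all coefficients are assumptions.

```r
# Simulated stand-in for the Advertising data; values are illustrative only.
set.seed(10)
n <- 200
ads <- data.frame(
  TV = runif(n, 0, 300),
  Radio = runif(n, 0, 50),
  Newspaper = runif(n, 0, 100)
)
ads$Sales <- 3 + 0.045 * ads$TV + 0.19 * ads$Radio + rnorm(n, sd = 1.5)

# Add predictors to the right-hand side of the formula with +.
fit_multi <- lm(Sales ~ TV + Radio + Newspaper, data = ads)
print(summary(fit_multi))
```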
Analyze the Model
Why is the Newspaper coefficient so low?
What if we removed
Newspaper from the model?
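One way to try this is to refit with update and compare the two nested models. The sketch uses simulated data in which Newspaper has no real effect; the column names and coefficients are assumptions.

```r
# Simulated stand-in for the Advertising data; Newspaper is pure noise here.
set.seed(10)
n <- 200
ads <- data.frame(
  TV = runif(n, 0, 300),
  Radio = runif(n, 0, 50),
  Newspaper = runif(n, 0, 100)
)
ads$Sales <- 3 + 0.045 * ads$TV + 0.19 * ads$Radio + rnorm(n, sd = 1.5)

fit_full <- lm(Sales ~ TV + Radio + Newspaper, data = ads)

# Drop Newspaper from the formula and refit.
fit_reduced <- update(fit_full, . ~ . - Newspaper)

# F-test comparing the nested models, plus adjusted R-squared for each.
print(anova(fit_reduced, fit_full))
print(c(full = summary(fit_full)$adj.r.squared,
        reduced = summary(fit_reduced)$adj.r.squared))
```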
How do we model
Interaction Effects?
Modify the regression formula to include
interactions between the predictor variables
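In R's formula syntax, TV * Radio expands to TV + Radio + TV:Radio, where TV:Radio is the interaction term. A minimal sketch on simulated data (column names and coefficients are assumptions, and the interaction is built into the simulation):

```r
# Simulated data with a built-in TV x Radio interaction; values are illustrative.
set.seed(10)
n <- 200
ads <- data.frame(TV = runif(n, 0, 300), Radio = runif(n, 0, 50))
ads$Sales <- 3 + 0.02 * ads$TV + 0.03 * ads$Radio +
  0.001 * ads$TV * ads$Radio + rnorm(n, sd = 1)

fit_add <- lm(Sales ~ TV + Radio, data = ads)   # additive model
fit_int <- lm(Sales ~ TV * Radio, data = ads)   # adds the TV:Radio term

print(coef(summary(fit_int)))
print(anova(fit_add, fit_int))
```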
Does this interaction
improve the model?
What about reintroducing
Newspaper with interaction?
Model with Newspaper &
TV interaction modeled
What is the best model?
On the training data, adding predictor
variables (p) never increases the residual sum of
squares (RSS), so the model with the most predictors
always looks the most accurate…but is it the best model?
The best model is…
The best model is the simplest model
with the most predictive power on the
entire data population while staying
within your resource constraints.
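A rough sketch of why training accuracy alone is misleading: the larger model below includes five predictors that are pure noise, yet its training RSS cannot be higher than the smaller model's. The data, names, and train/test split are assumptions for illustration.

```r
# Simulated data: y depends only on x; noise1..noise5 are unrelated to y.
set.seed(99)
n <- 120
d <- data.frame(x = rnorm(n))
d$y <- 2 + 1.5 * d$x + rnorm(n)
for (j in 1:5) d[[paste0("noise", j)]] <- rnorm(n)

# Hold out the last 40 rows to estimate performance on unseen data.
train <- d[1:80, ]
test  <- d[81:n, ]

fit_small <- lm(y ~ x, data = train)
fit_big   <- lm(y ~ ., data = train)   # x plus all five noise columns

# Residual sum of squares of a fitted model on a given data frame.
rss <- function(fit, data) sum((data$y - predict(fit, newdata = data))^2)

print(c(train_small = rss(fit_small, train), train_big = rss(fit_big, train)))
print(c(test_small = rss(fit_small, test), test_big = rss(fit_big, test)))
```

Comparing the two rows of output shows whether the extra predictors helped on data the model has not seen.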
Bonus: Easy and tidy with
the broom package
From the broom vignette:
https://cran.r-project.org/web/packages/broom/vignettes/broom.html
The broom package takes the messy output of built-in functions in R, such as lm,
nls, or t.test, and turns them into tidy data frames.
This package provides three S3 methods that do three distinct kinds of tidying.
• tidy: constructs a data frame that summarizes the model's statistical findings.
This includes coefficients and p-values for each term in a regression, per-cluster
information in clustering applications, or per-test information for multtest
functions.
• augment: add columns to the original data that was modeled. This includes
predictions, residuals, and cluster assignments.
• glance: construct a concise one-row summary of the model. This typically
contains values such as R², adjusted R², and residual standard error that are
computed once for the entire model.
broom package
using our Advertising data
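A sketch of the three broom verbs on a fitted lm. The data frame is simulated as a stand-in for the Advertising data (column names and coefficients are assumptions), and the broom calls run only if the package is installed.

```r
# Simulated stand-in for the Advertising data; values are illustrative only.
set.seed(10)
n <- 200
ads <- data.frame(
  TV = runif(n, 0, 300),
  Radio = runif(n, 0, 50),
  Newspaper = runif(n, 0, 100)
)
ads$Sales <- 3 + 0.045 * ads$TV + 0.19 * ads$Radio + rnorm(n, sd = 1.5)
fit <- lm(Sales ~ TV + Radio + Newspaper, data = ads)

if (requireNamespace("broom", quietly = TRUE)) {
  coef_table <- broom::tidy(fit)      # one row per model term
  per_row    <- broom::augment(fit)   # modeled data plus .fitted, .resid, ...
  one_row    <- broom::glance(fit)    # one-row summary of the whole model
  print(coef_table)
  print(head(per_row))
  print(one_row)
}
```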
