Hadley Wickham
Stat405
Intro to modelling
Tuesday, 16 November 2010
1. What is a linear model?
2. Removing trends
3. Transformations
4. Categorical data
5. Visualising models

What is a linear model?

[Figure: data with a fitted line, annotated with an observed value, its predicted value and the residual between them]

y ~ x
# yhat = b1x + b0
# Want to find the b's that minimise the sum of squared
# distances between y and yhat
z ~ x + y
# zhat = b2x + b1y + b0
# Want to find the b's that minimise the sum of squared
# distances between z and zhat
z ~ x * y
# zhat = b3(x⋅y) + b2x + b1y + b0
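To make "minimise distance" concrete: lm() chooses the b's that minimise the sum of squared residuals, which is the solution of the normal equations (X'X)b = X'z, where X is the design matrix. The sketch below uses simulated data (not from the slides) and checks that lm(z ~ x * y) agrees with solving those equations directly; the interaction term x:y is simply the product column x⋅y.

```r
# Simulated data, for illustration only
set.seed(42)
n <- 100
x <- rnorm(n)
y <- rnorm(n)
z <- 1 + 2 * x - 0.5 * y + 0.3 * x * y + rnorm(n, sd = 0.1)

# lm() expands z ~ x * y into x + y + x:y
mod <- lm(z ~ x * y)

# Design matrix with columns (Intercept), x, y, x:y
X <- model.matrix(~ x * y)

# Normal equations: (X'X) b = X'z
b <- solve(crossprod(X), crossprod(X, z))

# Residual sum of squares for a candidate coefficient vector
rss <- function(coefs) sum((z - X %*% coefs)^2)

coef(mod)
drop(b)
# Nudging a coefficient away from the solution increases the RSS
rss(b) < rss(b + c(0.01, 0, 0, 0))
```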

Assumptions
X is measured without error.
The relationship is linear.
Errors are independent.
Errors have a normal distribution.
Errors have constant variance.

Removing trends

library(ggplot2)
# A dimension of 0 is impossible, and y or z above 30 mm is an
# implausible outlier, so mark these values as missing
diamonds$x[diamonds$x == 0] <- NA
diamonds$y[diamonds$y == 0] <- NA
diamonds$y[diamonds$y > 30] <- NA
diamonds$z[diamonds$z == 0] <- NA
diamonds$z[diamonds$z > 30] <- NA
# Focus on diamonds smaller than 2 carats
diamonds <- subset(diamonds, carat < 2)
qplot(x, y, data = diamonds)
qplot(x, z, data = diamonds)

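# na.exclude pads residuals and predictions with NA so that
# they line up with the rows of diamonds
mody <- lm(y ~ x, data = diamonds, na.action = na.exclude)
coef(mody)
# yhat = 0.05 + 0.99⋅x
# Plot x vs yhat
qplot(x, predict(mody), data = diamonds)
# Plot x vs (y - yhat) = residual
qplot(x, resid(mody), data = diamonds)
# Standardised residual: residual divided by its estimated standard deviation
qplot(x, rstandard(mody), data = diamonds)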
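The standardised residual divides each residual by an estimate of its own standard deviation, sigma_hat * sqrt(1 - h_i), where sigma_hat is the residual standard error and h_i is the leverage (hat value) of observation i. As a sketch on simulated data (not the diamonds data), this reproduces rstandard() by hand:

```r
# Simulated data, for illustration only
set.seed(1)
x <- runif(50, 0, 10)
y <- 1 + 2 * x + rnorm(50)
mod <- lm(y ~ x)

# Residual standard error: sqrt(RSS / residual degrees of freedom)
sigma_hat <- sqrt(sum(resid(mod)^2) / df.residual(mod))
# Leverage of each observation
h <- hatvalues(mod)
manual <- resid(mod) / (sigma_hat * sqrt(1 - h))

all.equal(unname(manual), unname(rstandard(mod)))
```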

qplot(x, resid(mody), data = diamonds)

# yhat is roughly x, so y - x is close to the residual
qplot(x, y - x, data = diamonds)

Your turn
Do the same thing for z and x. What
threshold might you use to remove
outlying values?
Are the errors from predicting z and y
from x related?

modz <- lm(z ~ x, data = diamonds, na.action = na.exclude)
coef(modz)
# zhat = 0.03 + 0.61⋅x
qplot(x, rstandard(modz), data = diamonds)
last_plot() + ylim(-10, 10)
qplot(rstandard(mody), rstandard(modz))

Transformations

Can we use a linear model to remove this trend?
Linear models are linear in their parameters; the predictors
themselves can be any transformation of the data.
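For example, a model can be non-linear in x and still be a linear model, as long as it is linear in the coefficients. In this sketch (noise-free made-up data, chosen only for illustration), lm() recovers the coefficients of y = 2 + 3 sqrt(x) - 0.5 log(x) exactly, because sqrt(x) and log(x) are just new predictor columns:

```r
x <- seq(1, 10, length.out = 20)
# Non-linear in x, but linear in the coefficients 2, 3 and -0.5
y <- 2 + 3 * sqrt(x) - 0.5 * log(x)

mod <- lm(y ~ sqrt(x) + log(x))
coef(mod)
```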
Your turn
Use a linear model to remove the effect of
carat on price. Confirm that this worked
by plotting model residuals vs. color.
How can you interpret the model
coefficients and residuals?

modprice <- lm(log(price) ~ log(carat),
  data = diamonds, na.action = na.exclude)
# exp(residual) = observed price / predicted price
diamonds$relprice <- exp(resid(modprice))
qplot(carat, relprice, data = diamonds)
# No effect if diamonds was already restricted to carat < 2 earlier
diamonds <- subset(diamonds, carat < 2)
qplot(carat, relprice, data = diamonds)
qplot(carat, relprice, data = diamonds) +
  facet_wrap(~ color)
qplot(relprice, ..density.., data = diamonds,
  colour = color, geom = "freqpoly", binwidth = 0.2)
qplot(relprice, ..density.., data = diamonds,
  colour = cut, geom = "freqpoly", binwidth = 0.2)

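Multiplicative model
$\log(Y) = a \log(X) + b \iff Y = c \cdot X^{a}$, where $c = e^{b}$
An additive model on the log scale becomes a multiplicative model.
The intercept becomes a scale factor: c is the predicted Y when X = 1.
The slope becomes an exponent: multiplying X by k multiplies Y by $k^{a}$.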
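A quick numerical check of the back-transformation, using made-up noise-free data that follow Y = 3 X^0.5 exactly: fitting log(y) ~ log(x) gives an intercept of log(3) and a slope of 0.5, and doubling X multiplies Y by 2^0.5.

```r
# Noise-free power-law data: Y = c * X^a with c = 3, a = 0.5
x <- c(0.25, 0.5, 1, 2, 4)
y <- 3 * x^0.5

mod <- lm(log(y) ~ log(x))
b <- coef(mod)[[1]]  # intercept, on the log scale
a <- coef(mod)[[2]]  # slope = exponent

exp(b)          # c, the predicted Y at X = 1
a               # a, the exponent
y[4] / y[3]     # Y at X = 2 relative to Y at X = 1
2^a             # the same ratio: doubling X multiplies Y by 2^a
```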
Residuals
resid(mod) = log(Y) - log(Yhat)
exp(resid(mod)) = Y / (Yhat)
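As a sketch on simulated data with multiplicative noise (the constants are arbitrary, not estimates from diamonds), the identity exp(resid(mod)) = Y / Yhat holds exactly when Yhat is taken as exp(fitted(mod)); a value of 1.1 means the observation is 10% above the back-transformed prediction.

```r
# Simulated data, for illustration only
set.seed(2)
x <- runif(40, 0.2, 2)
# Power law with multiplicative (log-normal) noise
y <- 4000 * x^1.7 * exp(rnorm(40, sd = 0.2))

mod <- lm(log(y) ~ log(x))
yhat <- exp(fitted(mod))   # prediction back on the original scale
ratio <- exp(resid(mod))   # observed / predicted

head(cbind(ratio, y / yhat))
```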

# Useful trick: close to 0, exp(x) ~ x + 1
x <- seq(-0.2, 0.2, length.out = 100)
qplot(x, exp(x)) + geom_abline(intercept = 1)
# Relative error of the approximation x + 1
qplot(x, (exp(x) - (x + 1)) / exp(x)) +
  scale_y_continuous("Percent error", labels = scales::percent)
# Not so useful here because x is also
# transformed
coef(modprice)
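How good is the trick? This short check computes the relative error of approximating exp(x) by 1 + x over the same range as the plot. Because exp(x) >= 1 + x, the error is never negative, and between -0.2 and 0.2 it stays below about 2.3%, with the worst case at x = -0.2.

```r
x <- seq(-0.2, 0.2, length.out = 101)
# Relative error of the approximation exp(x) ~ 1 + x
rel_err <- (exp(x) - (1 + x)) / exp(x)
max(abs(rel_err))          # roughly 0.023
x[which.max(abs(rel_err))] # -0.2
```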

Categorical data

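Your turn
Compare the results of the ddply() and lm() calls below.
What can you say about the model?
library(plyr)
ddply(diamonds, "color", summarise,
  mean = mean(price))
coef(lm(price ~ color, data = diamonds))
# color is an ordered factor, so lm() uses polynomial
# contrasts; dropping the ordering gives coefficients
# relative to the first level
coef(lm(price ~ factor(color, ordered = FALSE), data = diamonds))
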
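Categorical data
A factor is converted into a numeric matrix with one column for
each level. Each entry is 1 if that observation has that level,
0 otherwise.
However, if we do that naively, we end up with one column too
many: the level columns always add up to 1, so together they
duplicate the intercept column.
So R drops the column for the first level, and every coefficient
is relative to the first level.
(This is treatment coding, R's default for unordered factors;
ordered factors, such as cut and color in diamonds, use
polynomial contrasts by default.)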
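A small made-up example of this encoding: model.matrix() shows the indicator columns R builds for an unordered factor (there is no column for the first level, a), and the coefficients come out as the mean of level a followed by each other level's difference from it. The last lines show that an ordered version of the same factor gets polynomial contrasts (.L, .Q) instead.

```r
# Made-up data, for illustration only
df <- data.frame(
  g = factor(c("a", "a", "b", "b", "c", "c")),
  y = c(1, 3, 10, 12, 20, 22)
)

# Indicator columns: (Intercept), gb, gc; level "a" is the baseline
mm <- model.matrix(~ g, data = df)
mm

tapply(df$y, df$g, mean)   # group means: a = 2, b = 11, c = 21
cf <- coef(lm(y ~ g, data = df))
cf                         # 2, then 11 - 2 = 9 and 21 - 2 = 19

# An ordered factor uses polynomial contrasts instead
go <- factor(df$g, ordered = TRUE)
names(coef(lm(df$y ~ go))) # "(Intercept)" "go.L" "go.Q"
```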

Visualising models

# What do you think this model does?
lm(log(price) ~ log(carat) + color,
data = diamonds)
# What about this one?
lm(log(price) ~ log(carat) * color,
data = diamonds)
# Or this one?
lm(log(price) ~ cut * color,
data = diamonds)
# How can we interpret the results?
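To see what `*` adds compared with `+`, here is a noise-free toy example (hypothetical groups a and b, not diamonds data) in which the two groups have different intercepts and different slopes. The additive model can only shift the line up or down for each group; the interaction model also fits a separate slope, reported as a difference from the first level's slope (x:gb).

```r
# Made-up data, for illustration only
x <- rep(1:10, 2)
g <- factor(rep(c("a", "b"), each = 10))
# Group a: y = 1 + 2x; group b: y = 3 + 0.5x
y <- ifelse(g == "a", 1 + 2 * x, 3 + 0.5 * x)

add <- lm(y ~ x + g)  # shared slope, separate intercepts
int <- lm(y ~ x * g)  # separate intercepts and slopes

coef(add)
coef(int)  # (Intercept) 1, x 2, gb 2, x:gb -1.5
```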
mod1 <- lm(log(price) ~ log(carat) + cut, data = diamonds)
mod2 <- lm(log(price) ~ log(carat) * cut, data = diamonds)
# One way is to explore predictions from the model
# over an evenly spaced grid. expand.grid makes
# this easy
grid <- expand.grid(
carat = seq(0.2, 2, length = 20),
cut = levels(diamonds$cut),
KEEP.OUT.ATTRS = FALSE)
str(grid)
grid
grid$p1 <- exp(predict(mod1, grid))
grid$p2 <- exp(predict(mod2, grid))

Your turn
Plot the predictions from the two sets of models.
How are they different?

qplot(carat, p1, data = grid, colour = cut,
geom = "line")
qplot(carat, p2, data = grid, colour = cut,
geom = "line")
qplot(log(carat), log(p1), data = grid,
colour = cut, geom = "line")
qplot(log(carat), log(p2), data = grid,
colour = cut, geom = "line")
qplot(carat, p1 / p2, data = grid, colour = cut,
geom = "line")
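One property worth looking for in these plots: in an additive model like mod1, cut only shifts log(price), so the ratio of predicted prices between two cuts is the same at every carat. The sketch below shows this on simulated data, with a hypothetical two-level factor grp standing in for cut; the constant ratio equals exp() of the grp coefficient. In the interaction model the ratio is free to change with carat.

```r
# Simulated data, for illustration only
set.seed(4)
n <- 200
carat <- runif(n, 0.2, 2)
grp <- factor(sample(c("a", "b"), n, replace = TRUE))
logp <- 8 + 1.7 * log(carat) + ifelse(grp == "b", 0.3, 0) +
  rnorm(n, sd = 0.1)

mod_add <- lm(logp ~ log(carat) + grp)
mod_int <- lm(logp ~ log(carat) * grp)

grid <- expand.grid(carat = c(0.5, 1, 2), grp = levels(grp))
grid$p_add <- exp(predict(mod_add, grid))
grid$p_int <- exp(predict(mod_int, grid))

# Predicted price ratio b / a at each carat
ratio_add <- unname(grid$p_add[grid$grp == "b"] / grid$p_add[grid$grp == "a"])
ratio_int <- unname(grid$p_int[grid$grp == "b"] / grid$p_int[grid$grp == "a"])
ratio_add  # constant, equal to exp(coef(mod_add)[["grpb"]])
ratio_int  # can vary with carat
```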

# Another approach is the effects package
# install.packages("effects")
library(effects)
effect("cut", mod1)
# Use a new name so the data frame doesn't mask base::cut()
cut_eff <- as.data.frame(effect("cut", mod1))
qplot(fit, reorder(cut, fit), data = cut_eff)
qplot(fit, reorder(cut, fit), data = cut_eff) +
  geom_errorbarh(aes(xmin = lower, xmax = upper),
    height = 0.1)
qplot(exp(fit), reorder(cut, fit), data = cut_eff) +
  geom_errorbarh(aes(xmin = exp(lower),
    xmax = exp(upper)), height = 0.1)
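If the effects package is not available, a rough base-R analogue for a model whose only predictor is a factor is predict() with interval = "confidence", which returns the fitted value and lower and upper limits for each level. This is a swapped-in technique, sketched on simulated data with hypothetical level means; for mod1, effects also holds log(carat) fixed at a typical value, which this sketch does not attempt.

```r
# Simulated data, for illustration only
set.seed(5)
n <- 100
g <- factor(sample(c("Fair", "Good", "Ideal"), n, replace = TRUE))
level_means <- c(Fair = 7.5, Good = 7.8, Ideal = 8.0)
y <- level_means[as.character(g)] + rnorm(n, sd = 0.2)

mod <- lm(y ~ g)
new_levels <- data.frame(g = factor(levels(g), levels = levels(g)))

# Matrix with columns fit, lwr, upr: one row per level
ci <- predict(mod, new_levels, interval = "confidence", level = 0.95)
cbind(new_levels, ci)
```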