R Bootcamp Day 3 Part 1
Jefferson Davis
Olga Scrivner
Day 2 stuff
From yesterday and the day before
• R values have types/classes such as numeric, character, logical, data frame, and matrix.
• Much of R's functionality lives in libraries (packages).
• For help on a function, run
?t.test
from the R console.
• The plot() function will usually do something useful.
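As a quick refresher on the first bullet, here is a small sketch that asks R for the class of a few values. The variable names (x, s, b, m) are made up for illustration; cars is the built-in data set used later in these slides.

```r
# Hypothetical example values, one per basic class
x <- 3.14
s <- "hello"
b <- TRUE
m <- matrix(1:6, nrow = 2)

class(x)     # "numeric"
class(s)     # "character"
class(b)     # "logical"
class(cars)  # "data.frame" (cars ships with R)
class(m)     # "matrix" "array" in R >= 4.0, just "matrix" in older versions
```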
R: Common stats functions
Common statistical tests are very straightforward in R. Let's try
one on yesterday's dataset cars of car speeds and stopping
distances from the 1920s.
head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
R: Common stats functions
Here's a t-test of the null hypothesis that the mean speed in cars is 12, against the alternative that it is not 12.
t.test(cars$speed, mu=12)
        One Sample t-test

data:  cars$speed
t = 4.5468, df = 49, p-value = 3.588e-05
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
 13.89727 16.90273
sample estimates:
mean of x
     15.4
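To see where the numbers in that printout come from, here is a minimal sketch that recomputes the t statistic by hand and pulls the pieces out of the object that t.test() returns. The names x, mu, n, t_by_hand, and res are chosen here for illustration.

```r
# Recompute the one-sample t statistic by hand and compare with t.test()
x  <- cars$speed
mu <- 12
n  <- length(x)

# t = (sample mean - hypothesized mean) / standard error of the mean
t_by_hand <- (mean(x) - mu) / (sd(x) / sqrt(n))

res <- t.test(x, mu = mu)   # an object of class "htest"
res$statistic               # t statistic
res$parameter               # degrees of freedom (n - 1)
res$p.value                 # two-sided p-value
res$conf.int                # 95 percent confidence interval for the mean

# The hand computation should agree with t.test()
all.equal(unname(res$statistic), t_by_hand)
```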
R: Common stats functions
We can change the parameters of t.test().
t.test(cars$speed, mu=12, alternative="less", conf.level=.99)

        One Sample t-test

data:  cars$speed
t = 4.5468, df = 49, p-value = 1
alternative hypothesis: true mean is less than 12
99 percent confidence interval:
     -Inf 17.19834
sample estimates:
mean of x
     15.4
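The p-value of 1 is a rounded display: the sample mean (15.4) sits above 12, so the data give essentially no support to "less than 12". A short sketch of how the three choices of alternative relate, with variable names picked for illustration:

```r
x <- cars$speed

p_two     <- t.test(x, mu = 12)$p.value
p_less    <- t.test(x, mu = 12, alternative = "less")$p.value
p_greater <- t.test(x, mu = 12, alternative = "greater")$p.value

# With a positive t statistic, "greater" gets half of the two-sided
# p-value and "less" gets the remainder.
all.equal(p_greater, p_two / 2)
all.equal(p_less, 1 - p_two / 2)
```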
R: Common stats functions
Anything you would see in a year-long stats sequence will have an implementation in R.
chisq.test() #Chi-squared
prop.test() #Proportions test
binom.test() #Exact binomial test
ks.test() #Kolmogorov–Smirnov
sd() #Standard deviation
cor() #Correlation
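Here is a rough sketch of a few of these on the cars data. The question tested below (is the share of cars faster than 15 equal to one half?) and the name n_fast are invented just to have something to feed the functions.

```r
sd(cars$speed)                # standard deviation of speed
cor(cars$speed, cars$dist)    # Pearson correlation of speed and distance

# Count the cars with speed above 15 (an arbitrary cutoff for illustration)
n_fast <- sum(cars$speed > 15)

# Exact binomial test that the true share is 0.5
binom.test(n_fast, nrow(cars), p = 0.5)

# Proportions test on the same counts (uses a normal approximation)
prop.test(n_fast, nrow(cars), p = 0.5)
```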
R: Linear regression
Regression analysis is one of the most popular and important
tools in statistics. If R goofed here, it would be worthless.
R uses the function lm() for linear models. The regression formula is given in Wilkinson-Rogers notation.
Predictor terms     Wilkinson notation
Intercept           1 (default)
No intercept        -1
x1                  x1
x1, x2              x1 + x2
x1, x2, x1x2        x1*x2 (or x1 + x2 + x1:x2)
x1x2                x1:x2
x1^2, x1            x1 + I(x1^2) (in an R formula a bare x1^2 reduces to x1)
x1 + x2             I(x1 + x2) (I is the capital letter I)
R: Linear regression
R uses Wilkinson-Rogers notation to specify linear models. So a model such as
$y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$
shows up in R syntax as
y ~ x1
Let's review this syntax.
(Tables from https://www.mathworks.com/help/stats/wilkinson-notation.html)
R: Linear regression
Model and its Wilkinson notation
• Two predictors: y ~ x1 + x2
  $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$
• Two predictors and no intercept: y ~ x1 + x2 - 1
  $y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$
• Two predictors with the interaction term: y ~ x1 * x2 (or y ~ x1 + x2 + x1:x2)
  $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \varepsilon_i$
• Regressing on the sum of predictors: y ~ I(x1 + x2)
  $y_i = \beta_0 + \beta_1 (x_{i1} + x_{i2}) + \varepsilon_i$
• Three predictors with one interaction: y ~ x1 * x2 + x3
  $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i1} x_{i2} + \varepsilon_i$
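One way to check what a formula really means is to look at the columns of the design matrix R builds from it. This is a sketch on a tiny made-up data frame d; the data are invented, and model.matrix() is the base R function that builds the matrix.

```r
# Made-up data purely for illustration
d <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(0, 1, 0, 1))

colnames(model.matrix(~ x1 + x2, d))      # intercept, x1, x2
colnames(model.matrix(~ x1 + x2 - 1, d))  # same, without the intercept
colnames(model.matrix(~ x1 * x2, d))      # adds the x1:x2 product column
colnames(model.matrix(~ I(x1 + x2), d))   # intercept plus one summed predictor
```

Each column corresponds to one β in the matching equation, so the column count is a quick sanity check on a formula.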
R: Linear regression
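Model terms and their Wilkinson notation
• Two predictors with their interaction, no intercept: y ~ x1*x2 - 1
  $y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \varepsilon_i$
• Three predictors, all interaction terms: y ~ x1 * x2 * x3
  $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i1} x_{i2} + \beta_5 x_{i1} x_{i3} + \beta_6 x_{i2} x_{i3} + \beta_7 x_{i1} x_{i2} x_{i3} + \varepsilon_i$
• Three predictors, all two-way interaction terms: y ~ x1 * x2 * x3 - x1:x2:x3
  $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i1} x_{i2} + \beta_5 x_{i1} x_{i3} + \beta_6 x_{i2} x_{i3} + \varepsilon_i$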
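To see how R expands these formulas, we can ask terms() for the term labels. No data are needed for this, so y, x1, x2, and x3 below are just symbols. The last two lines show a quirk worth knowing: in an R formula ^ means crossing, so a squared predictor has to be wrapped in I().

```r
# term.labels lists the terms R will actually fit (the intercept is implicit)
attr(terms(y ~ x1 * x2 * x3), "term.labels")
attr(terms(y ~ x1 * x2 * x3 - x1:x2:x3), "term.labels")
attr(terms(y ~ (x1 + x2 + x3)^2), "term.labels")  # same two-way model

attr(terms(y ~ x1^2), "term.labels")              # just "x1"
attr(terms(y ~ x1 + I(x1^2)), "term.labels")      # "x1" and "I(x1^2)"
```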
R: Linear regression
• R uses the function lm() for linear models.
• Generic syntax
lm(DV ~ IV1, NAME_OF_DATAFRAME)
• The above tells R to regress the dependent variable (DV) on the independent variable IV1. We can include other variables and interaction effects (IV1*IV2 is shorthand for the same three terms).
lm(DV ~ IV1 + IV2 + IV1:IV2, NAME_OF_DATAFRAME)
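A concrete version of the generic syntax, as a sketch. The slides do not use mtcars here; it is another data set that ships with R, swapped in only so that there are two predictors to work with (mpg as DV, wt and hp as IVs).

```r
# Long form: two main effects plus their interaction
fit_long  <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

# Short form: wt * hp expands to the same three terms
fit_short <- lm(mpg ~ wt * hp, data = mtcars)

coef(fit_long)

# Both formulas describe the same model
all.equal(coef(fit_long), coef(fit_short))
```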
R: Linear regression
• Let's do an example using the cars data set. How about regressing stopping distance on speed?
lm(dist ~ speed, cars)

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept)        speed
    -17.579        3.932

• To work with the fit further, let's store it in a variable.
car.fit <- lm(dist ~ speed, cars)
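Once the fit is stored, one natural thing to do with it is predict. A minimal sketch, where the speeds 10 and 20 and the name new_speeds are picked arbitrarily for illustration:

```r
car.fit <- lm(dist ~ speed, data = cars)

# New data must use the same column name as the predictor in the formula
new_speeds <- data.frame(speed = c(10, 20))

pred <- predict(car.fit, newdata = new_speeds)
pred

# The same numbers by hand: intercept + slope * speed
by_hand <- coef(car.fit)[["(Intercept)"]] + coef(car.fit)[["speed"]] * new_speeds$speed
all.equal(unname(pred), by_hand)
```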
R: Linear regression
summary(car.fit)

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max
-29.069  -9.525  -2.272   9.215  43.201

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791     6.7584  -2.601   0.0123 *
speed         3.9324     0.4155   9.464 1.49e-12 ***
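The printed summary is also an object we can pick apart, which helps when the numbers need to go into a table or a later computation. A short sketch (the name s is arbitrary):

```r
car.fit <- lm(dist ~ speed, data = cars)
s <- summary(car.fit)

coef(s)            # the coefficient table as a numeric matrix
s$r.squared        # proportion of variance explained
s$sigma            # residual standard error
confint(car.fit)   # 95 percent confidence intervals for the coefficients

# Pull out a single cell, here the t value for speed
coef(s)["speed", "t value"]
```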
R: Linear regression
• We can also look at individual fields of the lm object.
car.fit$coefficients
(Intercept)       speed
 -17.579095    3.932409
car.fit$residuals[1:3]
        1         2         3
 3.849460 11.849460 -5.947766
car.fit$fitted.values[1:3]
        1         2         3
-1.849460 -1.849460  9.947766
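The same pieces are available through accessor functions, which many people prefer to reaching into the object with $. A sketch using base R's coef(), residuals(), and fitted(), plus a check that observed values equal fitted values plus residuals:

```r
car.fit <- lm(dist ~ speed, data = cars)

coef(car.fit)                  # same as car.fit$coefficients
head(residuals(car.fit), 3)    # same as car.fit$residuals[1:3]
head(fitted(car.fit), 3)       # same as car.fit$fitted.values[1:3]

# observed = fitted + residual, row by row
all.equal(unname(fitted(car.fit) + residuals(car.fit)), cars$dist)
```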
R: Linear regression
• Plot the fit
plot(cars$speed,
cars$dist,
xlab = "speed",
ylab = "distance")
abline(car.fit,
col="red")
R: Linear regression
• Objects of class lm have their own plot() method, which draws a series of diagnostic plots.
plot(car.fit)
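By default the diagnostic plots come one after another. A small sketch of two ways to control that: a 2 x 2 grid so they all fit on one page, or the which argument of the lm plot method to pick a single panel.

```r
car.fit <- lm(dist ~ speed, data = cars)

# Show the four default diagnostic plots on one page
op <- par(mfrow = c(2, 2))   # remember the old settings
plot(car.fit)
par(op)                      # restore the old settings

# Or draw just one of them, here residuals vs fitted values
plot(car.fit, which = 1)
```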
R: Mixed models
• Let's take a look at a mixed model. We need a more complex dataset. We use a subset of the Orthodont data set from the nlme (Linear and Nonlinear Mixed Effects Models) library.
library(nlme)
head(Orthodont)
Grouped Data: distance ~ age | Subject
  distance age Subject  Sex
1     26.0   8     M01 Male
2     25.0  10     M01 Male
3     29.0  12     M01 Male
4     31.0  14     M01 Male
R: Mixed models
OrthoFem <- Orthodont[Orthodont$Sex == "Female", ]
plot(OrthoFem)
R: Mixed models
It doesn't seem crazy to fit a common slope but use a random effect for the intercept.
fmOrthF <- lme(distance ~ age,
               data = OrthoFem,
               random = ~ 1 | Subject)
R: Mixed models
In fact, it isn't crazy.
summary(fmOrthF)
Linear mixed-effects model fit by REML
 Data: OrthoFem
       AIC     BIC    logLik
  149.2183 156.169 -70.60916

Random effects:
 Formula: ~1 | Subject
        (Intercept)  Residual
StdDev:     2.06847 0.7800331

Fixed effects: distance ~ age
                Value Std.Error DF   t-value p-value
(Intercept) 17.372727 0.8587419 32 20.230440       0
age          0.479545 0.0525898 32  9.118598       0
 Correlation:
    (Intr)
age -0.674
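As with lm objects, the pieces of the mixed model can be pulled out with accessor functions. This sketch refits the model so that it runs on its own, then uses fixef(), ranef(), VarCorr(), and coef() from nlme.

```r
library(nlme)

# Female subjects only, then a random intercept per subject
OrthoFem <- Orthodont[Orthodont$Sex == "Female", ]
fmOrthF  <- lme(distance ~ age, data = OrthoFem, random = ~ 1 | Subject)

fixef(fmOrthF)         # population-level intercept and slope
head(ranef(fmOrthF))   # each subject's shift away from the common intercept
VarCorr(fmOrthF)       # between-subject and residual variance components

# Per-subject intercept = fixed intercept + that subject's random shift;
# the age slope is shared by everyone
head(coef(fmOrthF))
```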
R: Conditional trees
At this point, I tag Olga in.
