SlideShare une entreprise Scribd logo
1  sur  60
Télécharger pour lire hors ligne
Understanding R for Epidemiologists 
Tom´as J. Arag´on, MD, DrPH 
Faculty, Division of Epidemiology 
UC Berkeley School of Public Health 
Health Officer, City & County of San Francisco 
Director, Population Health Division (PHD) 
San Francisco Department of Public Health 
Blog: http://www.medepi.com 
Email: aragon@berkeley.edu 
September 8, 2014 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 1 / 60
Outline 
1 Background 
Cost 
Quality 
Community 
2 Getting started with R 
Full-function calculator/spreadsheet 
Extensible statistical packages 
High quality graphics tool 
Multi-use programming language 
3 Working with R data objects 
Atomic vs. recursive data objects 
Working with vectors, matrices, & arrays 
Working with lists, data frames, and functions 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 2 / 60
Background 
Background: Major issues 
Cost 
Quality 
Community 
Functionality 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 3 / 60
Background Cost 
Cost: Open Source vs. Proprietary Software 
Costs of software 
Costs of multi-platforms 
Costs of education and training 
Costs of adding solutions (e.g., packages) 
Costs of solving problems and sharing solutions 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 4 / 60
Background Quality 
Quality: Open Source vs. Proprietary Software 
Core Development Team 
Large pool of users/testers 
Quality control process for packages 
Bug fixes based on need/demand, not profits 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 5 / 60
Background Community 
Community: Open Source vs. Proprietary Software 
Large community of users 
Transparent development process 
Growing number of books and trainings 
Growing number of free tutorials and manuals 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 6 / 60
Background Community 
Current R contributors 
Douglas Bates 
John Chambers 
Peter Dalgaard 
Seth Falcon 
Robert Gentleman 
Kurt Hornik 
Stefano Iacus 
Ross Ihaka 
Friedrich Leisch 
Uwe Ligges 
Thomas Lumley 
Martin Maechler 
Duncan Murdoch 
Paul Murrell 
Martyn Plummer 
Brian Ripley 
Deepayan Sarkar 
Duncan Temple Lang 
Luke Tierney 
Simon Urbanek 
Source: http://www.r-project.org/contributors.html 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 7 / 60
Getting started with R 
What is R? 
Full-function calculator/spreadsheet 
Extensible statistical packages 
High-quality graphics tool 
Multi-use programming language 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 8 / 60
Getting started with R Full-function calculator/spreadsheet 
Full-function calculator: Selected math operators 
Operator Description Try these examples 
+ addition 5+4 
− subtraction 5-4 
 multiplication 5*4 
/ division 5/4 
ˆ exponentiation 5^4 
− unary minus (change current 
sign) 
-5 
abs absolute value abs(-23) 
exp exponentiation (e to a power) exp(8) 
log logarithm (default is natural log) log(exp(8)) 
sqrt square root sqrt(64) 
%/% integer divide 10%/%3 
%% modulus 10%%3 
%*% matrix multiplication xx - matrix(1:4, 2, 2) 
xx%*%c(1, 1) 
c(1, 1)%*%xx 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 9 / 60
Getting started with R Extensible statistical packages 
Extensible statistical packages 
Generalized Linear Models (Base) 
Linear regression 
Logistic regression 
Poisson regression 
Cox Proportional Hazard models (Survival) 
Cox PH regression 
Conditional logistic regression (matched case-control studies) 
Meta-analysis (meta) 
Complex survey analysis (survey) 
Epidemiology packages 
epitools 
epicalc 
epibasix 
epiR 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 10 / 60
Getting started with R High quality graphics tool 
Graphics display of sample size curves 
Alternative distribution 
H1 
Power 
(1 - b) 
Null distribution 
H0 
b a 2 
-Z1-a 2 m0 Z1-a 2 m1 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 11 / 60
Getting started with R High quality graphics tool 
Graphics display of P value function 
0.2 0.5 1.0 2.0 2.9 5.0 10.0 20.0 
1 
0.9 
0.8 
0.7 
0.6 
0.5 
0.4 
0.3 
0.2 
0.1 
0.05 
0 
0 
10 
20 
30 
40 
50 
60 
70 
80 
90 
95 
100 
Confidence level (%) 
Rate Ratio 
P−value 
Null hypothesis 
Median unbiased estimate 
95% Lower Confidence Limit = 0.74 
95% Upper Confidence Limit = 21.0 
95% Confidence Interval 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 12 / 60
Getting started with R High quality graphics tool 
Graphical display of multiple linear regression 
0 10 20 30 40 50 
10 20 30 40 50 60 70 80 90 
0 
10 
20 
30 
40 
50 
x1 
x2 
y 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 13 / 60
Getting started with R High quality graphics tool 
Epidemic curve using Color Brewer colors 
Unknown 
WNF 
WNND 
0 20 40 60 80 
West Nile Virus Human Cases Reported in California 
by Disease Week as of December 14, 2004 
Cases 
+ Bird 
2/24 
+ Horse 
6/20 
+ Chicken 
5/17 
+ Mosquito 
4/14 
52 03 06 09 12 15 18 21 24 27 30 33 36 39 42 45 48 51 
Dec Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
Disease Week  Calendar Month, 2004 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 14 / 60
Getting started with R Multi-use programming language 
Multi-use programming language 
Vectorized computations 
Functional programming language 
Object-oriented programming 
Text processing (e.g., using regular expressions) 
Links to C, Fortran, etc. 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 15 / 60
Working with R data objects Atomic vs. recursive data objects 
Data objects in R 
Object types 
Vector 
Matrix 
Array 
List 
Data frame 
Function 
Operations 
Create 
Name 
Index 
Replace 
Manipulate 
Do computations 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 16 / 60
Working with R data objects Atomic vs. recursive data objects 
Summary of types of data objects in R 
Data object Possible modea Default class 
Atomic 
vector character, numeric, logical NULL 
matrix character, numeric, logical NULL 
array character, numeric, logical NULL 
Recursive 
list list NULL 
data frame list data frame 
function function NULL 
a We are ignoring complex numbers 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 17 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding vectors 
A vector is a collection of like elements without dimensions1. The vector 
elements are all of the same mode (either character, numeric, or logical). 
 y - c(Pedro, Paulo, Maria) 
 y 
[1] Pedro Paulo Maria 
 x - c(1, 2, 3, 4, 5) 
 x 
[1] 1 2 3 4 5 
 x  3 
[1] TRUE TRUE FALSE FALSE FALSE 
1In other programming languages, vectors are either row vectors or column vectors. 
R does not make this distinction until it is necessary. 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 18 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding vectors: Indexing 
Indexing by Try these examples 
Position x - c(chol=234, sbp=148, dbp=78, age=54) 
x[2] #positions to include 
x[c(2, 3)] 
x[-c(1, 3, 4)] #positions to exclude 
x[-c(1, 4)] 
Name x[sbp] 
x[c(sbp, dbp)] 
Logical x  100 
x[x  100] 
(x  150)  (x  70) 
bp - (x  150)  (x  70) 
x[bp] 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 19 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding vectors: Replacement 
Replacing by Try these examples 
Position x - c(chol=234, sbp=148, dbp=78, age=54) 
x[1] 
x[1] - 250 
x 
Name x[sbp] 
x[sbp] - 150 
x 
Logical x[x100] 
x[x100] - NA 
x 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 20 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding vectors: Replacement 
 x - c(chol = 234, sbp = 148, dbp = 78, age = 54) 
 x[1] - 250 #by position 
 x 
chol sbp dbp age 
250 148 78 54 
 x[sbp] - 150 #by name 
 x 
chol sbp dbp age 
250 150 78 54 
 x[x100] 
dbp age 
78 54 
 x[x100] - NA #by logical 
 x 
chol sbp dbp age 
250 150 NA NA 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 21 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding matrices 
A matrix is a collection of like elements organized into a 2-dimensional 
(tabular) data object. Matrix elements can be either numeric, character, 
or logical. We can think of a matrix as a vector with a 2-dimensional 
structure. Contingency tables in epidemiology are represented in R as 
numeric matrices or arrays. An array is the generalization of matrices to 3 
or more dimensions (commonly known as stratified tables). We cover 
arrays later, for now we will focus on 2-dimensional tables. 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 22 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding matrices 
When R returns a matrix the [n,] indicates the nth row and [,m] 
indicates the mth column. 
 x - c(a, b, c, d) 
 y - matrix(x, 2, 2) 
 y 
[,1] [,2] 
[1,] a c 
[2,] b d 
 y[1,] 
[1] a c 
 y[,2] 
[1] c d 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 23 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding matrices 
 x - c(30, 21, 170, 180) # creating 
 y - matrix(x, 2, 2, byrow = TRUE) # creating 
 y 
[,1] [,2] 
[1,] 30 21 
[2,] 170 180 
 rownames(y) - c(Deaths, Survivors) # naming 
 colnames(y) - c(Tolbutamide, Placebo) # naming 
 y[2, 1] - 174 # replace by position 
 y[Survivors, Placebo] - 184 # replace by name 
 y 
Tolbutamide Placebo 
Deaths 30 21 
Survivors 174 184 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 24 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding matrices 
Consider the 2 × 2 table of crude data in Table. In this randomized clinical 
trial (RCT), diabetic subjects were randomly assigned to receive either 
tolbutamide, an oral hypoglycemic drug, or placebo. Because this was a 
prospective study we can calculate risks, odds, a risk ratio, and an odds 
ratio. We will do this using R as a calculator. 
Table : Deaths among subjects who received tolbutamide and placebo in the 
Unversity Group Diabetes Program (1970) 
Tolbutamide Placebo 
Deaths 30 21 
Survivors 174 184 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 25 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding matrices 
 dat - matrix(c(30, 174, 21, 184), 2, 2) 
 rownames(dat) - c(Deaths, Survivors) 
 colnames(dat) - c(Tolbutamide, Placebo) 
 coltot - apply(dat, 2, sum) #column totals 
 risks - dat[Deaths,]/coltot 
 risk.ratio - risks/risks[2] #risk ratio 
 odds - risks/(1-risks) 
 odds.ratio - odds/odds[2] #odds ratio 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 26 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding matrices 
 # display results 
 dat 
Tolbutamide Placebo 
Deaths 30 21 
Survivors 174 184 
 rbind(risks, risk.ratio, odds, odds.ratio) 
Tolbutamide Placebo 
risks 0.1470588 0.1024390 
risk.ratio 1.4355742 1.0000000 
odds 0.1724138 0.1141304 
odds.ratio 1.5106732 1.0000000 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 27 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding arrays 
An array is a collection of like elements organized into a n-dimensional 
data object. When R returns an array the [n,,] indicates the nth row 
and [,m,] indicates the mth column, and so on. 
 x - 1:8 
 y - array(x, dim=c(2, 2, 2)) 
 y 
, , 1 
[,1] [,2] 
[1,] 1 3 
[2,] 2 4 
, , 2 
[,1] [,2] 
[1,] 5 7 
[2,] 6 8 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 28 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding arrays 
While a matrix is a 2-dimensional table of like elements, an array is the 
generalization of matrices to n-dimensions. Stratified contingency tables in 
epidemiology are represented as array data objects in R. For example, the 
RCT previously shown comparing the number deaths among diabetic 
subjects that received tolbutamide vs. placebo is now also stratified by age 
group: 
Table : Deaths among subjects who received tolbutamide and placebo in the 
Unversity Group Diabetes Program (1970), stratifying by age 
Age55 Age55 Combined 
Tolb Plac Tolb Plac Tolb Plac 
Deaths 8 5 22 16 30 21 
Survivors 98 115 76 69 174 184 
Total 106 120 98 85 204 205 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 29 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding arrays 
 tdat - c(8, 98, 5, 115, 22, 76, 16, 69) 
 tdat - array(tdat, c(2, 2, 2)) 
 dimnames(tdat) - list(Outcome=c(Deaths, Survivors), 
+ Treatment=c(Tolbutamide, Placebo), 
+ Age group=c(Age55, Age=55)) 
 tdat 
, , Age group = Age55 
Treatment 
Outcome Tolbutamide Placebo 
Deaths 8 5 
Survivors 98 115 
, , Age group = Age=55 
Treatment 
Outcome Tolbutamide Placebo 
Deaths 22 16 
Survivors 76 69 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 30 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Table : Example of 4-dimensional array: Year 2000 population estimates by age, 
ethnicity, sex, and county 
Ethnicity 
County/Sex Age White AfrAmer AsianPI Latino Multirace AmerInd 
Alameda 
Female =19 58,160 31,765 40,653 49,738 10,120 839 
20–44 112,326 44,437 72,923 58,553 7,658 1,401 
45–64 82,205 24,948 33,236 18,534 2,922 822 
65+ 49,762 12,834 16,004 7,548 1,014 246 
Male =19 61,446 32,277 42,922 53,097 10,102 828 
20–44 115,745 36,976 69,053 69,233 6,795 1,263 
45–64 81,332 20,737 29,841 17,402 2,506 687 
65+ 33,994 8,087 11,855 5,416 711 156 
San Francisco 
Female =19 14,355 6,986 23,265 13,251 2,940 173 
20–44 85,766 10,284 52,479 23,458 3,656 526 
45–64 35,617 6,890 31,478 9,184 1,144 282 
65+ 27,215 5,172 23,044 5,773 554 121 
Male =19 14,881 6,959 24,541 14,480 2,851 165 
20–44 105,798 11,111 48,379 31,605 3,766 782 
45–64 43,694 7,352 26,404 8,674 1,220 354 
65+ 20,072 3,329 17,190 3,428 450 76 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 31 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding arrays 
Figure : Schematic representation of a 4-dimensional array: Year 2000 population 
estimates by age (1), race (2), sex (3), and county (4) 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 32 / 60
Working with R data objects Working with vectors, matrices,  arrays 
Understanding arrays 
Figure : Schematic of a theoretical 5-D array (e.g., data by age (1), race (2), sex 
(3), party affiliation (4), and state (5)). We can see that the field “state” has 3 
levels, and the field “party affiliation” has 2 levels; however, it is not apparent the 
number of age, race, and sex levels. Although not displayed, age levels would be 
represented by row names (along 1st dimension), race levels by column names 
(along 2nd dimension), and sex levels by depth names (along 3rd dimension). 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 33 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding lists 
Up to now, we have been working with atomic data objects (vector, matrix, 
array). In contrast, lists, data frames, and functions are recursive data 
objects. Recursive data objects have more flexibility in combining diverse 
data objects into one object. A list provides the most flexibility. Think of a 
list object as a collection of “bins” that can contain any R object. Lists 
are very useful for collecting results of an analysis or a function into one 
data object where all its contents are readily accessible by indexing. 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 34 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding lists 
A list is a collection of data objects without any restrictions: 
 x - c(11, 22, 34) 
 y - c(Male, Female, Male) 
 z - matrix(c(67, 34, 56,22), 2, 2) 
 mylist - list(x, y, z) 
 mylist 
[[1]] 
[1] 11 22 34 
[[2]] 
[1] Male Female Male 
[[3]] 
[,1] [,2] 
[1,] 67 56 
[2,] 34 22 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 35 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding lists 
Names can be assigned to each bin of a list. 
 names(mylist) - c(Age, Sex, Data) 
 mylist 
$Age 
[1] 11 22 34 
$Sex 
[1] Male Female Male 
$Data 
[,1] [,2] 
[1,] 67 56 
[2,] 34 22 
 mylist$Sex 
[1] Male Female Male 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 36 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding lists 
Figure : Schematic representation of a list of length four. The first bin [1] 
contains a smiling face [[1]], the second bin [2] contains a flower [[2]], the 
third bin [3] contains a lightning bolt [[3]], and the fourth bin [[4]] contains 
a heart [[4]]. When indexing a list object, single brackets [·] indexes the bin, 
and double brackets [[·]] indexes the bin contents. If the bin has a name, then 
$name also indexes the contents. 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 37 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding lists 
For example, using the UGDP clinical trial data, suppose we perform 
Fisher’s exact test for testing the null hypothesis of independence of rows 
and columns in a contingency table with fixed marginals. 
 udat - read.csv(http://www.medepi.net/data/ugdp.txt) 
 tab - xtabs(~ Status + Treatment, data = udat)[,2:1] 
 tab 
Treatment 
Status Tolbutamide Placebo 
Death 30 21 
Survivor 174 184 
 ftab - fisher.test(tab) 
 ftab 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 38 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding lists 
 ftab 
Fisher’s Exact Test for Count Data 
data: tab 
p-value = 0.1813 
alternative hypothesis: true odds ratio is not equal to 1 
95 percent confidence interval: 
0.8013768 2.8872863 
sample estimates: 
odds ratio 
1.509142 
The default display only shows partial results. The total results are stored 
in the object ftab. Let’s evaluate the structure of ftab and extract some 
results: 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 39 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding lists 
 str(ftab) 
List of 7 
$ p.value : num 0.181 
$ conf.int : atomic [1:2] 0.801 2.887 
..- attr(*, conf.level)= num 0.95 
$ estimate : Named num 1.51 
..- attr(*, names)= chr odds ratio 
$ null.value : Named num 1 
..- attr(*, names)= chr odds ratio 
$ alternative: chr two.sided 
$ method : chr Fisher’s Exact Test for Count Data 
$ data.name : chr tab 
- attr(*, class)= chr htest 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 40 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding lists 
Let’s index some of the bins from ftab. 
 ftab$estimate 
odds ratio 
1.5091 
 ftab$conf.int 
[1] 0.80138 2.88729 
 ftab$conf.int[2] 
[1] 2.887286 
attr(,conf.level) 
[1] 0.95 
 ftab$p.value 
[1] 0.18126 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 41 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding data frames 
A data frame is a list with a 2-dimensional (tabular) structure. 
Epidemiologists are very experienced working with data frames where each 
row usually represents data collected on individual subjects (also called 
records or observations) and columns represent fields for each type of data 
collected (also called variables). 
 subjno - c(1, 2, 3, 4) 
 age - c(34, 56, 45, 23) 
 sex - c(Male, Male, Female, Male) 
 case - c(Yes, No, No, Yes) 
 mydat - data.frame(subjno, age, sex, case) 
 mydat 
subjno age sex case 
1 1 34 Male Yes 
2 2 56 Male No 
3 3 45 Female No 
4 4 23 Male Yes 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 42 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding data frames 
Epidemiologists are familiar with tabular data sets where each row is a 
record and each column is a field. A record can be data collected on 
individuals or groups. We usually refer to the field name as a variable 
(e.g., age, gender, ethnicity). Fields can contain numeric or character 
data. In R, these types of data sets are handled by data frames. Each 
column of a data frame is usually either a factor or numeric vector, 
although it can have complex, character, or logical vectors. Data frames 
have the functionality of matrices and lists. For example, here is the first 
10 rows of the infert data set, a matched case-control study published in 
1976 that evaluated whether infertility was associated with prior 
spontaneous or induced abortions. 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 43 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding data frames 
 data(infert) 
 str(infert) 
‘data.frame’: 248 obs. of 8 variables: 
$ education : Factor w/ 3 levels 0-5yrs,..: 1 1 ... 
$ age : num NA 45 NA 23 35 36 23 32 21 28 ... 
$ parity : num 6 1 6 4 3 4 1 2 1 2 ... 
$ induced : num 1 1 2 2 1 2 0 0 0 0 ... 
$ case : num 1 1 1 1 1 1 1 1 1 1 ... 
$ spontaneous : num 2 0 0 0 1 1 0 0 1 0 ... 
$ stratum : int 1 2 3 4 5 6 7 8 9 10 ... 
$ pooled.stratum: num 3 1 4 2 32 36 6 22 5 19 ... 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 44 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding data frames 
 infert[1:10, 1:6] 
education age parity induced case spontaneous 
1 0-5yrs NA 6 1 1 2 
2 0-5yrs 45 1 1 1 0 
3 0-5yrs NA 6 2 1 0 
4 0-5yrs 23 4 2 1 0 
5 6-11yrs 35 3 1 1 1 
6 6-11yrs 36 4 2 1 1 
7 6-11yrs 23 1 0 1 0 
8 6-11yrs 32 2 0 1 0 
9 6-11yrs 21 1 0 1 1 
10 6-11yrs 28 2 0 1 0 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 45 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding data frames 
The fields are obviously vectors. Let’s explore a few of these vectors to see 
what we can learn about their structure in R. 
 #age variable 
 infert$age 
[1] 26 42 39 34 35 36 23 32 21 28 29 37 31 29 31 27 30 26 
... 
[235] 25 32 25 31 38 26 31 31 25 31 34 35 29 23 
 mode(infert$age) 
[1] numeric 
 class(infert$age) 
[1] numeric 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 46 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding data frames 
 # education variable 
 infert$education 
[1] 0-5yrs 0-5yrs 0-5yrs 0-5yrs 6-11yrs 6-11yrs 
... 
[247] 12+ yrs 12+ yrs 
Levels: 0-5yrs 6-11yrs 12+ yrs 
 mode(infert$education) 
[1] numeric 
 class(infert$education) 
[1] factor 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 47 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding data frames and factors 
A factor is R’s representation of categorical fields and keeps track of all 
possible category levels. 
 sex - sample(c(Male, Female), 100, replace = TRUE) 
 mode(sex); class(sex) 
[1] character 
[1] character 
 table(sex) 
sex 
Female Male 
51 49 
 sexf - factor(sex, levels = c(Male, Female, Transgender)) 
 table(sexf) 
sexf 
Male Female Transgender 
49 51 0 
 mode(sexf); class(sexf) 
[1] numeric 
[1] factor 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 48 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding data frames and lists 
Infert data is a matched case-control study evaluating the association of 
history of abortions and infertility. Use conditional logistic regression. 
 mod3 - clogit(case ~ spontaneous + induced + 
+ strata(stratum), data = infert) 
 mod3 
Call: 
clogit(case ~ spontaneous + induced + strata(stratum), data = 
coef exp(coef) se(coef) z p 
spontaneous 1.99 7.29 0.352 5.63 1.8e-08 
induced 1.41 4.09 0.361 3.91 9.4e-05 
 summod3 - summary(mod3) 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 49 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding data frames and lists 
 summod3 
n= 248 
coef exp(coef) se(coef) z Pr(|z|) 
spontaneous 1.9859 7.2854 0.3524 5.635 1.75e-08 *** 
induced 1.4090 4.0919 0.3607 3.906 9.38e-05 *** 
--- 
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 
exp(coef) exp(-coef) lower .95 upper .95 
spontaneous 7.285 0.1373 3.651 14.536 
induced 4.092 0.2444 2.018 8.298 
Rsquare= 0.193 (max possible= 0.519 ) 
Likelihood ratio test= 53.15 on 2 df, p=2.869e-12 
Wald test = 31.84 on 2 df, p=1.221e-07 
Score (logrank) test = 48.44 on 2 df, p=3.032e-11 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 50 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding data frames and lists 
 str(summod3) 
List of 12 
$ call : language coxph(formula = Surv(rep(1, 248L), case) ~ spontaneous 
$ fail : NULL 
$ na.action : NULL 
$ n : int 248 
$ loglik : num [1:2] -90.8 -64.2 
$ coefficients: num [1:2, 1:5] 1.986 1.409 7.285 4.092 0.352 ... 
..- attr(*, dimnames)=List of 2 
.. ..$ : chr [1:2] spontaneous induced 
.. ..$ : chr [1:5] coef exp(coef) se(coef) z ... 
$ conf.int : num [1:2, 1:4] 7.285 4.092 0.137 0.244 3.651 ... 
..- attr(*, dimnames)=List of 2 
.. ..$ : chr [1:2] spontaneous induced 
.. ..$ : chr [1:4] exp(coef) exp(-coef) lower .95 upper .95 
$ logtest : Named num [1:3] 5.32e+01 2.00 2.87e-12 
... [output truncated] 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 51 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding data frame and lists 
 summod3$coef 
coef exp(coef) se(coef) z Pr(|z|) 
spontaneous 1.985876 7.285423 0.3524435 5.634592 1.754734e-08 
induced 1.409012 4.091909 0.3607124 3.906191 9.376245e-05 
 summod3$coef[1, ] 
coef exp(coef) se(coef) z Pr(|z|) 
1.985876e+00 7.285423e+00 3.524435e-01 5.634592e+00 1.754734e-08 
 summod3$coef[ ,2] 
spontaneous induced 
7.285423 4.091909 
 summod3$coef[1,2] 
[1] 7.285423 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 52 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding functions 
Risk Ratio confidence interval from baby Rothman, p. 135 
rr.wald - function(x, conf.level = 0.95){ 
## prepare input 
x1 - x[1,1]; n1 - sum(x[1,]) 
x0 - x[2,1]; n0 - sum(x[2,]) 
## do calculations 
p1 - x1/n1 ##risk among exposed 
p0 - x0/n0 ##risk among unexposed 
RR - p1/p0; 
logRR - log(RR) 
SElogRR - sqrt(1/x1 - 1/n1 + 1/x0 - 1/n0) 
Z - qnorm(0.5*(1 + conf.level)) 
LCL - exp(logRR - Z*SElogRR) 
UCL - exp(logRR + Z*SElogRR) 
##collect output 
list(x = x, risks = c(p1 = p1, p0 = p0), risk.ratio = RR, 
conf.int = c(LCL, UCL), conf.level = conf.level) 
} 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 53 / 60
Working with R data objects Working with lists, data frames, and functions 
Understanding functions 
Run rr.wald function on UGDP RCT data (results displayed in 2 
columns). 
 tab 
Treatment 
Status Tolbutamide Placebo 
Death 30 21 
Survivor 174 184 
 rr.wald(tab) 
$x 
Treatment 
Status Tolbutamide Placebo 
Death 30 21 
Survivor 174 184 
$risks 
p1 p0 
0.5882353 0.4860335 
$risk.ratio 
[1] 1.210277 
$conf.int 
[1] 0.9396227 1.5588927 
$conf.level 
[1] 0.95 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 54 / 60
Working with R data objects Working with lists, data frames, and functions 
The epitools package 
The following epidemiologists, directly or indirectly, contributed to 
’epitools’: 
Tom´as Arag´on, MD, DrPH, , UC Berkeley 
Michael P. Fay, PhD, Mathematical Statistician National Institute of 
Allergy and Infectious Diseases 
Wayne Enanoria, PhD, MPH, UC Berkeley 
Travis Porco, PhD, MPH, UC San Francisco 
Michael Samuel, DrPH, California Department of Public Health 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 55 / 60
Working with R data objects Working with lists, data frames, and functions 
Using epitools for outbreak investigations 
Using the epitab function (only arguments are displayed); 
epitab(x, y = NULL, 
method = c(oddsratio, riskratio, rateratio), 
conf.level = 0.95, 
rev = c(neither, rows, columns, both), 
oddsratio = c(wald, fisher, midp, small), 
riskratio = c(wald, boot, small), 
rateratio = c(wald, midp), 
pvalue = c(fisher.exact, midp.exact, chi2), 
correction = FALSE, 
verbose = FALSE) 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 56 / 60
Working with R data objects Working with lists, data frames, and functions 
Hypothesis testing using Oswego: Passing 2 vectors 
 library(epitools) #load ’epitools’ package 
 data(oswego) #load Oswego dataset 
 attach(oswego) #attach dataset 
 round(epitab(jello, ill, method = riskratio)$tab, 2) 
Outcome 
Predictor N p0 Y p1 riskratio lower upper p.value 
N 22 0.42 30 0.58 1.00 NA NA NA 
Y 7 0.30 16 0.70 1.21 0.84 1.72 0.44 
 round(epitab(jello, ill, method = oddsratio)$tab, 2) 
Outcome 
Predictor N p0 Y p1 oddsratio lower upper p.value 
N 22 0.76 30 0.65 1.00 NA NA NA 
Y 7 0.24 16 0.35 1.68 0.59 4.76 0.44 
 detach(oswego) #detach dataset 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 57 / 60
Working with R data objects Working with lists, data frames, and functions 
Hypothesis testing using Oswego: Passing a table 
 jello.tab1 
ill 
jello N Y 
N 22 30 
Y 7 16 
 round(epitab(jello.tab1)$tab, 2) 
ill 
jello N p0 Y p1 oddsratio lower upper p.value 
N 22 0.76 30 0.65 1.00 NA NA NA 
Y 7 0.24 16 0.35 1.68 0.59 4.76 0.44 
 round(epitab(jello.tab1, method = risk)$tab, 2) 
ill 
jello N p0 Y p1 riskratio lower upper p.value 
N 22 0.42 30 0.58 1.00 NA NA NA 
Y 7 0.30 16 0.70 1.21 0.84 1.72 0.44 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 58 / 60
Working with R data objects Working with lists, data frames, and functions 
Hypothesis testing using Oswego: Passing one vector 
 round(epitab(c(22, 30, 7, 16))$tab, 2) 
Outcome 
Predictor Disease1 p0 Disease2 p1 oddsratio lower upper p.value 
Exposed1 22 0.76 30 0.65 1.00 NA NA NA 
Exposed2 7 0.24 16 0.35 1.68 0.59 4.76 0.44 
 round(epitab(c(22, 30, 7, 16), method = risk)$tab, 2) 
Outcome 
Predictor Disease1 p0 Disease2 p1 riskratio lower upper p.value 
Exposed1 22 0.42 30 0.58 1.00 NA NA NA 
Exposed2 7 0.30 16 0.70 1.21 0.84 1.72 0.44 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 59 / 60
Working with R data objects Working with lists, data frames, and functions 
Summary 
1 Background 
Cost 
Quality 
Community 
2 Getting started with R 
Full-function calculator/spreadsheet 
Extensible statistical packages 
High quality graphics tool 
Multi-use programming language 
3 Working with R data objects 
Atomic vs. recursive data objects 
Working with vectors, matrices,  arrays 
Working with lists, data frames, and functions 
Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 60 / 60

Contenu connexe

Similaire à Understanding R for Epidemiologists

Comparing EDA with classical and Bayesian analysis.pptx
Comparing EDA with classical and Bayesian analysis.pptxComparing EDA with classical and Bayesian analysis.pptx
Comparing EDA with classical and Bayesian analysis.pptx
PremaGanesh1
 
R Analytics in the Cloud
R Analytics in the CloudR Analytics in the Cloud
R Analytics in the Cloud
DataMine Lab
 
Revolution Analytics
Revolution AnalyticsRevolution Analytics
Revolution Analytics
templedf
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
vini89
 

Similaire à Understanding R for Epidemiologists (20)

R Basics
R BasicsR Basics
R Basics
 
Comparing EDA with classical and Bayesian analysis.pptx
Comparing EDA with classical and Bayesian analysis.pptxComparing EDA with classical and Bayesian analysis.pptx
Comparing EDA with classical and Bayesian analysis.pptx
 
R Analytics in the Cloud
R Analytics in the CloudR Analytics in the Cloud
R Analytics in the Cloud
 
Logistic Regression in Case-Control Study
Logistic Regression in Case-Control StudyLogistic Regression in Case-Control Study
Logistic Regression in Case-Control Study
 
4 Descriptive Statistics with R
4 Descriptive Statistics with R4 Descriptive Statistics with R
4 Descriptive Statistics with R
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
R programming for psychometrics
R programming for psychometricsR programming for psychometrics
R programming for psychometrics
 
Optimization of sample configurations for spatial trend estimation
Optimization of sample configurations for spatial trend estimationOptimization of sample configurations for spatial trend estimation
Optimization of sample configurations for spatial trend estimation
 
Naive.pdf
Naive.pdfNaive.pdf
Naive.pdf
 
Introduction to spss
Introduction to spssIntroduction to spss
Introduction to spss
 
R and Data Science
R and Data ScienceR and Data Science
R and Data Science
 
R language
R languageR language
R language
 
An Introduction to SPSS
An Introduction to SPSSAn Introduction to SPSS
An Introduction to SPSS
 
Revolution Analytics
Revolution AnalyticsRevolution Analytics
Revolution Analytics
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientists
 
Using r
Using rUsing r
Using r
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
 
Recommender Systems in the Linked Data era
Recommender Systems in the Linked Data eraRecommender Systems in the Linked Data era
Recommender Systems in the Linked Data era
 
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATADETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
 

Plus de Tomas J. Aragon

Continuous Decision Improvement: Decisive Leadership for Complex Environments
Continuous Decision Improvement: Decisive Leadership for Complex EnvironmentsContinuous Decision Improvement: Decisive Leadership for Complex Environments
Continuous Decision Improvement: Decisive Leadership for Complex Environments
Tomas J. Aragon
 

Plus de Tomas J. Aragon (19)

Economic Burden of Alcohol Consumption in the City and County of San Francisc...
Economic Burden of Alcohol Consumption in the City and County of San Francisc...Economic Burden of Alcohol Consumption in the City and County of San Francisc...
Economic Burden of Alcohol Consumption in the City and County of San Francisc...
 
Racial Health Inequities in San Francisco, CA
Racial Health Inequities in San Francisco, CARacial Health Inequities in San Francisco, CA
Racial Health Inequities in San Francisco, CA
 
The Leading Population Health Framework
The Leading Population Health FrameworkThe Leading Population Health Framework
The Leading Population Health Framework
 
What is population health?
What is population health?What is population health?
What is population health?
 
PDSA Problem-Solving
PDSA Problem-SolvingPDSA Problem-Solving
PDSA Problem-Solving
 
Population Health Lean (SLIDES)
Population Health Lean (SLIDES)Population Health Lean (SLIDES)
Population Health Lean (SLIDES)
 
Population Health Lean: An overview
Population Health Lean: An overviewPopulation Health Lean: An overview
Population Health Lean: An overview
 
Structural Trauma and Toxic Stress: Lifecourse Roots of Health Inequities
Structural Trauma and Toxic Stress: Lifecourse Roots of Health InequitiesStructural Trauma and Toxic Stress: Lifecourse Roots of Health Inequities
Structural Trauma and Toxic Stress: Lifecourse Roots of Health Inequities
 
Structural Trauma and Toxic Stress! Inter-generational Roots of Adults Health...
Structural Trauma and Toxic Stress! Inter-generational Roots of Adults Health...Structural Trauma and Toxic Stress! Inter-generational Roots of Adults Health...
Structural Trauma and Toxic Stress! Inter-generational Roots of Adults Health...
 
Leading population health---A results-based lean approach
Leading population health---A results-based lean approachLeading population health---A results-based lean approach
Leading population health---A results-based lean approach
 
Toxic Stress! Childhood Roots of Health Inequities:
Toxic Stress! Childhood Roots of Health Inequities:Toxic Stress! Childhood Roots of Health Inequities:
Toxic Stress! Childhood Roots of Health Inequities:
 
Continuous Decision Improvement: Decisive Leadership for Complex Environments
Continuous Decision Improvement: Decisive Leadership for Complex EnvironmentsContinuous Decision Improvement: Decisive Leadership for Complex Environments
Continuous Decision Improvement: Decisive Leadership for Complex Environments
 
Continuous Decision Improvement (CDI): Public Health Decision Making for Comp...
Continuous Decision Improvement (CDI): Public Health Decision Making for Comp...Continuous Decision Improvement (CDI): Public Health Decision Making for Comp...
Continuous Decision Improvement (CDI): Public Health Decision Making for Comp...
 
POSTER: Designing Learning Health Organization for Collective Impact Using REACH
POSTER: Designing Learning Health Organization for Collective Impact Using REACHPOSTER: Designing Learning Health Organization for Collective Impact Using REACH
POSTER: Designing Learning Health Organization for Collective Impact Using REACH
 
Designing a Learning Health Organization for Collective Impact
Designing a Learning Health Organization for Collective ImpactDesigning a Learning Health Organization for Collective Impact
Designing a Learning Health Organization for Collective Impact
 
Curriculum vitae (LaTeX PDF)
Curriculum vitae (LaTeX PDF)Curriculum vitae (LaTeX PDF)
Curriculum vitae (LaTeX PDF)
 
The High Achieving Governmental Health Department in 2020 as the Community Ch...
The High Achieving Governmental Health Department in 2020 as the Community Ch...The High Achieving Governmental Health Department in 2020 as the Community Ch...
The High Achieving Governmental Health Department in 2020 as the Community Ch...
 
Preparing for Microbial Threats to Health: What Every Professional Should Know
Preparing for Microbial Threats to Health: What Every Professional Should KnowPreparing for Microbial Threats to Health: What Every Professional Should Know
Preparing for Microbial Threats to Health: What Every Professional Should Know
 
Sugar MADNESS: How metabolic syndrome drives obesity and what you can do abou...
Sugar MADNESS: How metabolic syndrome drives obesity and what you can do abou...Sugar MADNESS: How metabolic syndrome drives obesity and what you can do abou...
Sugar MADNESS: How metabolic syndrome drives obesity and what you can do abou...
 

Dernier

Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
adilkhan87451
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Dernier (20)

Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
 
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
 
Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...
Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...
Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...
 
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service Available
 
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
 
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
 
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
 
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
 
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
 
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
 
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
 
Call Girls Hosur Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Hosur Just Call 9630942363 Top Class Call Girl Service AvailableCall Girls Hosur Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Hosur Just Call 9630942363 Top Class Call Girl Service Available
 
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
 
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service AvailableTrichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
 
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
 
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
 
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
 

Understanding R for Epidemiologists

  • 1. Understanding R for Epidemiologists Tom´as J. Arag´on, MD, DrPH Faculty, Division of Epidemiology UC Berkeley School of Public Health Health Officer, City & County of San Francisco Director, Population Health Division (PHD) San Francisco Department of Public Health Blog: http://www.medepi.com Email: aragon@berkeley.edu September 8, 2014 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 1 / 60
  • 2. Outline 1 Background Cost Quality Community 2 Getting started with R Full-function calculator/spreadsheet Extensible statistical packages High quality graphics tool Multi-use programming language 3 Working with R data objects Atomic vs. recursive data objects Working with vectors, matrices, & arrays Working with lists, data frames, and functions Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 2 / 60
  • 3. Background Background: Major issues Cost Quality Community Functionality Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 3 / 60
  • 4. Background Cost Cost: Open Source vs. Proprietary Software Costs of software Costs of multi-platforms Costs of education and training Costs of adding solutions (e.g., packages) Costs of solving problems and sharing solutions Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 4 / 60
  • 5. Background Quality Quality: Open Source vs. Proprietary Software Core Development Team Large pool of users/testers Quality control process for packages Bug fixes based on need/demand, not profits Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 5 / 60
  • 6. Background Community Community: Open Source vs. Proprietary Software Large community of users Transparent development process Growing number of books and trainings Growing number of free tutorials and manuals Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 6 / 60
  • 7. Background Community Current R contributors Douglas Bates John Chambers Peter Dalgaard Seth Falcon Robert Gentleman Kurt Hornik Stefano Iacus Ross Ihaka Friedrich Leisch Uwe Ligges Thomas Lumley Martin Maechler Duncan Murdoch Paul Murrell Martyn Plummer Brian Ripley Deepayan Sarkar Duncan Temple Lang Luke Tierney Simon Urbanek Source: http://www.r-project.org/contributors.html Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 7 / 60
  • 8. Getting started with R What is R? Full-function calculator/spreadsheet Extensible statistical packages High-quality graphics tool Multi-use programming language Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 8 / 60
  • 9. Getting started with R Full-function calculator/spreadsheet Full-function calculator: Selected math operators Operator Description Try these examples + addition 5+4 − subtraction 5-4 multiplication 5*4 / division 5/4 ˆ exponentiation 5^4 − unary minus (change current sign) -5 abs absolute value abs(-23) exp exponentiation (e to a power) exp(8) log logarithm (default is natural log) log(exp(8)) sqrt square root sqrt(64) %/% integer divide 10%/%3 %% modulus 10%%3 %*% matrix multiplication xx - matrix(1:4, 2, 2) xx%*%c(1, 1) c(1, 1)%*%xx Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 9 / 60
  • 10. Getting started with R Extensible statistical packages Extensible statistical packages Generalized Linear Models (Base) Linear regression Logistic regression Poisson regression Cox Proportional Hazard models (Survival) Cox PH regression Conditional logistic regression (matched case-control studies) Meta-analysis (meta) Complex survey analysis (survey) Epidemiology packages epitools epicalc epibasix epiR Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 10 / 60
  • 11. Getting started with R High quality graphics tool Graphics display of sample size curves Alternative distribution H1 Power (1 - b) Null distribution H0 b a 2 -Z1-a 2 m0 Z1-a 2 m1 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 11 / 60
  • 12. Getting started with R High quality graphics tool Graphics display of P value function 0.2 0.5 1.0 2.0 2.9 5.0 10.0 20.0 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.05 0 0 10 20 30 40 50 60 70 80 90 95 100 Confidence level (%) Rate Ratio P−value Null hypothesis Median unbiased estimate 95% Lower Confidence Limit = 0.74 95% Upper Confidence Limit = 21.0 95% Confidence Interval Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 12 / 60
  • 13. Getting started with R High quality graphics tool Graphical display of multiple linear regression 0 10 20 30 40 50 10 20 30 40 50 60 70 80 90 0 10 20 30 40 50 x1 x2 y Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 13 / 60
  • 14. Getting started with R High quality graphics tool Epidemic curve using Color Brewer colors Unknown WNF WNND 0 20 40 60 80 West Nile Virus Human Cases Reported in California by Disease Week as of December 14, 2004 Cases + Bird 2/24 + Horse 6/20 + Chicken 5/17 + Mosquito 4/14 52 03 06 09 12 15 18 21 24 27 30 33 36 39 42 45 48 51 Dec Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Disease Week Calendar Month, 2004 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 14 / 60
  • 15. Getting started with R Multi-use programming language Multi-use programming language Vectorized computations Functional programming language Object-oriented programming Text processing (e.g., using regular expressions) Links to C, Fortran, etc. Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 15 / 60
  • 16. Working with R data objects Atomic vs. recursive data objects Data objects in R Object types Vector Matrix Array List Data frame Function Operations Create Name Index Replace Manipulate Do computations Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 16 / 60
  • 17. Working with R data objects Atomic vs. recursive data objects Summary of types of data objects in R Data object Possible modea Default class Atomic vector character, numeric, logical NULL matrix character, numeric, logical NULL array character, numeric, logical NULL Recursive list list NULL data frame list data frame function function NULL a We are ignoring complex numbers Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 17 / 60
  • 18. Working with R data objects Working with vectors, matrices, arrays Understanding vectors A vector is a collection of like elements without dimensions1. The vector elements are all of the same mode (either character, numeric, or logical). y - c(Pedro, Paulo, Maria) y [1] Pedro Paulo Maria x - c(1, 2, 3, 4, 5) x [1] 1 2 3 4 5 x 3 [1] TRUE TRUE FALSE FALSE FALSE 1In other programming languages, vectors are either row vectors or column vectors. R does not make this distinction until it is necessary. Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 18 / 60
  • 19. Working with R data objects Working with vectors, matrices, arrays Understanding vectors: Indexing Indexing by Try these examples Position x - c(chol=234, sbp=148, dbp=78, age=54) x[2] #positions to include x[c(2, 3)] x[-c(1, 3, 4)] #positions to exclude x[-c(1, 4)] Name x[sbp] x[c(sbp, dbp)] Logical x 100 x[x 100] (x 150) (x 70) bp - (x 150) (x 70) x[bp] Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 19 / 60
  • 20. Working with R data objects Working with vectors, matrices, arrays Understanding vectors: Replacement Replacing by Try these examples Position x - c(chol=234, sbp=148, dbp=78, age=54) x[1] x[1] - 250 x Name x[sbp] x[sbp] - 150 x Logical x[x100] x[x100] - NA x Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 20 / 60
  • 21. Working with R data objects Working with vectors, matrices, arrays Understanding vectors: Replacement x - c(chol = 234, sbp = 148, dbp = 78, age = 54) x[1] - 250 #by position x chol sbp dbp age 250 148 78 54 x[sbp] - 150 #by name x chol sbp dbp age 250 150 78 54 x[x100] dbp age 78 54 x[x100] - NA #by logical x chol sbp dbp age 250 150 NA NA Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 21 / 60
  • 22. Working with R data objects Working with vectors, matrices, arrays Understanding matrices A matrix is a collection of like elements organized into a 2-dimensional (tabular) data object. Matrix elements can be either numeric, character, or logical. We can think of a matrix as a vector with a 2-dimensional structure. Contingency tables in epidemiology are represented in R as numeric matrices or arrays. An array is the generalization of matrices to 3 or more dimensions (commonly known as stratified tables). We cover arrays later, for now we will focus on 2-dimensional tables. Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 22 / 60
  • 23. Working with R data objects Working with vectors, matrices, arrays Understanding matrices When R returns a matrix the [n,] indicates the nth row and [,m] indicates the mth column. x - c(a, b, c, d) y - matrix(x, 2, 2) y [,1] [,2] [1,] a c [2,] b d y[1,] [1] a c y[,2] [1] c d Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 23 / 60
  • 24. Working with R data objects Working with vectors, matrices, arrays Understanding matrices x - c(30, 21, 170, 180) # creating y - matrix(x, 2, 2, byrow = TRUE) # creating y [,1] [,2] [1,] 30 21 [2,] 170 180 rownames(y) - c(Deaths, Survivors) # naming colnames(y) - c(Tolbutamide, Placebo) # naming y[2, 1] - 174 # replace by position y[Survivors, Placebo] - 184 # replace by name y Tolbutamide Placebo Deaths 30 21 Survivors 174 184 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 24 / 60
  • 25. Working with R data objects Working with vectors, matrices, arrays Understanding matrices Consider the 2 × 2 table of crude data in Table. In this randomized clinical trial (RCT), diabetic subjects were randomly assigned to receive either tolbutamide, an oral hypoglycemic drug, or placebo. Because this was a prospective study we can calculate risks, odds, a risk ratio, and an odds ratio. We will do this using R as a calculator. Table : Deaths among subjects who received tolbutamide and placebo in the Unversity Group Diabetes Program (1970) Tolbutamide Placebo Deaths 30 21 Survivors 174 184 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 25 / 60
  • 26. Working with R data objects Working with vectors, matrices, arrays Understanding matrices dat - matrix(c(30, 174, 21, 184), 2, 2) rownames(dat) - c(Deaths, Survivors) colnames(dat) - c(Tolbutamide, Placebo) coltot - apply(dat, 2, sum) #column totals risks - dat[Deaths,]/coltot risk.ratio - risks/risks[2] #risk ratio odds - risks/(1-risks) odds.ratio - odds/odds[2] #odds ratio Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 26 / 60
  • 27. Working with R data objects Working with vectors, matrices, arrays Understanding matrices # display results dat Tolbutamide Placebo Deaths 30 21 Survivors 174 184 rbind(risks, risk.ratio, odds, odds.ratio) Tolbutamide Placebo risks 0.1470588 0.1024390 risk.ratio 1.4355742 1.0000000 odds 0.1724138 0.1141304 odds.ratio 1.5106732 1.0000000 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 27 / 60
  • 28. Working with R data objects Working with vectors, matrices, arrays Understanding arrays An array is a collection of like elements organized into a n-dimensional data object. When R returns an array the [n,,] indicates the nth row and [,m,] indicates the mth column, and so on. x - 1:8 y - array(x, dim=c(2, 2, 2)) y , , 1 [,1] [,2] [1,] 1 3 [2,] 2 4 , , 2 [,1] [,2] [1,] 5 7 [2,] 6 8 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 28 / 60
  • 29. Working with R data objects Working with vectors, matrices, arrays Understanding arrays While a matrix is a 2-dimensional table of like elements, an array is the generalization of matrices to n-dimensions. Stratified contingency tables in epidemiology are represented as array data objects in R. For example, the RCT previously shown comparing the number deaths among diabetic subjects that received tolbutamide vs. placebo is now also stratified by age group: Table : Deaths among subjects who received tolbutamide and placebo in the Unversity Group Diabetes Program (1970), stratifying by age Age55 Age55 Combined Tolb Plac Tolb Plac Tolb Plac Deaths 8 5 22 16 30 21 Survivors 98 115 76 69 174 184 Total 106 120 98 85 204 205 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 29 / 60
  • 30. Working with R data objects Working with vectors, matrices, arrays Understanding arrays tdat - c(8, 98, 5, 115, 22, 76, 16, 69) tdat - array(tdat, c(2, 2, 2)) dimnames(tdat) - list(Outcome=c(Deaths, Survivors), + Treatment=c(Tolbutamide, Placebo), + Age group=c(Age55, Age=55)) tdat , , Age group = Age55 Treatment Outcome Tolbutamide Placebo Deaths 8 5 Survivors 98 115 , , Age group = Age=55 Treatment Outcome Tolbutamide Placebo Deaths 22 16 Survivors 76 69 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 30 / 60
  • 31. Working with R data objects Working with vectors, matrices, arrays Table : Example of 4-dimensional array: Year 2000 population estimates by age, ethnicity, sex, and county Ethnicity County/Sex Age White AfrAmer AsianPI Latino Multirace AmerInd Alameda Female =19 58,160 31,765 40,653 49,738 10,120 839 20–44 112,326 44,437 72,923 58,553 7,658 1,401 45–64 82,205 24,948 33,236 18,534 2,922 822 65+ 49,762 12,834 16,004 7,548 1,014 246 Male =19 61,446 32,277 42,922 53,097 10,102 828 20–44 115,745 36,976 69,053 69,233 6,795 1,263 45–64 81,332 20,737 29,841 17,402 2,506 687 65+ 33,994 8,087 11,855 5,416 711 156 San Francisco Female =19 14,355 6,986 23,265 13,251 2,940 173 20–44 85,766 10,284 52,479 23,458 3,656 526 45–64 35,617 6,890 31,478 9,184 1,144 282 65+ 27,215 5,172 23,044 5,773 554 121 Male =19 14,881 6,959 24,541 14,480 2,851 165 20–44 105,798 11,111 48,379 31,605 3,766 782 45–64 43,694 7,352 26,404 8,674 1,220 354 65+ 20,072 3,329 17,190 3,428 450 76 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 31 / 60
  • 32. Working with R data objects Working with vectors, matrices, arrays Understanding arrays Figure : Schematic representation of a 4-dimensional array: Year 2000 population estimates by age (1), race (2), sex (3), and county (4) Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 32 / 60
  • 33. Working with R data objects Working with vectors, matrices, arrays Understanding arrays Figure : Schematic of a theoretical 5-D array (e.g., data by age (1), race (2), sex (3), party affiliation (4), and state (5)). We can see that the field “state” has 3 levels, and the field “party affiliation” has 2 levels; however, it is not apparent the number of age, race, and sex levels. Although not displayed, age levels would be represented by row names (along 1st dimension), race levels by column names (along 2nd dimension), and sex levels by depth names (along 3rd dimension). Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 33 / 60
  • 34. Working with R data objects Working with lists, data frames, and functions Understanding lists Up to now, we have been working with atomic data objects (vector, matrix, array). In contrast, lists, data frames, and functions are recursive data objects. Recursive data objects have more flexibility in combining diverse data objects into one object. A list provides the most flexibility. Think of a list object as a collection of “bins” that can contain any R object. Lists are very useful for collecting results of an analysis or a function into one data object where all its contents are readily accessible by indexing. Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 34 / 60
  • 35. Working with R data objects Working with lists, data frames, and functions Understanding lists A list is a collection of data objects without any restrictions: x - c(11, 22, 34) y - c(Male, Female, Male) z - matrix(c(67, 34, 56,22), 2, 2) mylist - list(x, y, z) mylist [[1]] [1] 11 22 34 [[2]] [1] Male Female Male [[3]] [,1] [,2] [1,] 67 56 [2,] 34 22 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 35 / 60
  • 36. Working with R data objects Working with lists, data frames, and functions Understanding lists Names can be assigned to each bin of a list. names(mylist) - c(Age, Sex, Data) mylist $Age [1] 11 22 34 $Sex [1] Male Female Male $Data [,1] [,2] [1,] 67 56 [2,] 34 22 mylist$Sex [1] Male Female Male Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 36 / 60
  • 37. Working with R data objects Working with lists, data frames, and functions Understanding lists Figure : Schematic representation of a list of length four. The first bin [1] contains a smiling face [[1]], the second bin [2] contains a flower [[2]], the third bin [3] contains a lightning bolt [[3]], and the fourth bin [[4]] contains a heart [[4]]. When indexing a list object, single brackets [·] indexes the bin, and double brackets [[·]] indexes the bin contents. If the bin has a name, then $name also indexes the contents. Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 37 / 60
  • 38. Working with R data objects Working with lists, data frames, and functions Understanding lists For example, using the UGDP clinical trial data, suppose we perform Fisher’s exact test for testing the null hypothesis of independence of rows and columns in a contingency table with fixed marginals. udat - read.csv(http://www.medepi.net/data/ugdp.txt) tab - xtabs(~ Status + Treatment, data = udat)[,2:1] tab Treatment Status Tolbutamide Placebo Death 30 21 Survivor 174 184 ftab - fisher.test(tab) ftab Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 38 / 60
  • 39. Working with R data objects Working with lists, data frames, and functions Understanding lists ftab Fisher’s Exact Test for Count Data data: tab p-value = 0.1813 alternative hypothesis: true odds ratio is not equal to 1 95 percent confidence interval: 0.8013768 2.8872863 sample estimates: odds ratio 1.509142 The default display only shows partial results. The total results are stored in the object ftab. Let’s evaluate the structure of ftab and extract some results: Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 39 / 60
  • 40. Working with R data objects Working with lists, data frames, and functions Understanding lists str(ftab) List of 7 $ p.value : num 0.181 $ conf.int : atomic [1:2] 0.801 2.887 ..- attr(*, conf.level)= num 0.95 $ estimate : Named num 1.51 ..- attr(*, names)= chr odds ratio $ null.value : Named num 1 ..- attr(*, names)= chr odds ratio $ alternative: chr two.sided $ method : chr Fisher’s Exact Test for Count Data $ data.name : chr tab - attr(*, class)= chr htest Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 40 / 60
  • 41. Working with R data objects Working with lists, data frames, and functions Understanding lists Let’s index some of the bins from ftab. ftab$estimate odds ratio 1.5091 ftab$conf.int [1] 0.80138 2.88729 ftab$conf.int[2] [1] 2.887286 attr(,conf.level) [1] 0.95 ftab$p.value [1] 0.18126 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 41 / 60
  • 42. Working with R data objects Working with lists, data frames, and functions Understanding data frames A data frame is a list with a 2-dimensional (tabular) structure. Epidemiologists are very experienced working with data frames where each row usually represents data collected on individual subjects (also called records or observations) and columns represent fields for each type of data collected (also called variables). subjno - c(1, 2, 3, 4) age - c(34, 56, 45, 23) sex - c(Male, Male, Female, Male) case - c(Yes, No, No, Yes) mydat - data.frame(subjno, age, sex, case) mydat subjno age sex case 1 1 34 Male Yes 2 2 56 Male No 3 3 45 Female No 4 4 23 Male Yes Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 42 / 60
  • 43. Working with R data objects Working with lists, data frames, and functions Understanding data frames Epidemiologists are familiar with tabular data sets where each row is a record and each column is a field. A record can be data collected on individuals or groups. We usually refer to the field name as a variable (e.g., age, gender, ethnicity). Fields can contain numeric or character data. In R, these types of data sets are handled by data frames. Each column of a data frame is usually either a factor or numeric vector, although it can have complex, character, or logical vectors. Data frames have the functionality of matrices and lists. For example, here is the first 10 rows of the infert data set, a matched case-control study published in 1976 that evaluated whether infertility was associated with prior spontaneous or induced abortions. Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 43 / 60
  • 44. Working with R data objects Working with lists, data frames, and functions Understanding data frames data(infert) str(infert) ‘data.frame’: 248 obs. of 8 variables: $ education : Factor w/ 3 levels 0-5yrs,..: 1 1 ... $ age : num NA 45 NA 23 35 36 23 32 21 28 ... $ parity : num 6 1 6 4 3 4 1 2 1 2 ... $ induced : num 1 1 2 2 1 2 0 0 0 0 ... $ case : num 1 1 1 1 1 1 1 1 1 1 ... $ spontaneous : num 2 0 0 0 1 1 0 0 1 0 ... $ stratum : int 1 2 3 4 5 6 7 8 9 10 ... $ pooled.stratum: num 3 1 4 2 32 36 6 22 5 19 ... Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 44 / 60
  • 45. Working with R data objects Working with lists, data frames, and functions Understanding data frames infert[1:10, 1:6] education age parity induced case spontaneous 1 0-5yrs NA 6 1 1 2 2 0-5yrs 45 1 1 1 0 3 0-5yrs NA 6 2 1 0 4 0-5yrs 23 4 2 1 0 5 6-11yrs 35 3 1 1 1 6 6-11yrs 36 4 2 1 1 7 6-11yrs 23 1 0 1 0 8 6-11yrs 32 2 0 1 0 9 6-11yrs 21 1 0 1 1 10 6-11yrs 28 2 0 1 0 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 45 / 60
  • 46. Working with R data objects Working with lists, data frames, and functions Understanding data frames The fields are obviously vectors. Let’s explore a few of these vectors to see what we can learn about their structure in R. #age variable infert$age [1] 26 42 39 34 35 36 23 32 21 28 29 37 31 29 31 27 30 26 ... [235] 25 32 25 31 38 26 31 31 25 31 34 35 29 23 mode(infert$age) [1] numeric class(infert$age) [1] numeric Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 46 / 60
  • 47. Working with R data objects Working with lists, data frames, and functions Understanding data frames # education variable infert$education [1] 0-5yrs 0-5yrs 0-5yrs 0-5yrs 6-11yrs 6-11yrs ... [247] 12+ yrs 12+ yrs Levels: 0-5yrs 6-11yrs 12+ yrs mode(infert$education) [1] numeric class(infert$education) [1] factor Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 47 / 60
  • 48. Working with R data objects Working with lists, data frames, and functions Understanding data frames and factors A factor is R’s representation of categorical fields and keeps track of all possible category levels. sex - sample(c(Male, Female), 100, replace = TRUE) mode(sex); class(sex) [1] character [1] character table(sex) sex Female Male 51 49 sexf - factor(sex, levels = c(Male, Female, Transgender)) table(sexf) sexf Male Female Transgender 49 51 0 mode(sexf); class(sexf) [1] numeric [1] factor Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 48 / 60
  • 49. Working with R data objects Working with lists, data frames, and functions Understanding data frames and lists Infert data is a matched case-control study evaluating the association of history of abortions and infertility. Use conditional logistic regression. mod3 - clogit(case ~ spontaneous + induced + + strata(stratum), data = infert) mod3 Call: clogit(case ~ spontaneous + induced + strata(stratum), data = coef exp(coef) se(coef) z p spontaneous 1.99 7.29 0.352 5.63 1.8e-08 induced 1.41 4.09 0.361 3.91 9.4e-05 summod3 - summary(mod3) Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 49 / 60
  • 50. Working with R data objects Working with lists, data frames, and functions Understanding data frames and lists summod3 n= 248 coef exp(coef) se(coef) z Pr(|z|) spontaneous 1.9859 7.2854 0.3524 5.635 1.75e-08 *** induced 1.4090 4.0919 0.3607 3.906 9.38e-05 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 exp(coef) exp(-coef) lower .95 upper .95 spontaneous 7.285 0.1373 3.651 14.536 induced 4.092 0.2444 2.018 8.298 Rsquare= 0.193 (max possible= 0.519 ) Likelihood ratio test= 53.15 on 2 df, p=2.869e-12 Wald test = 31.84 on 2 df, p=1.221e-07 Score (logrank) test = 48.44 on 2 df, p=3.032e-11 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 50 / 60
  • 51. Working with R data objects Working with lists, data frames, and functions Understanding data frames and lists str(summod3) List of 12 $ call : language coxph(formula = Surv(rep(1, 248L), case) ~ spontaneous $ fail : NULL $ na.action : NULL $ n : int 248 $ loglik : num [1:2] -90.8 -64.2 $ coefficients: num [1:2, 1:5] 1.986 1.409 7.285 4.092 0.352 ... ..- attr(*, dimnames)=List of 2 .. ..$ : chr [1:2] spontaneous induced .. ..$ : chr [1:5] coef exp(coef) se(coef) z ... $ conf.int : num [1:2, 1:4] 7.285 4.092 0.137 0.244 3.651 ... ..- attr(*, dimnames)=List of 2 .. ..$ : chr [1:2] spontaneous induced .. ..$ : chr [1:4] exp(coef) exp(-coef) lower .95 upper .95 $ logtest : Named num [1:3] 5.32e+01 2.00 2.87e-12 ... [output truncated] Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 51 / 60
  • 52. Working with R data objects Working with lists, data frames, and functions Understanding data frame and lists summod3$coef coef exp(coef) se(coef) z Pr(|z|) spontaneous 1.985876 7.285423 0.3524435 5.634592 1.754734e-08 induced 1.409012 4.091909 0.3607124 3.906191 9.376245e-05 summod3$coef[1, ] coef exp(coef) se(coef) z Pr(|z|) 1.985876e+00 7.285423e+00 3.524435e-01 5.634592e+00 1.754734e-08 summod3$coef[ ,2] spontaneous induced 7.285423 4.091909 summod3$coef[1,2] [1] 7.285423 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 52 / 60
  • 53. Working with R data objects Working with lists, data frames, and functions Understanding functions Risk Ratio confidence interval from baby Rothman, p. 135 rr.wald - function(x, conf.level = 0.95){ ## prepare input x1 - x[1,1]; n1 - sum(x[1,]) x0 - x[2,1]; n0 - sum(x[2,]) ## do calculations p1 - x1/n1 ##risk among exposed p0 - x0/n0 ##risk among unexposed RR - p1/p0; logRR - log(RR) SElogRR - sqrt(1/x1 - 1/n1 + 1/x0 - 1/n0) Z - qnorm(0.5*(1 + conf.level)) LCL - exp(logRR - Z*SElogRR) UCL - exp(logRR + Z*SElogRR) ##collect output list(x = x, risks = c(p1 = p1, p0 = p0), risk.ratio = RR, conf.int = c(LCL, UCL), conf.level = conf.level) } Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 53 / 60
  • 54. Working with R data objects Working with lists, data frames, and functions Understanding functions Run rr.wald function on UGDP RCT data (results displayed in 2 columns). tab Treatment Status Tolbutamide Placebo Death 30 21 Survivor 174 184 rr.wald(tab) $x Treatment Status Tolbutamide Placebo Death 30 21 Survivor 174 184 $risks p1 p0 0.5882353 0.4860335 $risk.ratio [1] 1.210277 $conf.int [1] 0.9396227 1.5588927 $conf.level [1] 0.95 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 54 / 60
  • 55. Working with R data objects Working with lists, data frames, and functions The epitools package The following epidemiologists, directly or indirectly, contributed to ’epitools’: Tom´as Arag´on, MD, DrPH, , UC Berkeley Michael P. Fay, PhD, Mathematical Statistician National Institute of Allergy and Infectious Diseases Wayne Enanoria, PhD, MPH, UC Berkeley Travis Porco, PhD, MPH, UC San Francisco Michael Samuel, DrPH, California Department of Public Health Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 55 / 60
  • 56. Working with R data objects Working with lists, data frames, and functions Using epitools for outbreak investigations Using the epitab function (only arguments are displayed); epitab(x, y = NULL, method = c(oddsratio, riskratio, rateratio), conf.level = 0.95, rev = c(neither, rows, columns, both), oddsratio = c(wald, fisher, midp, small), riskratio = c(wald, boot, small), rateratio = c(wald, midp), pvalue = c(fisher.exact, midp.exact, chi2), correction = FALSE, verbose = FALSE) Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 56 / 60
  • 57. Working with R data objects Working with lists, data frames, and functions Hypothesis testing using Oswego: Passing 2 vectors library(epitools) #load ’epitools’ package data(oswego) #load Oswego dataset attach(oswego) #attach dataset round(epitab(jello, ill, method = riskratio)$tab, 2) Outcome Predictor N p0 Y p1 riskratio lower upper p.value N 22 0.42 30 0.58 1.00 NA NA NA Y 7 0.30 16 0.70 1.21 0.84 1.72 0.44 round(epitab(jello, ill, method = oddsratio)$tab, 2) Outcome Predictor N p0 Y p1 oddsratio lower upper p.value N 22 0.76 30 0.65 1.00 NA NA NA Y 7 0.24 16 0.35 1.68 0.59 4.76 0.44 detach(oswego) #detach dataset Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 57 / 60
  • 58. Working with R data objects Working with lists, data frames, and functions Hypothesis testing using Oswego: Passing a table jello.tab1 ill jello N Y N 22 30 Y 7 16 round(epitab(jello.tab1)$tab, 2) ill jello N p0 Y p1 oddsratio lower upper p.value N 22 0.76 30 0.65 1.00 NA NA NA Y 7 0.24 16 0.35 1.68 0.59 4.76 0.44 round(epitab(jello.tab1, method = risk)$tab, 2) ill jello N p0 Y p1 riskratio lower upper p.value N 22 0.42 30 0.58 1.00 NA NA NA Y 7 0.30 16 0.70 1.21 0.84 1.72 0.44 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 58 / 60
  • 59. Working with R data objects Working with lists, data frames, and functions Hypothesis testing using Oswego: Passing one vector round(epitab(c(22, 30, 7, 16))$tab, 2) Outcome Predictor Disease1 p0 Disease2 p1 oddsratio lower upper p.value Exposed1 22 0.76 30 0.65 1.00 NA NA NA Exposed2 7 0.24 16 0.35 1.68 0.59 4.76 0.44 round(epitab(c(22, 30, 7, 16), method = risk)$tab, 2) Outcome Predictor Disease1 p0 Disease2 p1 riskratio lower upper p.value Exposed1 22 0.42 30 0.58 1.00 NA NA NA Exposed2 7 0.30 16 0.70 1.21 0.84 1.72 0.44 Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 59 / 60
  • 60. Working with R data objects Working with lists, data frames, and functions Summary 1 Background Cost Quality Community 2 Getting started with R Full-function calculator/spreadsheet Extensible statistical packages High quality graphics tool Multi-use programming language 3 Working with R data objects Atomic vs. recursive data objects Working with vectors, matrices, arrays Working with lists, data frames, and functions Tom´as Arag´on, MD, DrPH (medepi.com) Understanding R for Epidemiologists September 8, 2014 60 / 60