Recap
Getting data in R
Do it yourself!
Plotting using ggplot2
Examining data and importing data in R
Richard L. Zijdeman
May 29, 2015
Richard L. Zijdeman Examining data and importing data in R
1 Recap
2 Getting data in R
3 Do it yourself!
4 Plotting using ggplot2
Recap
The structure of objects
Store just about anything in R: numbers, sentences, datasets
Objects
Study the structure of objects: str()
type of object
features of object
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
inbound = c(215, 237, 237, NA),
outbound = c(212, 239, 260, 265))
Study the structure of object “ships”
str(ships)
## 'data.frame': 4 obs. of 3 variables:
## $ year : num 1850 1860 1870 1880
## $ inbound : num 215 237 237 NA
## $ outbound: num 212 239 260 265
Characteristics of objects
Class: class()
Length: length()
Dimensions: dim()
class(ships)
## [1] "data.frame"
length(ships)
## [1] 3
dim(ships) # rows, columns
## [1] 4 3
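As a small illustration of the functions above: length() on a data.frame counts its columns, not its rows. The sketch below re-creates the ships object and also uses nrow(), a base R function that is not on the slide, to get the number of observations.

```r
# Re-create the example data.frame from the recap
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

n_cols <- length(ships)       # a data.frame is a list of columns
n_rows <- nrow(ships)         # number of observations (rows)
n_vals <- length(ships$year)  # length of a single column, which is a vector

class(ships$year)             # class of one column rather than the whole object
```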
Closer inspection of data.frames
names of columns (variables): names()
top/bottom rows: head(), tail()
missing data: is.na()
names(ships)
## [1] "year" "inbound" "outbound"
is.na(ships)
## year inbound outbound
## [1,] FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE
## [4,] FALSE TRUE FALSE
Summarizing data in data.frames
descriptive statistics: summary()
calculations: e.g. min(), mean(), sum()
results in table format: table()
summary(ships)
## year inbound outbound
## Min. :1850 Min. :215.0 Min. :212.0
## 1st Qu.:1858 1st Qu.:226.0 1st Qu.:232.2
## Median :1865 Median :237.0 Median :249.5
## Mean :1865 Mean :229.7 Mean :244.0
## 3rd Qu.:1872 3rd Qu.:237.0 3rd Qu.:261.2
## Max. :1880 Max. :237.0 Max. :265.0
## NA's :1
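summary() reports the missing value separately, but min(), mean() and sum() return NA as soon as one value is missing, unless told otherwise. A minimal sketch using the ships data; na.rm is a standard argument of these base R functions, and the object names are only illustrative.

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

mean_with_na <- mean(ships$inbound)                # NA: one value is missing
mean_inbound <- mean(ships$inbound, na.rm = TRUE)  # mean of the three observed values
sum_outbound <- sum(ships$outbound)                # no missing values here

mean_inbound
sum_outbound
```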
is.na(ships)
## year inbound outbound
## [1,] FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE
## [4,] FALSE TRUE FALSE
table(is.na(ships))
##
## FALSE TRUE
## 11 1
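table(is.na(ships)) counts missing cells over the whole data.frame. To see which variable they belong to, one option (not on the slides) is colSums(), which adds up the TRUE values per column. complete.cases() is another base R helper, used here to keep only rows without missing values.

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# TRUE counts as 1, so the column sums give the number of NAs per variable
missing_per_column <- colSums(is.na(ships))
missing_per_column

# Keep only the rows in which no variable is missing
complete_rows <- ships[complete.cases(ships), ]
complete_rows
```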
Visualizing your data
Not just for analyses!
Data quality
representativeness
missing data
plot(ships)
[Figure: scatterplot matrix of year, inbound and outbound]
Getting data in R
Data already in R
The “datasets” package
very slim datasets
specific example data
To obtain list of datasets, type:
library(help = "datasets")
To obtain information on a specific dataset, type:
help(swiss) # thus: help(name_of_dataset)
or to just see the data:
swiss # typing the name prints the data
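The inspection functions from the recap work on the built-in datasets as well. A short sketch with swiss, which ships with the datasets package and is available without loading anything; swiss_dim is just an illustrative name.

```r
swiss_dim <- dim(swiss)  # rows, columns
swiss_dim

head(swiss, n = 3)       # first three rows only
str(swiss)               # type and features of each variable
names(swiss)             # column (variable) names
```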
Reading in data
Different functions for different files:
Base R: read.table() (read.csv())
foreign package: read.spss(), read.dta(), read.dbf()
openxlsx package: read.xlsx()
alternative packages:
xlsx (Java required)
gdata (Perl-based)
read.xlsx() from openxlsx package
file: your file, including directory
sheet: name of sheet
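A minimal sketch of read.xlsx(), assuming the openxlsx package is installed. To stay self-contained it first writes a small throwaway workbook to a temporary file; the data values and the object names toy and basic are made up for illustration. In openxlsx the file argument is called xlsxFile, and sheet accepts either a sheet name or its index.

```r
basic <- NULL

# Only run when openxlsx is available, so the sketch does not fail without it
if (requireNamespace("openxlsx", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".xlsx")

  # Hypothetical example data, written out so there is a file to read
  toy <- data.frame(id = 1:3, birth_year = c(1850, 1861, 1872))
  openxlsx::write.xlsx(toy, tmp)

  # file: path including directory; sheet: here the first sheet, by index
  basic <- openxlsx::read.xlsx(xlsxFile = tmp, sheet = 1)
  str(basic)
}
```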
read.csv()
file: your file, including directory
header: variable names or not?
sep: separator
read.csv default: “,”
read.csv2 default: “;”
skip: number of rows to skip
nrows: maximum number of rows to read
stringsAsFactors: convert text columns to factors or not?
encoding (e.g. “latin1” or “UTF-8”)
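To see the arguments above at work without needing a file on disk, the sketch below writes a small semicolon-separated file to a temporary location and reads it back. The file contents and the name people are invented for illustration.

```r
tmp <- tempfile(fileext = ".csv")

# A hypothetical export: one comment line, a header, then three records
writeLines(c("exported from a hypothetical database",
             "id;name;age",
             "1;Anna;23",
             "2;Hendrik;31",
             "3;Maria;27"), tmp)

people <- read.csv(tmp,
                   header = TRUE,             # first line read is the variable names
                   sep = ";",                 # same separator as read.csv2's default
                   skip = 1,                  # skip the comment line
                   nrows = 2,                 # read at most two data rows
                   stringsAsFactors = FALSE,  # keep text as character
                   encoding = "UTF-8")
str(people)
```

Because of nrows = 2 the third record is not read, which is handy for a quick look at a large file.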
Do it yourself!
Read in the following files as data.frames:
HSN_basic.xlsx
check the data.frame: using dim(), length()
check the variables: using summary(), min(), table()
Repeat for HSN_marriages.csv:
read in only 100 lines
Plotting using ggplot2
ggplot2
Package by Hadley Wickham
Generic plotting for a great range of plots
ggplot2 website: http://ggplot2.org
excellent tutorial:
https://jofrhwld.github.io/avml2012/#Section_1.1
Building your graph
Each plot consists of multiple layers
Think of a canvas on which you ‘paint’
data layer
geometries layer
statistics layer
Data layer
data.frame and aesthetics
ggplot(data.frame, aes(x= ..., y = ...))
geometries layer
ggplot(..., aes(x= ..., y = ...)) +
geom_...() # e.g. geom_line
statistics layer
ggplot(..., aes(x= ..., y = ...)) +
geom_...() +
stat_...() # e.g. stat_smooth
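Filled in with the ships data from the recap, the three templates above could look like the sketch below. It assumes ggplot2 is installed; the choice of geom_point(), geom_line() and a linear stat_smooth() is only one possible combination, and the p_* names are illustrative.

```r
library(ggplot2)

ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# Data layer: data.frame plus aesthetics; nothing is drawn yet
p_data <- ggplot(ships, aes(x = year, y = outbound))

# Geometries layer: add points and a connecting line
p_geom <- p_data + geom_point() + geom_line()

# Statistics layer: add a fitted straight line, without confidence band
p_stat <- p_geom + stat_smooth(method = "lm", se = FALSE)

p_stat  # printing the object draws the plot
```

Storing each step in its own object makes it easy to print the intermediate plots and see what every layer adds.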
an example
Reading in the data
hmar <- read.csv("./../data/derived/HSN_marriages.csv",
stringsAsFactors = FALSE,
encoding = "latin1",
header = TRUE,
nrows = 100)
Plotting the data
install.packages("ggplot2") # package name must be quoted; only needed once
library(ggplot2)
ggplot(hmar, aes(x= M_year, y = Age_bride)) +
geom_point()
[Figure: scatterplot of Age_bride (y-axis) against M_year (x-axis)]
Improving the plot
Specify characteristics of the geom_layer
ggplot(hmar, aes(x= M_year, y = Age_bride)) +
geom_point(colour = "blue", size = 3, shape = 18)
See http://www.cookbook-r.com/Graphs/Shapes_and_line_types/
Specify characteristics of the geom_layer
[Figure: scatterplot of Age_bride (y-axis) against M_year (x-axis), drawn with the point characteristics specified above]
A PTE example
Does age at marriage depend on educational attainment?
To marry you need resources
the more attainment the longer it takes to acquire resources
ergo: brides with educational attainment marry later in life
Not a statistical test, but let’s graph this
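Before graphing, a quick numerical look at such a hypothesis is to compare the mean age per group with tapply(). The sketch below uses a made-up toy data.frame, not the HSN data, and assumes a grouping column named like the SIgn_bride variable in the slides; the values carry no substantive meaning.

```r
# Hypothetical values, for illustration only
toy <- data.frame(Age_bride = c(22, 25, 31, 24, 28, 35),
                  SIgn_bride = c("a", "a", "a", "h", "h", "h"))

# Apply mean() to Age_bride separately for each value of SIgn_bride
mean_age <- tapply(toy$Age_bride, toy$SIgn_bride, mean)
mean_age

# Number of brides per group, to judge how much each mean is based on
table(toy$SIgn_bride)
```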
A request from yesterday
Can I plot labels?
ggplot(hmar, aes(x= M_year, y = Age_bride,
label = SIgn_bride)) +
geom_text()
Yes you can!
Not really useful though...
[Figure: Age_bride (y-axis) against M_year (x-axis), with each marriage drawn as its SIgn_bride label (a or h)]
Let’s try with colours...
ggplot(hmar, aes(x= M_year, y = Age_bride)) +
geom_point(aes(colour = factor(SIgn_bride)),
size = 3, shape = 18)
[Figure: Age_bride (y-axis) against M_year (x-axis), points coloured by factor(SIgn_bride) (a, h)]
No real pattern, though...
Finalizing the graph
ggplot(hmar, aes(x = M_year, y = Age_bride)) +
  geom_point(aes(colour = factor(SIgn_bride)),
             size = 3,
             shape = 18) +
  # labels are passed to labs() directly; wrapping them in list()
  # is not reliable across ggplot2 versions
  labs(title = "Age of marriage over time",
       x = "time (years since A.D.)",
       y = "age of bride (years)",
       colour = "Signature")
# here we use colour since the legend shows colour
[Figure: “Age of marriage over time”; x-axis: time (years since A.D.); y-axis: age of bride (years); legend “Signature” (a, h)]
Satisfied?
Actually not... the points are plotted on top of each other...
Solution: geom_jitter
ggplot(hmar, aes(x = M_year, y = Age_bride)) +
  geom_jitter(aes(colour = factor(SIgn_bride)),
              size = 3,
              shape = 18) +
  # labels are passed to labs() directly; wrapping them in list()
  # is not reliable across ggplot2 versions
  labs(title = "Age of marriage over time",
       x = "time (years since A.D.)",
       y = "age of bride (years)",
       colour = "Signature")
# here we use colour since the legend shows colour
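geom_jitter() moves each point by a small random amount, so the picture changes slightly on every run. If that matters, the amount and the randomness can be controlled through position_jitter(). The sketch below assumes ggplot2 3.0.0 or later (for the seed argument) and uses a made-up toy data.frame with the same column names as hmar; the width, height and seed values are arbitrary choices.

```r
library(ggplot2)

# Hypothetical data with deliberately overlapping points
toy <- data.frame(M_year = c(1850, 1850, 1850, 1860, 1860),
                  Age_bride = c(24, 24, 24, 30, 30),
                  SIgn_bride = c("a", "h", "h", "a", "h"))

p <- ggplot(toy, aes(x = M_year, y = Age_bride)) +
  geom_jitter(aes(colour = factor(SIgn_bride)),
              # width/height: maximum shift in data units; seed: reproducible noise
              position = position_jitter(width = 0.3, height = 0.3, seed = 1),
              size = 3,
              shape = 18)
p
```

Keeping the shift small relative to the axis units avoids suggesting differences in year or age that are not in the data.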
[Figure: “Age of marriage over time”, jittered points; x-axis: time (years since A.D.); y-axis: age of bride (years); legend “Signature” (a, h)]
Final remarks on ggplot2
We have just scratched the surface of ggplot2
Build your graph slowly
start with the basics
add complexity step-wise
Now it’s your turn!
A small PTE project
Look at the variables in the HSN files
Think of a research question
Provide a general mechanism and hypothesis
Plot your results
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 

Dernier (20)

ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 

Introduction into R for historians (part 3: examine and import data)

  • 1. Examining data and importing data in R
       Richard L. Zijdeman
       May 29, 2015
       Sections: Recap, Getting data in R, Do it yourself!, Plotting using ggplot2
  • 2. 1 Recap
       2 Getting data in R
       3 Do it yourself!
       4 Plotting using ggplot2
  • 3. Recap
  • 4. The structure of objects
       Store just about anything in R: numbers, sentences, datasets
         Objects
       Study the structure of objects: str()
         type of object
         features of object
           ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                               inbound = c(215, 237, 237, NA),
                               outbound = c(212, 239, 260, 265))
  • 5. Study the structure of object “ships”
           str(ships)
           ## 'data.frame': 4 obs. of 3 variables:
           ##  $ year    : num 1850 1860 1870 1880
           ##  $ inbound : num 215 237 237 NA
           ##  $ outbound: num 212 239 260 265
  • 6. Characteristics of objects
       Class: class()
       Length: length()
       Dimensions: dim()
           class(ships)
           ## [1] "data.frame"
           length(ships)
           ## [1] 3
           dim(ships) # rows, columns
           ## [1] 4 3
  • 7. Closer inspection of data.frames
       names of columns (variables): names()
       top/bottom rows: head(), tail()
       missing data: is.na()
           names(ships)
           ## [1] "year" "inbound" "outbound"
           is.na(ships)
           ##       year inbound outbound
           ## [1,] FALSE   FALSE    FALSE
           ## [2,] FALSE   FALSE    FALSE
           ## [3,] FALSE   FALSE    FALSE
           ## [4,] FALSE    TRUE    FALSE
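The slide lists head() and tail() without showing them in use. A minimal sketch on the same ships data frame follows; the choice of n = 2 and the variable names first_rows, last_rows and missing_inbound are only for illustration.

```r
# Recreate the example data frame from the recap slides
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

first_rows <- head(ships, n = 2)  # first two rows
last_rows  <- tail(ships, n = 2)  # last two rows
print(first_rows)
print(last_rows)

# is.na() on a single column gives one TRUE/FALSE per row
missing_inbound <- is.na(ships$inbound)
print(missing_inbound)
```

With larger data frames, head() and tail() are usually a quicker first check than printing the whole object.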
  • 8. Summarizing data in data.frames
       descriptive statistics: summary()
       calculations: e.g. min(), mean(), sum()
       results in table format: table()
           summary(ships)
           ##       year         inbound         outbound
           ##  Min.   :1850   Min.   :215.0   Min.   :212.0
           ##  1st Qu.:1858   1st Qu.:226.0   1st Qu.:232.2
           ##  Median :1865   Median :237.0   Median :249.5
           ##  Mean   :1865   Mean   :229.7   Mean   :244.0
           ##  3rd Qu.:1872   3rd Qu.:237.0   3rd Qu.:261.2
           ##  Max.   :1880   Max.   :237.0   Max.   :265.0
           ##                 NA's   :1
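The functions named on this slide behave differently when a column contains NA. The sketch below, using the same ships data, shows one common way to handle that with na.rm = TRUE; the variable names are illustrative only.

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# mean() returns NA as soon as one value is missing ...
mean_naive <- mean(ships$inbound)
# ... unless missing values are dropped explicitly
mean_clean <- mean(ships$inbound, na.rm = TRUE)

min_out <- min(ships$outbound)  # smallest outbound value
sum_out <- sum(ships$outbound)  # total outbound over all years

# table() counts how often each value occurs; useNA also counts the NA
inbound_counts <- table(ships$inbound, useNA = "ifany")

print(mean_naive)
print(mean_clean)
print(inbound_counts)
```

summary() drops the NA silently and reports it on a separate line, which is why its mean for inbound is a number rather than NA.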
  • 9.     is.na(ships)
           ##       year inbound outbound
           ## [1,] FALSE   FALSE    FALSE
           ## [2,] FALSE   FALSE    FALSE
           ## [3,] FALSE   FALSE    FALSE
           ## [4,] FALSE    TRUE    FALSE
           table(is.na(ships))
           ##
           ## FALSE  TRUE
           ##    11     1
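table(is.na(ships)) gives one overall count. If you also want to know where the missing values sit, a possible extension (not part of the original slides) is to count them per column and per row:

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# is.na() on a data.frame returns a logical matrix;
# colSums() then counts the TRUE values per column
na_per_column <- colSums(is.na(ships))
print(na_per_column)

# complete.cases() flags the rows without any missing value
complete_rows <- complete.cases(ships)
print(ships[complete_rows, ])
```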
  • 10. Visualizing your data
        Not just for analyses!
        Data quality
          representativeness
          missing data
  • 11.     plot(ships)
        [Figure: scatterplot matrix of year, inbound and outbound]
  • 12. Getting data in R
  • 13. Data already in R
        The “datasets” package
          very slim datasets
          specific example data
        To obtain list of datasets, type:
            library(help = "datasets")
        To obtain information on a specific dataset, type:
            help(swiss) # thus: help(name_of_dataset)
        or to just see the data:
            swiss
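Because the datasets package is attached by default, the inspection functions from the recap can be applied to swiss straight away. A short sketch (the variable names first_swiss and swiss_dims are illustrative):

```r
# 'swiss' ships with R's datasets package, so nothing has to be imported
str(swiss)                          # structure: type and features of each column

first_swiss <- head(swiss, n = 3)   # first three rows
print(first_swiss)

swiss_dims <- dim(swiss)            # rows, columns
print(swiss_dims)
```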
  • 14. Reading in data
        Different functions for different files:
          Base R: read.table() (read.csv())
          foreign package: read.spss(), read.dta(), read.dbf()
          openxlsx package: read.xlsx()
          alternative packages: xlsx (Java required), gdata (Perl-based)
  • 15. read.xlsx() from openxlsx package
        file: your file, including directory
        sheet: name of sheet
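A round trip is one way to try read.xlsx() without having a spreadsheet at hand. The sketch below assumes the openxlsx package is installed and is skipped otherwise; the temporary file is created only for this illustration. Note that in openxlsx the file argument is named xlsxFile, and sheet accepts either a sheet name or an index.

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

ships_back <- NULL
# Assumption: openxlsx is installed; otherwise nothing is run
if (requireNamespace("openxlsx", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".xlsx")   # hypothetical file for this sketch
  openxlsx::write.xlsx(ships, tmp)     # write the data frame to one sheet
  # read the first sheet back in as a data.frame
  ships_back <- openxlsx::read.xlsx(xlsxFile = tmp, sheet = 1)
  str(ships_back)
}
```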
  • 16. read.csv()
        file: your file, including directory
        header: variable names or not?
        sep: separator
          read.csv default: “,”
          read.csv2 default: “;”
        skip: number of rows to skip
        nrows: total number of rows to read
        stringsAsFactors
        encoding (e.g. “latin1” or “UTF-8”)
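To see how these arguments interact, the sketch below writes a small semicolon-separated file to a temporary location and reads it back. The file contents, including the leading comment line, are made up for this illustration.

```r
# Hypothetical file: one line of notes, a header row, then four data rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("# exported from a hypothetical archive database",
             "year;inbound;outbound",
             "1850;215;212",
             "1860;237;239",
             "1870;237;260",
             "1880;;265"),
           tmp)

# skip the notes line, treat the next line as header, read only 3 data rows
ships_csv <- read.csv(tmp, header = TRUE, sep = ";", skip = 1,
                      nrows = 3, stringsAsFactors = FALSE)
print(ships_csv)

# read.csv2() already assumes ";" as separator; empty fields become NA
ships_csv2 <- read.csv2(tmp, skip = 1)
print(ships_csv2)
```

Two details worth knowing: since R 4.0.0 the default for stringsAsFactors is FALSE, and the encoding argument only marks how strings are declared, whereas fileEncoding re-encodes the file while it is read.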
  • 17. Do it yourself!
  • 18. Read in the following files as data.frames:
        HSN_basic.xlsx
          check the data.frame: using dim(), length()
          check the variables: using summary(), min(), table()
        Repeat for HSN_marriages.csv: read in only 100 lines
  • 19. Plotting using ggplot2
  • 20. ggplot2
        Package by Hadley Wickham
        Generic plotting for a great range of plots
        ggplot2 website: http://ggplot2.org
        excellent tutorial: https://jofrhwld.github.io/avml2012/#Section_1.1
  • 21. Building your graph
        Each plot consists of multiple layers
        Think of a canvas on which you ‘paint’
          data layer
          geometries layer
          statistics layer
  • 22. Data layer
          data.frame and aesthetics
            ggplot(data.frame, aes(x= ..., y = ...))
        geometries layer
            ggplot(..., aes(x= ..., y = ...)) +
              geom_...() # e.g. geom_line
        statistics layer
            ggplot(..., aes(x= ..., y = ...)) +
              geom_...() +
              stat_...() # e.g. stat_smooth
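The three layers can be tried out on a dataset that is already in R. The sketch below uses the built-in swiss data rather than the HSN files, and assumes ggplot2 is installed; the choice of columns and of a linear smoother is only for illustration.

```r
library(ggplot2)

# data layer: data.frame plus aesthetics (nothing is drawn yet)
p_data <- ggplot(swiss, aes(x = Education, y = Fertility))

# geometries layer: draw each observation as a point
p_geom <- p_data + geom_point()

# statistics layer: add a fitted linear trend on top of the points
p_stat <- p_geom + stat_smooth(method = "lm", formula = y ~ x)

# print(p_stat) draws the finished plot
```

Storing each step in its own object makes it easy to print the intermediate plots and see what every layer contributes.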
  • 23. An example
        Reading in the data
            hmar <- read.csv("./../data/derived/HSN_marriages.csv",
                             stringsAsFactors = FALSE,
                             encoding = "latin1",
                             header = TRUE,
                             nrows = 100)
  • 24. Plotting the data
            install.packages("ggplot2")
            library(ggplot2)
            ggplot(hmar, aes(x= M_year, y = Age_bride)) +
              geom_point()
  • 25. [Figure: scatterplot of Age_bride against M_year]
  • 26. Improving the plot
        Specify characteristics of the geom_layer
            ggplot(hmar, aes(x= M_year, y = Age_bride)) +
              geom_point(colour = "blue", size = 3, shape = 18)
        See http://www.cookbook-r.com/Graphs/Shapes_and_line_types/
  • 27. Specify characteristics of the geom_layer
        [Figure: scatterplot of Age_bride against M_year]
  • 28. A PTE example
        Does age at marriage depend on educational attainment?
          To marry you need resources
          the more attainment the longer it takes to acquire resources
          ergo: brides with edu attainment marry later in life
        Not a statistical test: but let’s graph this
  • 29. A request from yesterday
        Can I plot labels?
            ggplot(hmar, aes(x= M_year, y = Age_bride, label = SIgn_bride)) +
              geom_text()
  • 30. Yes you can! Not really useful though...
        [Figure: Age_bride against M_year, each observation drawn as its label “a” or “h”]
  • 31. Let’s try with colours...
            ggplot(hmar, aes(x= M_year, y = Age_bride)) +
              geom_point(aes(colour = factor(SIgn_bride)), size = 3, shape = 18)
  • 32. [Figure: Age_bride against M_year, points coloured by factor(SIgn_bride) with levels a and h]
        No real pattern, though...
  • 33. Finalizing the graph
            ggplot(hmar, aes(x= M_year, y = Age_bride)) +
              geom_point(aes(colour = factor(SIgn_bride)), size = 3, shape = 18) +
              labs(title = "Age of marriage over time",
                   x = "time (years since A.D.)",
                   y = "age of bride (years)",
                   colour = "Signature")
            # here we use colour since legend shows colour
  • 34. [Figure: “Age of marriage over time”, age of bride (years) against time (years since A.D.), legend “Signature” with levels a and h]
  • 35. Satisfied?
  • 36. Actually not... the points are plotted on top of each other...
        Solution: geom_jitter
            ggplot(hmar, aes(x= M_year, y = Age_bride)) +
              geom_jitter(aes(colour = factor(SIgn_bride)), size = 3, shape = 18) +
              labs(title = "Age of marriage over time",
                   x = "time (years since A.D.)",
                   y = "age of bride (years)",
                   colour = "Signature")
            # here we use colour since legend shows colour
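How much geom_jitter() moves the points can be set with its width and height arguments. The sketch below uses a small made-up data frame (toy) in place of the HSN marriages file, so the values are hypothetical; it assumes ggplot2 is installed.

```r
library(ggplot2)

# Hypothetical toy data: several brides share the same year and age
toy <- data.frame(M_year = c(1850, 1850, 1850, 1860, 1860),
                  Age_bride = c(24, 24, 24, 30, 30),
                  SIgn_bride = c("a", "h", "h", "a", "h"))

# geom_point(): identical observations are drawn exactly on top of each other
p_overlap <- ggplot(toy, aes(x = M_year, y = Age_bride)) +
  geom_point(aes(colour = factor(SIgn_bride)), size = 3, shape = 18)

# geom_jitter(): each point gets a small random shift,
# here at most half a unit in either direction
p_jitter <- ggplot(toy, aes(x = M_year, y = Age_bride)) +
  geom_jitter(aes(colour = factor(SIgn_bride)), size = 3, shape = 18,
              width = 0.5, height = 0.5)

# print(p_overlap) and print(p_jitter) draw the two versions
```

Because the shifts are random, the jittered plot looks slightly different on every run, and the plotted positions no longer show the exact values.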
  • 37. [Figure: “Age of marriage over time”, age of bride (years) against time (years since A.D.), legend “Signature” with levels a and h]
  • 38. Final remarks on ggplot2
        We have just scratched the surface of ggplot2
        Build your graph slowly
          start with the basics
          add complexity step-wise
        Now it’s your turn!
  • 39. A small PTE project
        Look at the variables in the HSN files
        Think of a research question
        Provide a general mechanism and hypothesis
        Plot your results