2. Overview
· NYC Open Data portal
· RStudio
· R
· GitHub
· hack time
R Workshop I http://www.nycopendata.com/RworkshopI/index.html#1
2 of 27 6/13/14, 1:50 PM
13. plot diamonds
with() is a generic function that evaluates an expression in
a local environment constructed from data.
In ggplot2, "aes" stands for "aesthetics", and
"geom_point()" draws a scatterplot.
with(diamonds, plot(carat, price))
ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
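To make the role of with() concrete, here is a minimal sketch, assuming the ggplot2 package is installed (it ships the diamonds data set). The names r_with, r_dollar and same are illustrative only and do not come from the slides.

```r
library(ggplot2)  # provides the diamonds data set

# with() looks up carat and price inside diamonds, so the call below
# is equivalent to spelling out diamonds$carat and diamonds$price.
r_with   <- with(diamonds, cor(carat, price))
r_dollar <- cor(diamonds$carat, diamonds$price)

# TRUE when both spellings give the same correlation
same <- isTRUE(all.equal(r_with, r_dollar))
print(same)
```

The same idea applies to the plot() call on the slide: with() only saves typing the data frame name for every column.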
14. plot diamonds
ggplot2 generates more sophisticated graphs than the traditional graphics package. Let us play with
some color.
ggplot(diamonds, aes(x = carat, y = price, colour = cut)) + geom_point()
15. plot diamonds
Instead of fitting a linear relation, we try to fit a log-linear relation.
log(price) is quite linear in log(carat). Bingo!
ggplot(diamonds, aes(x = log(carat), y = log(price), colour = cut)) + geom_point()
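One way to check the "quite linear" impression numerically is to compare a straight-line fit on the raw scale with one on the log scale. This is a rough sketch, assuming ggplot2 is installed for the diamonds data; lm() and R-squared are one possible yardstick chosen here, not something the slides use, and fit_raw, fit_log, r2_raw and r2_log are illustrative names.

```r
library(ggplot2)  # provides the diamonds data set

# Straight-line fit on the original scale
fit_raw <- lm(price ~ carat, data = diamonds)

# Straight-line fit after taking logs of both variables
fit_log <- lm(log(price) ~ log(carat), data = diamonds)

# Share of variance explained by each fit
r2_raw <- summary(fit_raw)$r.squared
r2_log <- summary(fit_log)$r.squared
print(c(raw = r2_raw, log = r2_log))
```

A higher R-squared for the log-log fit is consistent with what the scatterplot suggests, although the two values are measured on different scales and should be read as a rough guide only.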
16. plot diamonds
As the color letters go from D to J, the diamond becomes more and more yellow. The numbers after
"SI" (slightly included) and "VS" (very slightly included) grade the size of internal imperfections in
the diamonds, with 1 cleaner than 2. "IF" is internally flawless.
ggplot(diamonds, aes(x = log(carat), y = log(price), colour = cut)) + geom_point() +
    facet_grid(clarity ~ color)
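The order of the grades determines the order of the panels, so it can help to list the factor levels before reading the grid. A small sketch, assuming ggplot2 is installed; color_levels and clarity_levels are illustrative names.

```r
library(ggplot2)  # provides the diamonds data set

# Levels are stored in order: color from D to J, clarity from I1 to IF
color_levels   <- levels(diamonds$color)
clarity_levels <- levels(diamonds$clarity)
print(color_levels)
print(clarity_levels)

# Number of diamonds in each clarity x color cell, i.e. in each panel
print(table(diamonds$clarity, diamonds$color))
```

In facet_grid(clarity ~ color) the rows follow the clarity levels from top to bottom and the columns follow the color levels from left to right.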
17. plot diamonds
Let us go back to the original scale. The bottom left panel shows price vs carat for the whitest
(color D) and internally flawless diamonds. The upper right panel shows price vs carat for the most
yellow (color J) and most heavily included (clarity I1) diamonds.
ggplot(diamonds, aes(x = carat, y = price, colour = cut)) + geom_point() +
    facet_grid(clarity ~ color)
18. plot diamonds
As we would expect, for diamonds of the same clarity (read along a row), the price rises faster
with carat for white stones (bottom left) than for yellow stones (bottom right). And for diamonds
of the same color (read down a column), the price rises faster with carat for clean stones
(bottom left) than for heavily included stones (upper left).
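The comparison between corner panels can be approximated with a number. The sketch below, assuming ggplot2 is installed, fits a straight line of price on carat inside three corner panels and reports the slopes. panel_slope() is a hypothetical helper written for this illustration, and a straight-line slope is only a crude summary of how fast price rises.

```r
library(ggplot2)  # provides the diamonds data set

# Hypothetical helper: slope of price on carat within one clarity/color panel
panel_slope <- function(cl, co) {
  panel <- diamonds[diamonds$clarity == cl & diamonds$color == co, ]
  unname(coef(lm(price ~ carat, data = panel))["carat"])
}

slopes <- c(
  IF_D = panel_slope("IF", "D"),  # bottom left: flawless, whitest
  IF_J = panel_slope("IF", "J"),  # bottom right: flawless, most yellow
  I1_D = panel_slope("I1", "D")   # upper left: most included, whitest
)
print(slopes)
```

Each slope is in price units per carat, so a larger value means price climbs more steeply with weight in that panel.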
19. plot diamonds
We facet the plot by just one of these factor variables: clarity.
ggplot(diamonds, aes(x = carat, y = price, colour = cut)) + geom_point() +
    facet_grid(clarity ~ .)
20. good tip to generate plots
The same type of graph is used over and over again while each new component of ggplot2 is
introduced and interpreted. This is a very effective way to display complex relationships in large,
high-dimensional data. Remember, the key is to introduce only one change at a time.
Source: http://gettinggeneticsdone.blogspot.com/2010/01/ggplot2-tutorial-scatterplots-in-series.html
21. plot diamonds
Last, we fit a smooth line to the original data and to the log-transformed data. The relation is
close to linear for the log-transformed data if we ignore the few points at the two ends of the distribution.
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth()
ggplot(diamonds, aes(x = log(carat), y = log(price))) + geo
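With its default settings geom_smooth() chooses a smoother automatically, so the fitted curve is not necessarily a straight line. As a variation that is not taken from the slides, a strictly linear fit can be requested with method = "lm". A minimal sketch, assuming ggplot2 is installed; p_log and the alpha value are illustrative choices.

```r
library(ggplot2)  # provides the diamonds data set

# Log-log scatterplot with a straight least-squares line on top
p_log <- ggplot(diamonds, aes(x = log(carat), y = log(price))) +
  geom_point(alpha = 0.1) +      # faint points so the line stays visible
  geom_smooth(method = "lm")     # linear fit instead of the default smoother

# Printing the object draws the plot
print(p_log)
```

Comparing this line with the default smoother is a quick visual check of how far the log-transformed data depart from linearity.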
23. why do we use R
Dirk's example of the elegance and efficiency of R. Source: Dirk Eddelbuettel
24. why do we use R
Dirk's example of the elegance and efficiency of R. Source: Dirk Eddelbuettel
25. hack time
· download an open dataset using a filter
· read it into RStudio
· check the dimensions of the dataset
· decide which columns you will use
· plot it!
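The steps above can be sketched in base R as follows. The CSV written here is a tiny made-up stand-in for a file exported from the portal, so the column names (complaint_id, borough, response_hours) and all values are placeholders, not real NYC data. With a real download, csv_path would point at the saved file instead.

```r
# Stand-in for a downloaded file: made-up rows written to a temporary CSV
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "complaint_id,borough,response_hours",
  "1,BROOKLYN,4.5",
  "2,QUEENS,12.0",
  "3,BRONX,7.25",
  "4,BROOKLYN,3.0"
), csv_path)

# Read the file into a data frame
dat <- read.csv(csv_path, stringsAsFactors = FALSE)

# Check the dimensions (rows, columns) and the column types
print(dim(dat))
str(dat)

# Keep only the columns needed for the plot
keep <- dat[, c("borough", "response_hours")]

# Plot it: distribution of the numeric column within each group
boxplot(response_hours ~ borough, data = keep)
```

The same sequence of read.csv(), dim(), column selection and a plot call carries over to any tabular export, whatever its actual column names are.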
26. Resources
R in a Nutshell - Joseph Adler
The Art of R Programming - Norman Matloff
ggplot2 - Elegant Graphics for Data Analysis - Hadley Wickham