Stat405
Polishing graphics for presentation

Hadley Wickham
Thursday, 21 October 2010
Mark Schoenhals
                    • Rice alum & former stat major
                    • Visiting on Monday. Two talks: 11am (Four
                      designs for ecommerce experiments),
                      4pm (Using data smarter faster: How one
                      small store sold $500 million of music
                      gear to over 1 million customers)
                    • Undergrads are invited to lunch with him.
                      Email me if you’re interested


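# Who is the most accurate shooter in the NBA?

library(plyr)

# Read the play-by-play data and keep only shot events
nba <- read.csv("nba-0809.csv.bz2")
shots <- subset(nba, etype == "shot")

# Count attempts and made shots for each player on each team
success <- ddply(shots, c("team", "player"), summarise,
  total = length(player),
  made = sum(result == "made"))

# Proportion of shots made, sorted from highest to lowest
success$prop <- success$made / success$total
success <- arrange(success, desc(prop))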


   team player            total made      prop
1  NYK  Eddy Curry            1    1 1.0000000
2  OKC  Steven Hill           1    1 1.0000000
3  SAS  Marcus Williams       2    2 1.0000000
4  CHA  Dwayne Jones          4    3 0.7500000
5  OKC  Mouhamed Sene         7    5 0.7142857
6  SAS  Pops Mensah-Bonsu     7    5 0.7142857
7  BOS  J.R. Giddens          3    2 0.6666667
8  LAL  Yue Sun               3    2 0.6666667
9  MIL  Eddie Gill            9    6 0.6666667
10 DAL  Erick Dampier       269  175 0.6505576
11 LAC  DeAndre Jordan      132   85 0.6439394
12 ORL  Adonal Foyle         11    7 0.6363636
13 BOS  Bill Walker          58   36 0.6206897
14 POR  Joel Przybilla      261  161 0.6168582
15 POR  Shavlik Randolph     13    8 0.6153846
16 DET  Amir Johnson        152   92 0.6052632
17 PHX  Shaquille O'Neal    813  491 0.6039360
18 DEN  Nene Hilario        708  427 0.6031073
19 ATL  Solomon Jones        93   56 0.6021505
20 BOS  Mikki Moore          85   51 0.6000000



[Figure: scatterplot of prop (y-axis, 0.0 to 1.0) against total (x-axis, ticks at 500, 1000 and 1500).]
1. ggplot() practice
2. Communication graphics
3. Polishing a plot: scales and themes




[Figure: flight delays plot. Points at x from −120 to −70 and y from 25 to 50, sized by "% cancelled" (legend keys 0.0 to 1.0).]




Your turn
                    Identify the data and layers in the flight
                    delays plot, then write the ggplot2 code
                    to create it.
                    library(ggplot2)
                    library(maps)
                    usa <- map_data("state")
                    feb13 <- read.csv("delays-feb-13-2007.csv")



ggplot(feb13, aes(long, lat)) +
  geom_point(aes(size = 1), colour = "white") +
  # State outlines from map_data("state")
  geom_polygon(aes(group = group), data = usa,
    colour = "grey70", fill = NA) +
  # Semi-transparent points sized by ncancelw / ntot
  geom_point(aes(size = ncancelw / ntot),
    colour = alpha("black", 1/2))

# Polishing: up next
# Scales are added with +, one per aesthetic
last_plot() +
  scale_area("% cancelled", to = c(1, 8),
    breaks = seq(0, 1, by = 0.2), limits = c(0, 1)) +
  scale_x_continuous("", limits = c(-125, -67)) +
  scale_y_continuous("", limits = c(24, 50))


Communication graphics

                    When you need to communicate your
                    findings, you need to spend a lot of time
                    polishing your graphics to eliminate
                    distractions and focus on the story.
                    Now it’s time to pay attention to the small
                    stuff: labels, colour choices, tick marks...



Context

Consumption

What’s wrong with this plot?

[Figure: polygon map with long on the x-axis (−106 to −94) and lat on the y-axis (26 to 36); fill legend "bin" with keys < 1000, < 1e4, < 1e5, < 1e6, < 1e7.]
Some problems
                    • Incorrect coordinate system
                    • Bad colour scheme
                    • Unnecessary axis labels
                    • Legend needs improvement: better title and
                      better key labels
                    • No title


1. Scales: used to override default
                        perceptual mappings, and tune
                        parameters of axes and legends.

                2. Themes: control presentation of
                        non-data elements.

                3. Saving your work: to include in
                   reports, presentations, etc.



Scales

                    Control how data is mapped to perceptual
                    properties, and produce guides (axes and
                    legends) which allow us to read the plot.
                    Important parameters: name, breaks &
                    labels, limits.
                    Naming scheme: scale_<aesthetic>_<name>,
                    e.g. scale_x_continuous.
                    All default scales are named continuous or
                    discrete.


# Default scales
scale_x_continuous()
scale_y_discrete()
scale_colour_discrete()

# Custom scales
scale_colour_hue()
scale_x_log10()
scale_fill_brewer()

# Scales with parameters
scale_x_continuous("X Label", limits = c(1, 10))
scale_colour_gradient(low = "blue", high = "red")


# First argument (name) controls axis label
scale_y_continuous("Latitude")
scale_x_continuous("")

# Breaks and labels control tick marks
scale_x_continuous(breaks = -c(106, 100, 94))
scale_fill_discrete(labels = c("< 1000" = "< 1000",
  "< 1e4" = "< 10,000", "< 1e5" = "< 100,000",
  "< 1e6" = "< 1,000,000", "< 1e7" = "1,000,000+"))
scale_y_continuous(breaks = NA)

# Limits control range of data
scale_y_continuous(limits = c(26, 32))
# same as:
p + ylim(26, 32)
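
# Read populations and bin them on a log10 scale
options(stringsAsFactors = FALSE)
pop <- read.csv("tx-pop.csv")
pop$bin <- cut(log10(pop$pop), breaks = 2:7,
  labels = c("< 1000", "< 1e4", "< 1e5",
  "< 1e6", "< 1e7"))

# Attach populations to the border polygons
# (join() comes from plyr and matches on shared column names)
borders <- read.csv("tx-borders.csv")
choro <- join(borders, pop)

# One polygon per group, filled by population bin
qplot(long, lat, data = choro, geom = "polygon",
  group = group, fill = bin)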


Your turn


                    Fix the axis and legend related problems
                    that we have identified.




qplot(long, lat, data = choro, geom = "polygon",
  group = group, fill = bin) +
  scale_fill_discrete("Population", labels = c(
    "< 1000" = "< 1000", "< 1e4" = "< 10,000",
    "< 1e5" = "< 100,000", "< 1e6" = "< 1,000,000",
    "< 1e7" = "1,000,000+")) +
  scale_x_continuous("") +
  scale_y_continuous("") +
  coord_map()




Alternate scales
                    You can also override the default choice
                    of scales. You are most likely to want to
                    do this with colour, as it is the most
                    important aesthetic after position.
                    You need a little background to be able to
                    use colour effectively: colour spaces &
                    colour blindness.


Colour spaces
                    Most familiar is rgb: defines colour as a
                    mixture of red, green and blue. Matches
                    the physics of the eye, but the brain does a
                    lot of post-processing, so it’s hard to
                    directly perceive these components.
                    A more useful colour space is hcl:
                    hue, chroma and luminance.
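
A minimal sketch of what the hcl space offers, using the hcl() function from base R's grDevices package. The function is not named on the slide, and the hue, chroma and luminance values below are arbitrary choices for illustration.

```r
# Three colours that differ only in hue (an angle in degrees);
# chroma and luminance are held constant.
same_weight <- hcl(h = c(0, 120, 240), c = 50, l = 70)

# One hue at three luminance levels, from dark to light.
dark_to_light <- hcl(h = 240, c = 50, l = c(30, 60, 90))

print(same_weight)
print(dark_to_light)
```

Each call returns a character vector of hex colour codes that can be passed to any colour argument.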


[Figure: the hcl colour space, with hue, chroma and luminance labelled.]
Default colour scales

                    Discrete: evenly spaced hues of equal
                    chroma and luminance. No colour
                    appears more important than any other.
                    Does not imply order.
                    Continuous: evenly spaced hues
                    between two colours.



Colour blindness

                    7-10% of men are red-green colour
                    “blind”. (There are many other, rarer types
                    of colour blindness.)
                    Solutions: avoid red-green contrasts; use
                    redundant mappings; test. I like Color
                    Oracle: http://colororacle.cartography.ch



Alternatives


                    Discrete: brewer, grey
                    Continuous: gradient2, gradientn
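
A hedged sketch of swapping in the discrete alternatives named above. It assumes the diamonds data that ships with ggplot2 (not the Texas data from the exercise), and the "Blues" palette is an arbitrary choice.

```r
library(ggplot2)

# cut is an ordered factor with five levels, so it gets a discrete fill scale
p <- ggplot(diamonds, aes(cut, fill = cut)) +
  geom_bar()

# Replace the default hue scale with a ColorBrewer palette ...
p_brewer <- p + scale_fill_brewer(palette = "Blues")

# ... or with shades of grey
p_grey <- p + scale_fill_grey()
```

Printing p_brewer or p_grey draws the plot. The continuous alternatives follow the same naming pattern, e.g. scale_fill_gradient2() and scale_fill_gradientn().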




Your turn

                    Modify the fill scale to use a Brewer
                    colour palette of your choice. (Hint: you
                    will need to change the name of the scale)
                    Use RColorBrewer::display.brewer.all
                    to list all palettes.



Themes

Visual appearance
                    So far we have only discussed how to get
                    the data displayed the way you want,
                    focussing on the essence of the plot.
                    Themes give you a huge amount of
                    control over the appearance of the plot:
                    the choice of background colours, fonts
                    and so on.


# Two built-in themes. The default:
qplot(carat, price, data = diamonds)

# And a theme with a white background:
qplot(carat, price, data = diamonds) + theme_bw()

# Use theme_set if you want it to apply to every
# future plot.
theme_set(theme_bw())

# This is the best way of seeing all the default
# options
theme_bw()
theme_grey()

Plot title

                    The plot theme also controls the plot title.
                    You can change this for an individual plot
                    by adding
                    opts(title = "My title")




Your turn


                    Add an informative title and see what the
                    plot looks like with a white background.




Elements
                    You can also make your own theme, or
                    modify an existing one.
                    Themes are made up of elements, each of
                    which is one of: theme_line, theme_segment,
                    theme_text, theme_rect, theme_blank.
                    This gives you a lot of control over plot
                    appearance.


Elements
                    Axis: axis.line, axis.text.x, axis.text.y,
                    axis.ticks, axis.title.x, axis.title.y
                    Legend: legend.background, legend.key,
                    legend.text, legend.title
                    Panel: panel.background, panel.border,
                    panel.grid.major, panel.grid.minor
                    Strip: strip.background, strip.text.x,
                    strip.text.y


# To modify a plot
p + opts(plot.title = theme_text(size = 12, face = "bold"))
p + opts(plot.title = theme_text(colour = "red"))
p + opts(plot.title = theme_text(angle = 45))
p + opts(plot.title = theme_text(hjust = 1))




# If we want, we could also remove the axes:
last_plot() + opts(
  axis.text.x = theme_blank(),
  axis.text.y = theme_blank(),
  axis.title.x = theme_blank(),
  axis.title.y = theme_blank(),
  axis.ticks.length = unit(0, "cm"),
  axis.ticks.margin = unit(0, "cm"))
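
The opts() and theme_*() functions used on these slides were later replaced in ggplot2 by theme() and the element_*() functions. A rough sketch of the same title and axis changes in that newer interface, assuming a recent ggplot2 release and using the diamonds data rather than the Texas map:

```r
library(ggplot2)

p <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  ggtitle("My title")  # in place of opts(title = "My title")

# theme_text() became element_text()
p_title <- p + theme(
  plot.title = element_text(size = 12, face = "bold", hjust = 1))

# theme_blank() became element_blank(); this drops axis text,
# axis titles and tick marks on both axes
p_bare <- p_title + theme(
  axis.text = element_blank(),
  axis.title = element_blank(),
  axis.ticks = element_blank())
```

Printing p_title or p_bare draws the plot; theme_bw() and theme_set() work the same way in both interfaces.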




17 polishing

  • 1. Stat405 Polishing graphics for presentation Hadley Wickham Thursday, 21 October 2010
  • 2. Mark Schoenhals • Rice alum & former stat major • Visiting on Monday. Two talks: 11am (Four designs for ecommerce experiments), 4pm (Using data smarter faster: How one small store sold $500 million of music gear to over 1 million customers) • Undergrads are invited to lunch with him. Email me if you’re interested Thursday, 21 October 2010
  • 3. # Who is the most accurate shooter in the NBA? library(plyr) nba <- read.csv("nba-0809.csv.bz2") shots <- subset(nba, etype == "shot") success <- ddply(shots, c("team", "player"), summarise, total = length(player), made = sum(result == "made")) success$prop <- success$made / success$total success <- arrange(success, desc(prop)) Thursday, 21 October 2010
  • 4. team player total made prop 1 NYK Eddy Curry 1 1 1.0000000 2 OKC Steven Hill 1 1 1.0000000 3 SAS Marcus Williams 2 2 1.0000000 4 CHA Dwayne Jones 4 3 0.7500000 5 OKC Mouhamed Sene 7 5 0.7142857 6 SAS Pops Mensah-Bonsu 7 5 0.7142857 7 BOS J.R. Giddens 3 2 0.6666667 8 LAL Yue Sun 3 2 0.6666667 9 MIL Eddie Gill 9 6 0.6666667 10 DAL Erick Dampier 269 175 0.6505576 11 LAC DeAndre Jordan 132 85 0.6439394 12 ORL Adonal Foyle 11 7 0.6363636 13 BOS Bill Walker 58 36 0.6206897 14 POR Joel Przybilla 261 161 0.6168582 15 POR Shavlik Randolph 13 8 0.6153846 16 DET Amir Johnson 152 92 0.6052632 17 PHX Shaquille O'Neal 813 491 0.6039360 18 DEN Nene Hilario 708 427 0.6031073 19 ATL Solomon Jones 93 56 0.6021505 20 BOS Mikki Moore 85 51 0.6000000 Thursday, 21 October 2010
  • 5. 1.0 ● 0.8 ● ● ● ● ● ● ● ● ● ● ● 0.6 ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ●● ● ● ●● ● ● ●● ● ● ● ● ●● ●● ●● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ●● ● ● prop ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ●● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ●●●● ● ●● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ●● ●● ●● ●● ●● ●●● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ●● ●● ●● ● ● ● 0.4 ● ●●●● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●●● ● ● ●● 0.2 ● ● ● ● 0.0 ● ● 500 1000 1500 total Thursday, 21 October 2010
  • 6. 1. ggplot() practice 2. Communication graphics 3. Polishing a plot: scales and themes Thursday, 21 October 2010
  • 7. 50 ● ● ●● ● ● ●● ● ● ●● ●● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ●●● 45 ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ●● ●● ● ●● ● ●●● ● ● ●● ● ● ●●● ● ● ●● ● ●● ● ● ●●● ● ● ● ● ● ●● ●● ●● ●● % cancelled ● ●● ● ● ●● ● ● 0.0 ●● 40 ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●●● ● ●●●●● ● 0.2 ● ●● ● ●● ●● ●● ●● ● 0.4 35 ●● ● ● ● ● ●●● ● ● ●● ● ●● ● ● ●● ● ● 0.6 ●● ●● ● ● ● ● ●●●● ● ●● ● ●●●● ● ●●●● ● 0.8 ● ● ●●● ● ● ●● ●● ●● ●●●●● ●● ● 1.0 30 ●●● ●●● ● ●●●● ● ● ●● ●● ● ●● ● ● ●● 25 ● −120 −110 −100 −90 −80 −70 Thursday, 21 October 2010
  • 8. Your turn
      Identify the data and layers in the flight delays plot, then write the ggplot2 code to create it.

          library(ggplot2)
          library(maps)

          usa <- map_data("state")
          feb13 <- read.csv("delays-feb-13-2007.csv")
  • 9.
          ggplot(feb13, aes(long, lat)) +
            geom_point(aes(size = 1), colour = "white") +
            geom_polygon(aes(group = group), data = usa,
              colour = "grey70", fill = NA) +
            geom_point(aes(size = ncancelw / ntot),
              colour = alpha("black", 1/2))

          # Polishing: up next
          last_plot() +
            scale_area("% cancelled", to = c(1, 8),
              breaks = seq(0, 1, by = 0.2), limits = c(0, 1)) +
            scale_x_continuous("", limits = c(-125, -67)) +
            scale_y_continuous("", limits = c(24, 50))
  • 10. Communication graphics
      When you need to communicate your findings, you need to spend a lot of time polishing your graphics to eliminate distractions and focus on the story.
      Now it’s time to pay attention to the small stuff: labels, colour choices, tick marks...
  • 13. What’s wrong with this plot?
      [Choropleth map: fill legend titled "bin" with keys < 1000, < 1e4, < 1e5, < 1e6, < 1e7; x axis long (−106 to −94), y axis lat (26 to 36)]
  • 14. Some problems
      • Incorrect coordinate system
      • Bad colour scheme
      • Unnecessary axis labels
      • Legend needs improvement: better title and better key labels
      • No title
  • 16. 1. Scales: used to override default perceptual mappings, and tune parameters of axes and legends.
        2. Themes: control presentation of non-data elements.
        3. Saving your work: to include in reports, presentations, etc.
  • 18. Scales
      Control how data is mapped to perceptual properties, and produce guides (axes and legends) which allow us to read the plot.
      Important parameters: name, breaks & labels, limits.
      Naming scheme: scale_aesthetic_name, e.g. scale_x_continuous. All default scales have the name continuous or discrete.
  • 19.
          # Default scales
          scale_x_continuous()
          scale_y_discrete()
          scale_colour_discrete()

          # Custom scales
          scale_colour_hue()
          scale_x_log10()
          scale_fill_brewer()

          # Scales with parameters
          scale_x_continuous("X Label", limits = c(1, 10))
          scale_colour_gradient(low = "blue", high = "red")
  • 20.
          # First argument (name) controls axis label
          scale_y_continuous("Latitude")
          scale_x_continuous("")

          # Breaks and labels control tick marks
          scale_x_continuous(breaks = -c(106, 100, 94))
          scale_fill_discrete(labels = c(
            "< 1000" = "< 1000",
            "< 1e4" = "< 10,000",
            "< 1e5" = "< 100,000",
            "< 1e6" = "< 1,000,000",
            "< 1e7" = "1,000,000+"))
          scale_y_continuous(breaks = NA)

          # Limits control range of data
          scale_y_continuous(limits = c(26, 32))
          # same as:
          p + ylim(26, 32)
  • 21.
          options(stringsAsFactors = FALSE)

          # Bin county populations by order of magnitude
          pop <- read.csv("tx-pop.csv")
          pop$bin <- cut(log10(pop$pop), breaks = 2:7,
            labels = c("< 1000", "< 1e4", "< 1e5", "< 1e6", "< 1e7"))

          # join() is from plyr, loaded earlier with library(plyr)
          borders <- read.csv("tx-borders.csv")
          choro <- join(borders, pop)

          qplot(long, lat, data = choro, geom = "polygon",
            group = group, fill = bin)
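The binning step applies cut() to log10 of the population, so each bin covers one power of ten. A small sketch with made-up population values (not taken from tx-pop.csv) shows how values are assigned. Note that anything below 100 (log10 below 2) falls outside the breaks and becomes NA.

```r
# Made-up populations: one below the lowest break, then one per power of ten
pop <- c(50, 500, 5000, 50000, 500000, 5000000)

bin <- cut(log10(pop), breaks = 2:7,
  labels = c("< 1000", "< 1e4", "< 1e5", "< 1e6", "< 1e7"))

# Pair each population with the bin it was assigned
print(data.frame(pop = pop, bin = bin))
```

Checking for NA bins after a step like this is a cheap way to catch values that lie outside the chosen breaks.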
  • 22. Your turn
      Fix the axis and legend related problems that we have identified.
  • 23.
          qplot(long, lat, data = choro, geom = "polygon",
            group = group, fill = bin) +
            scale_fill_discrete("Population", labels = c(
              "< 1000" = "< 1000",
              "< 1e4" = "< 10,000",
              "< 1e5" = "< 100,000",
              "< 1e6" = "< 1,000,000",
              "< 1e7" = "1,000,000+")) +
            scale_x_continuous("") +
            scale_y_continuous("") +
            coord_map()
  • 24. Alternate scales
      Can also override the default choice of scales. You are most likely to want to do this with colour, as it is the most important aesthetic after position.
      Need a little background to be able to use colour effectively: colour spaces & colour blindness.
  • 25. Colour spaces
      Most familiar is rgb: defines colour as a mixture of red, green and blue. Matches the physics of the eye, but the brain does a lot of post-processing, so it’s hard to directly perceive these components.
      A more useful colour space is hcl: hue, chroma and luminance.
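Base R exposes the hcl colour space through hcl() in grDevices, which makes it easy to change one component at a time. The sketch here holds hue and chroma fixed and varies only luminance; the particular values (hue 260, chroma 30) are arbitrary choices for illustration.

```r
# Same hue and chroma, three levels of luminance (dark to light)
blues <- hcl(h = 260, c = 30, l = c(30, 60, 90))
print(blues)

# Convert back to RGB: the channel values get larger overall as
# luminance increases, even though only one hcl component changed
rgb_values <- col2rgb(blues)
print(rgb_values)
```

Changing a single hcl component moves all three rgb channels at once, which is one way to see why rgb values are hard to reason about perceptually.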
  • 26. [Figure: the hcl colour space, with hue, luminance and chroma labelled]
  • 27. Default colour scales
      Discrete: evenly spaced hues of equal chroma and luminance. No colour appears more important than any other. Does not imply order.
      Continuous: evenly spaced hues between two colours.
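The discrete default can be imitated in base R by spacing hues evenly around the colour wheel while holding chroma and luminance constant. The helper name even_hues and its default starting hue, chroma and luminance are assumptions for illustration; they are meant to resemble, not reproduce, what ggplot2 does internally.

```r
# Hypothetical helper: n colours with evenly spaced hues and
# identical chroma and luminance
even_hues <- function(n, chroma = 100, luminance = 65, start = 15) {
  # Take n + 1 points around the circle and drop the last one,
  # because start + 360 is the same hue as start
  hues <- seq(start, start + 360, length.out = n + 1)[seq_len(n)]
  hcl(h = hues, c = chroma, l = luminance)
}

cols <- even_hues(4)
print(cols)
```

Passing cols to the col argument of barplot() or pie() gives a quick visual check that no single colour stands out from the others.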
  • 28. Colour blindness
      7-10% of men are red-green colour “blind”. (There are many other, rarer types of colour blindness.)
      Solutions: avoid red-green contrasts; use redundant mappings; test. I like Color Oracle: http://colororacle.cartography.ch
  • 29. Alternatives
      Discrete: brewer, grey
      Continuous: gradient2, gradientn
  • 30. Your turn
      Modify the fill scale to use a Brewer colour palette of your choice. (Hint: you will need to change the name of the scale.)
      Use RColorBrewer::display.brewer.all() to list all palettes.
  • 32. Visual appearance
      So far we have only discussed how to get the data displayed the way you want, focussing on the essence of the plot.
      Themes give you a huge amount of control over the appearance of the plot: the choice of background colours, fonts and so on.
  • 33.
          # Two built in themes. The default:
          qplot(carat, price, data = diamonds)

          # And a theme with a white background:
          qplot(carat, price, data = diamonds) + theme_bw()

          # Use theme_set if you want it to apply to every
          # future plot.
          theme_set(theme_bw())

          # This is the best way of seeing all the default
          # options
          theme_bw()
          theme_grey()
  • 34. Plot title
      The plot theme also controls the plot title. You can change this for an individual plot by adding

          opts(title = "My title")
  • 35. Your turn
      Add an informative title and see what the plot looks like with a white background.
  • 36. Elements
      You can also make your own theme, or modify an existing one.
      Themes are made up of elements, each of which can be one of: theme_line, theme_segment, theme_text, theme_rect, theme_blank.
      This gives you a lot of control over plot appearance.
  • 37. Elements
      Axis: axis.line, axis.text.x, axis.text.y, axis.ticks, axis.title.x, axis.title.y
      Legend: legend.background, legend.key, legend.text, legend.title
      Panel: panel.background, panel.border, panel.grid.major, panel.grid.minor
      Strip: strip.background, strip.text.x, strip.text.y
  • 38.
          # To modify a plot
          p + opts(plot.title = theme_text(size = 12, face = "bold"))
          p + opts(plot.title = theme_text(colour = "red"))
          p + opts(plot.title = theme_text(angle = 45))
          p + opts(plot.title = theme_text(hjust = 1))
  • 39.
          # If we want, we could also remove the axes:
          last_plot() + opts(
            axis.text.x = theme_blank(),
            axis.text.y = theme_blank(),
            axis.title.x = theme_blank(),
            axis.title.y = theme_blank(),
            axis.ticks.length = unit(0, "cm"),
            axis.ticks.margin = unit(0, "cm"))
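The opts(), theme_text() and theme_blank() calls on these slides reflect ggplot2 as it stood in 2010. Later ggplot2 releases replaced them with theme(), element_text() and element_blank(), and titles are usually set with labs() or ggtitle(). The sketch here shows roughly equivalent code for a recent ggplot2; the mtcars data and the title text are stand-ins chosen for illustration, not part of the original slides.

```r
library(ggplot2)

# Stand-in plot using a dataset that ships with R
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # Formerly: opts(title = "My title")
  labs(title = "Fuel economy against weight") +
  # White-background theme, as on the slides
  theme_bw() +
  theme(
    # Formerly: opts(plot.title = theme_text(size = 12, face = "bold"))
    plot.title = element_text(size = 12, face = "bold"),
    # Formerly: theme_blank() for each axis element
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    axis.ticks.length = grid::unit(0, "cm")
  )

print(p)
```

Placing theme() after theme_bw() matters: a complete theme added later would overwrite the individual element settings.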