11. “Wouldn’t it be amazing if we got
2,000 people to learn statistics!”
-Jeff Leek, 7/17/12
12. date: 7/19/12
from: jtleek@gmail.com
Roger let me know you gave him a
ballpark figure for the number of
students registered for his course
“Computing for Data Analysis”. Could
you give me an idea of how many have
registered for my course “Data
Analysis”?
16. Data Science Specialization
Total Enrollments: 3,815,890
Total Completions: 409,712
Genomic Data Science Specialization
Total Enrollments: 173,495
Total Completions: 10,826
Executive Data Science Specialization
Total Enrollments: 62,076
43. We take a random sample of individuals in a
population and identify whether they smoke
and whether they have cancer. We observe that there
is a strong relationship between whether a
person in the sample smoked and whether they
have lung cancer. We claim that smoking is
related to lung cancer in the larger population.
45. We take a random sample of individuals in a
population and identify whether they smoke
and whether they have cancer. We observe that there
is a strong relationship between whether a
person in the sample smoked and whether they
have lung cancer. We claim that smoking is
related to lung cancer in the larger population.
We explain that we think the reason for this
relationship is that cigarette smoke
contains known carcinogens such as benzene,
which make cells in the lungs become cancerous.
54. E[Claim | F_0(1_{set(base)}(A))] - E[Claim | F_0(1_{set(ggplot2)}(A))]
[Diagram labels: Population, Question, Hypothesis, Experimental Design, Experimenter, Data, Analysis Plan, Analyst, Code, Estimate, Claim]
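One way to read the slide 54 expression is as a difference in expected outcomes between an arm told to use base R and an arm told to use ggplot2. The sketch below estimates such a difference with a plain difference in sample means. It assumes that "G" and "B" on the later result slides stand for the ggplot2 and base R arms, and it uses the slide 56 counts (5 of 22 and 5 of 12 plots rated as clearly showing the relationship) as a stand-in for "Claim". The function and variable names are hypothetical, not taken from the talk.

```python
# Assumption: G = ggplot2 arm, B = base R arm, with counts from slide 56.
# 1 = plot rated as clearly showing the relationship, 0 = not.
ggplot2_outcomes = [1] * 5 + [0] * 17  # 5 of 22
base_outcomes = [1] * 5 + [0] * 7      # 5 of 12


def difference_in_means(first_arm, second_arm):
    """Sample analogue of E[outcome | first arm] - E[outcome | second arm]."""
    return sum(first_arm) / len(first_arm) - sum(second_arm) / len(second_arm)


# Hypothetical name for the estimated base-minus-ggplot2 difference.
estimate = difference_in_means(base_outcomes, ggplot2_outcomes)
print(f"base - ggplot2: {estimate:.3f}")
```

A difference in means only has a causal reading if the arms were assigned at random, which is an assumption here rather than something the slides state outright.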
55. 1. Make a plot that answers the question: what is the
relationship between mean covered charges
(Average.Covered.Charges) and mean total payments
(Average.Total.Payments) in New York?
2. Make a plot (possibly multi-panel) that answers the
question: how does the relationship between mean
covered charges (Average.Covered.Charges) and mean
total payments (Average.Total.Payments) vary by
medical condition (DRG.Definition) and the state in which
care was received (Provider.State)?
Use only the [ggplot2/base R] graphics system (not
base R or lattice) to make your figure.
56. “Does the plot clearly show the
relationship between mean covered
charges (Average.Covered.Charges)
and mean total payments
(Average.Total.Payments) in New
York?”
G: 5/22 (23%) vs. B: 5/12 (42%)
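With groups this small, it can help to ask whether a gap like 5/22 versus 5/12 is distinguishable from chance. The sketch below implements a two-sided Fisher exact test using only the Python standard library. The choice of test is an assumption made for illustration; the slides do not say which test, if any, was used. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, row1)

    def prob(k):
        # Probability that the top-left cell equals k, given fixed margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Slide 56 counts. Rows: G, B. Columns: "yes", "no".
p_value = fisher_exact_two_sided(5, 17, 5, 7)
print(f"two-sided p-value: {p_value:.3f}")
```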
57. “Does the plot clearly show how the relationship
between mean covered charges
(Average.Covered.Charges) and mean total
payments (Average.Total.Payments) varies by
medical condition (DRG.Definition) and the state
in which care was received (Provider.State)?”
G: 7/22 (32%) vs. B: 5/12 (42%)
58. “Is the plot visually pleasing?”
G: 21/22 (95%) vs. B: 10/12 (83%)
G: 20/22 (91%) vs. B: 8/12 (67%)
59. “Do the plot text and labels use full
words instead of abbreviations?”
G: 21/22 (95%) vs. B: 12/12 (100%)
G: 11/22 (50%) vs. B: 5/12 (42%)
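The percentages on slides 56 to 59 can be recomputed from the raw counts as a quick consistency check. The sketch below does this for every G/B pair. The labels "first line" and "second line" for slides 58 and 59 are placeholders, since the transcript does not say what the two result lines on each of those slides refer to.

```python
# (label, arm, count, total, percentage printed on the slide)
reported = [
    ("slide 56", "G", 5, 22, 23), ("slide 56", "B", 5, 12, 42),
    ("slide 57", "G", 7, 22, 32), ("slide 57", "B", 5, 12, 42),
    ("slide 58, first line", "G", 21, 22, 95), ("slide 58, first line", "B", 10, 12, 83),
    ("slide 58, second line", "G", 20, 22, 91), ("slide 58, second line", "B", 8, 12, 67),
    ("slide 59, first line", "G", 21, 22, 95), ("slide 59, first line", "B", 12, 12, 100),
    ("slide 59, second line", "G", 11, 22, 50), ("slide 59, second line", "B", 5, 12, 42),
]

# Collect any row whose rounded percentage differs from the slide.
mismatches = []
for label, arm, count, total, shown in reported:
    recomputed = round(100 * count / total)
    print(f"{label} {arm}: {count}/{total} = {recomputed}% (slide shows {shown}%)")
    if recomputed != shown:
        mismatches.append((label, arm))
```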