Stats I, II and III
Frequencies, crosstabs, correlation, ANOVA,
regression
Jodi Upton and Crina Boros
CIJ Summer 2017
The Data Ladder -- categorical
I. One type of response (yes or no)
Frequencies:
        Count   Percent
Yes     432     45.3%
No      521     54.7%
Crosstabs:
                Live in Texas
Like Bush       Yes     No
Yes             382     200
No              125     307
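The percentages in the frequency table can be reproduced from the raw counts. A minimal sketch, assuming the two counts above are the whole sample (the variable names are made up for illustration):

```python
# Counts taken from the frequency table above.
counts = {"Yes": 432, "No": 521}

total = sum(counts.values())

# Each answer's share of all responses, as a percentage with one decimal.
percentages = {answer: round(100 * n / total, 1) for answer, n in counts.items()}

for answer, n in counts.items():
    print(f"{answer:<4}{n:>5}{percentages[answer]:>7}%")
```

Recomputing a published percentage from its counts is a quick way to catch a typo in either one.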
The Data Ladder -- categorical
II. Two or more types of responses (race)
Frequencies:
Race        Frequency
Asian       4,766
Black       12,807
White       9,766
Hispanic    7,236
Crosstabs:
Race        Warning   Ticket   None
Black       1         6        0
White       4         3        1
Hispanic    0         1        2
Unknown     3         2        2
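A crosstab is easier to compare across rows once each row is turned into percentages. A small sketch using the counts from the race crosstab above; the function name `row_percentages` is an illustrative choice, not something from the workshop:

```python
# Counts from the crosstab above, in the order (Warning, Ticket, None).
outcomes = ("Warning", "Ticket", "None")
crosstab = {
    "Black": (1, 6, 0),
    "White": (4, 3, 1),
    "Hispanic": (0, 1, 2),
    "Unknown": (3, 2, 2),
}

def row_percentages(row):
    """Convert one row of counts into percentages of that row's total."""
    total = sum(row)
    return [round(100 * n / total, 1) for n in row]

for race, row in crosstab.items():
    shares = row_percentages(row)
    print(race, dict(zip(outcomes, shares)))
```

With counts this small, a single stop moves a percentage by more than ten points, so the row shares are a starting point for questions rather than a finding.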
The Data Ladder -- categorical
III. Ordinal Data (use crosstabs and frequencies)
When the value doesn’t mean much, but the order
does:
Grade levels
Age categories
Income categories
The Data Ladder -- continuous data
Examples:
Income
Housing prices
Response time (police and fire)
Distance travelled (commute)
What you can do:
Mean
Median
Range
Rank
Correlation
ANOVA
Regression
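The first four items on that list need nothing more than Python's standard library. A sketch with hypothetical commute distances (the numbers are invented for illustration):

```python
import statistics

# Hypothetical commute distances in miles, made up for illustration.
commutes = [2.5, 4.0, 4.5, 6.0, 7.5, 9.0, 38.0]

mean = statistics.mean(commutes)
median = statistics.median(commutes)
value_range = max(commutes) - min(commutes)

# Rank from longest to shortest commute (1 = longest).
ranked = sorted(commutes, reverse=True)
ranks = {distance: position for position, distance in enumerate(ranked, start=1)}

print(f"mean={mean:.1f} median={median} range={value_range}")
print("longest commute:", ranked[0], "rank", ranks[ranked[0]])
```

Note how the single 38-mile commute pulls the mean well above the median; that gap is itself a story lead.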
Go to Kahoot.it
(on your phone or computer)
In traditional statistics, the normal curve means about 95% of observations will fall within
two standard deviations of the mean (the middle of the curve)
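The 95% figure can be checked directly against a standard normal curve. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an assumption for illustration:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The exact cut-off for 95% is 1.96 standard deviations, which is why that number shows up in margin-of-error formulas.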
Independent vs. Dependent variable
Independent
Comes first in time
Can be more than one
variable
Dependent
What you are measuring
Polling
A March 9, 2016 Quinnipiac poll
found the following results, with a
margin of error of ±3.7 percentage
points at the 95% confidence level.
Who is really ahead?
What’s the MOE for women? White
males?
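A rough way to see why subgroups carry a larger margin of error: the margin depends on the size of the sample being looked at. The sketch below uses the textbook formula for a simple random sample with a worst-case 50/50 split. The sample sizes are hypothetical, since the poll's actual counts are not given here, and real pollsters also adjust for weighting.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level

def margin_of_error(sample_size, proportion=0.5):
    """Margin of error in percentage points for a simple random sample."""
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    return 100 * Z_95 * standard_error

# Hypothetical sample sizes, chosen only to illustrate the effect.
full_sample = 700
women_only = 350

print(round(margin_of_error(full_sample), 1))  # whole sample
print(round(margin_of_error(women_only), 1))   # a subgroup half that size
```

Halving the sample does not double the margin; it grows by a factor of about 1.4 (the square root of 2).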
CORRELATION
AKA: Pearson’s r or coefficient of correlation
● Between 1 and -1
● If both variables move in the same direction → positive relationship
● If variables move in opposite direction → negative relationship
-1 ------------------ 0 ------------------ +1
Strong relationship   weak   weak   strong relationship
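Pearson's r can be computed by hand to see where the sign and the -1 to +1 range come from. A minimal sketch with made-up data; `pearson_r` is an illustrative name:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

# Hypothetical data, invented for illustration.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves with hours
falling = [10, 8, 6, 4, 2]   # moves against hours

print(round(pearson_r(hours, rising), 2))
print(round(pearson_r(hours, falling), 2))
```

Real data rarely lands exactly on +1 or -1; these two series were built as perfect straight lines to show the extremes.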
Got it, so far?
ANOVA
What it assumes:
Normal distribution
Independence of errors
Outliers removed*
Equal variance
(*but journalists love those!)
What it measures:
Whether the difference
within the group is greater
than the difference between
the groups
ANOVA needs a hypothesis
Null hypothesis: the treatment has no impact
F = (the treatment variance + the random variance) / the random variance
What you’re looking for:
The F statistic is never negative (if it’s negative, you’ve
made a mistake). A value near 1 is what chance alone would produce.
If F > F crit, you must reject the null hypothesis (treatment had an impact)
If F < F crit, you can’t rule out the null hypothesis
The p value
If the p value is less than alpha (.05) then the result is significant (it matters)
If the p value is greater than alpha, the results are not significant
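The F statistic is the between-group variance divided by the within-group variance. A sketch of a one-way ANOVA done by hand, with made-up groups; it stops at F and leaves the comparison against F crit (or the p value) to a table or a statistics package:

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of values around their own group mean
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((value - group_mean) ** 2 for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements for three groups, invented for illustration.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

Both sums of squares are built from squared differences, which is why a negative F always signals an arithmetic mistake.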
In Massachusetts, are there more
suicides in local jails or in the
prison system?
What you still don’t know
What accounts for the difference?
For that you need a t-test, regression or other tool.
‘HOW TO CHOOSE’ MADE EASY
THE 2 MOST ESSENTIAL QUESTIONS:
1. DO YOU HAVE CATEGORICAL OR CONTINUOUS DATA IN THE VARIABLES?
2. WHAT IS YOUR INDEPENDENT VARIABLE AND DEPENDENT VARIABLE?
INDEPENDENT    DEPENDENT      STATISTICS
Categorical    Categorical    CROSS-TAB
Continuous     Continuous     LINEAR REGRESSION / MULTIPLE REGRESSION
Categorical    Continuous     ANALYSIS OF VARIANCE / ANOVA
Continuous     Categorical    LOGISTIC REGRESSION
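The chooser table is small enough to write down as a lookup. A sketch only; the function and dictionary names are invented for illustration:

```python
# (independent variable type, dependent variable type) -> suggested method,
# copied from the table above.
TEST_BY_VARIABLE_TYPES = {
    ("categorical", "categorical"): "cross-tab",
    ("continuous", "continuous"): "linear regression / multiple regression",
    ("categorical", "continuous"): "analysis of variance (ANOVA)",
    ("continuous", "categorical"): "logistic regression",
}

def choose_test(independent, dependent):
    """Look up the suggested method for a pair of variable types."""
    key = (independent.lower(), dependent.lower())
    return TEST_BY_VARIABLE_TYPES[key]

# Facility type (jail vs. prison) is categorical; a suicide rate is continuous.
print(choose_test("categorical", "continuous"))
```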
IT’S A FINE DAY FOR LINEAR REGRESSION!
Image by Paul Wesley
Linear Regression
I. Does the data fit
the 1st assumption:
is there a linear relationship?
1. Scatter plot
2. Trendline
3. Create a new variable
II. The last assumption:
the data should approximate
a bell curve (normal distribution).
1. Data Analysis ToolPak -
Descriptive Statistics
2. Tick Summary Statistics
3. Tick Confidence level >> 95%
4. Mean and median should be
close to each other
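Outside Excel, the same quick check on the bell-curve assumption is to compare the mean and the median. A rough sketch with hypothetical values; measuring the gap in standard deviations is an illustrative choice, not a formal normality test:

```python
import statistics

def mean_median_gap(values):
    """Gap between mean and median, expressed in standard deviations."""
    gap = statistics.mean(values) - statistics.median(values)
    return gap / statistics.stdev(values)

# Hypothetical data: the second list swaps one value for an extreme one.
symmetric = [48, 50, 51, 52, 53, 54, 56]
skewed = [48, 50, 51, 52, 53, 54, 250]

print(round(mean_median_gap(symmetric), 2))
print(round(mean_median_gap(skewed), 2))
```

A gap near zero is consistent with a roughly symmetric distribution; a large gap suggests skew, and a histogram is the next thing to look at.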
X vs. Y
Source: http://www.gradeamathhelp.com/x-axis-and-y-axis.html
Source: assetinsights.net
Source: Indian Journal of Dermatology
https://tinyurl.com/ydad546c
Linear Regression
Conditions met? Run the Regression from
the Data Analysis ToolPak:
Y Range - Dependent
X Range - Independent
Turn on LABELS
CONFIDENCE LEVEL 95%
NEW WORKSHEET - REGRESSION
RESIDUALS
ADJUSTED R SQUARE 0 TO 1.0. The
closer it gets to 1, the closer the model is to
a perfect fit.
SIGNIFICANCE F
THE RESIDUAL STORY - Sort!
THE LINEAR REGRESSION IS
JUST THE BEGINNING OF
THE REPORTING
Conrad Carlberg - Statistical
Analysis
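For readers working outside Excel, the regression steps above can be sketched in plain Python: fit the line, compute adjusted R square, then sort the residuals to find the cases the line explains worst. The data and function names are hypothetical, and this covers only a single X variable.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def adjusted_r_squared(xs, ys, slope, intercept):
    """Adjusted R square for a one-predictor model."""
    n = len(xs)
    mean_y = sum(ys) / n
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_residual / ss_total
    # One predictor, so the adjustment uses n - 2 degrees of freedom.
    return 1 - (1 - r_squared) * (n - 1) / (n - 2)

# Hypothetical data, invented for illustration; the last point is an outlier.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 18.0]

slope, intercept = fit_line(xs, ys)

# Residual = actual minus predicted; sort by size to surface the outliers.
residuals = sorted(
    ((y - (intercept + slope * x), x, y) for x, y in zip(xs, ys)),
    key=lambda item: abs(item[0]),
    reverse=True,
)

print(round(adjusted_r_squared(xs, ys, slope, intercept), 3))
print("largest residual:", residuals[0])
```

The rows at the top of the sorted residuals are the ones the model gets most wrong, which is often where the reporting starts.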
Thank you!
Jodi Upton: jodi.upton@gmail.com and @jodiupton
Crina Boros: crinaboros@gmail.com
Special thanks to: Jennifer LaFleur, Center for Investigative
Reporting/Reveal
Steve Doig, Arizona State University

Editor's notes

  1. Starting at the bottom...
  2. Continuous data is sometimes described as ‘infinite’, but it’s really not: income and prices, for example, are limited to two decimal places. Another way that may help: continuous (measured) vs. discrete (counted). If it would take ‘forever’ to count, it’s probably continuous.
  3. Amy Poehler
  4. Skew = body (positive or negative); kurtosis = tail
  5. All you really need to know: is my data evenly distributed?
  6. The margin of error does not depend on the size of the population; it depends on the size of the sample. (In astronomy, the margin of error is about 4.24 light years -- the distance to Proxima Centauri)
  7. Important: 1 in 20 observations will NOT! Bayesian: start with a different hypothesis, 100% within curve, but may be ‘off’
  8. ANOVA was created by an evolutionary biologist and statistician, who wanted to be able to tell if two groups were the same or different, i.e., were they the same species or not?
  9. In other words, is there enough randomness within the sample, that it outweighs any variance between the samples -- and any measured difference is the result of chance?
  10. “F” stands for Sir Ronald Fisher, who invented this. “P” stands for the probability that -- if the null hypothesis is true -- the results are ‘extreme’ (one in 500 chance of being wrong)
  11. A paper ‘suicide suit’ worn by a model