Relational Statistics - Workshops I, II, III

Stats I, II and III
Frequencies, crosstabs, correlation, ANOVA, regression
Jodi Upton and Crina Boros
CIJ Summer 2017

The Data Ladder: categorical
I. One type of response (yes or no)
Frequencies:
Response   Count   Percent
Yes          432     45.3%
No           521     54.7%
Crosstabs:
               Live in Texas
Like Bush      Yes     No
Yes            382     200
No             125     307
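The percentages on this slide can be reproduced from the raw counts. Below is a minimal Python sketch. It assumes that living in Texas is the independent variable, which the slide does not state. Under that assumption the crosstab is percentaged down each column, so the two groups of Texans and non-Texans can be compared directly.

```python
# Counts copied from the slide: one yes/no question, and a Like Bush x Live in Texas crosstab.
freq = {"Yes": 432, "No": 521}
total = sum(freq.values())  # 953 respondents

for answer, count in freq.items():
    print(f"{answer}: {count} ({count / total:.1%})")

# Rows = Like Bush, columns = Live in Texas (assumed here to be the independent variable).
crosstab = {
    "Yes": {"Yes": 382, "No": 200},
    "No": {"Yes": 125, "No": 307},
}


def column_percentages(table):
    """Divide each cell by its column total, so each column sums to 100%."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        row_name: {c: row[c] / col_totals[c] for c in columns}
        for row_name, row in table.items()
    }


pcts = column_percentages(crosstab)
print(f"Texans who like Bush: {pcts['Yes']['Yes']:.1%}")
print(f"Non-Texans who like Bush: {pcts['Yes']['No']:.1%}")
```

With these counts, about 75% of Texans like Bush, compared with about 39% of non-Texans. The raw counts alone hide that comparison.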

The Data Ladder: categorical
II. Two or more types of responses (race)
Frequencies:
Race        Frequency
Asian           4,766
Black          12,807
White           9,766
Hispanic        7,236
Crosstabs:
Race        Warning   Ticket   None
Black             1        6      0
White             4        3      1
Hispanic          0        1      2
Unknown           3        2      2
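The same arithmetic extends to more than two categories. Below is a minimal Python sketch using the counts on the slide. It assumes race is the independent variable, so the crosstab is percentaged across each row. It also prints each row's total, because rows of only 3 to 8 stops produce percentages that swing wildly with a single case.

```python
# Race frequencies and the race x outcome crosstab, copied from the slide.
race_counts = {"Asian": 4766, "Black": 12807, "White": 9766, "Hispanic": 7236}
race_total = sum(race_counts.values())
for race, count in race_counts.items():
    print(f"{race}: {count:,} ({count / race_total:.1%})")

stops = {
    "Black": {"Warning": 1, "Ticket": 6, "None": 0},
    "White": {"Warning": 4, "Ticket": 3, "None": 1},
    "Hispanic": {"Warning": 0, "Ticket": 1, "None": 2},
    "Unknown": {"Warning": 3, "Ticket": 2, "None": 2},
}


def row_percentages(table):
    """Divide each cell by its row total, so each race's outcomes sum to 100%."""
    result = {}
    for row_name, cells in table.items():
        row_total = sum(cells.values())
        result[row_name] = {k: v / row_total for k, v in cells.items()}
    return result


for race, shares in row_percentages(stops).items():
    n = sum(stops[race].values())  # small n: treat these percentages with caution
    print(race, f"(n={n})", {k: f"{v:.0%}" for k, v in shares.items()})
```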

The Data Ladder: categorical
III. Ordinal data (use crosstabs and frequencies)
When the value itself doesn’t mean much, but the order does:
Grade levels
Age categories
Income categories
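With ordinal data, sort the categories by their built-in order before computing cumulative percentages. Sorting alphabetically or by count loses the meaning. The "middle" category is the first one where the cumulative share reaches 50%. Below is a minimal Python sketch. The age categories and counts are hypothetical, invented only for illustration.

```python
# Hypothetical age-category counts (illustrative only, not workshop data).
AGE_ORDER = ["18-29", "30-44", "45-64", "65+"]  # the order is what carries meaning
age_counts = {"45-64": 310, "18-29": 120, "65+": 250, "30-44": 220}


def cumulative_percentages(counts, order):
    """Return (category, count, cumulative share) tuples in the given order."""
    total = sum(counts[c] for c in order)
    running = 0
    rows = []
    for category in order:
        running += counts[category]
        rows.append((category, counts[category], running / total))
    return rows


def median_category(counts, order):
    """First category at which the cumulative share reaches 50%."""
    for category, _, cumulative in cumulative_percentages(counts, order):
        if cumulative >= 0.5:
            return category
    return None


for category, count, cumulative in cumulative_percentages(age_counts, AGE_ORDER):
    print(f"{category:>6}  {count:>4}  {cumulative:.1%}")
print("Median category:", median_category(age_counts, AGE_ORDER))
```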

The Data Ladder: continuous data
Examples:
Income
Housing prices
Response time (police and fire)
Distance travelled (commute)
What you can do:
Mean
Median
Range
Rank
Correlation
ANOVA
Regression
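The first four items on this list (mean, median, range, rank) need nothing more than Python's standard `statistics` module. Below is a minimal sketch. The district names and response times are hypothetical, chosen so that one slow district shows how a single outlier pulls the mean away from the median.

```python
import statistics

# Hypothetical police response times in minutes, by district (illustrative only).
response_times = {
    "District A": 4.5,
    "District B": 6.0,
    "District C": 5.2,
    "District D": 30.0,
    "District E": 7.1,
    "District F": 5.8,
    "District G": 6.4,
}
values = list(response_times.values())

mean = statistics.mean(values)
median = statistics.median(values)
value_range = max(values) - min(values)

# Rank districts from slowest (rank 1) to fastest.
ranking = sorted(response_times.items(), key=lambda item: item[1], reverse=True)

print(f"Mean: {mean:.1f} min, median: {median:.1f} min, range: {value_range:.1f} min")
for rank, (district, minutes) in enumerate(ranking, start=1):
    print(rank, district, minutes)
```

Here the mean (about 9.3 minutes) sits well above the median (6.0 minutes) because of District D. That gap is often the story.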

Go to Kahoot.it
(on your phone or computer)

In traditional statistics, the normal curve (bell curve) means about 95% of observations fall within roughly two standard deviations (more precisely, 1.96) of the mean.
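The 95% figure can be checked with `statistics.NormalDist` from Python's standard library (Python 3.8+). Below is a minimal sketch. The commute example, with a mean of 25 minutes and a standard deviation of 5, is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of a normal distribution within 1.96 standard deviations of the mean.
share_95 = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Going the other way: the z value that leaves 2.5% in each tail.
z_95 = standard_normal.inv_cdf(0.975)

# Hypothetical example: commutes averaging 25 minutes with a standard deviation of 5.
commute = NormalDist(mu=25, sigma=5)
low, high = commute.inv_cdf(0.025), commute.inv_cdf(0.975)

print(f"Share within ±1.96 SD: {share_95:.3f}")
print(f"z for 95%: {z_95:.2f}")
print(f"95% of commutes between {low:.1f} and {high:.1f} minutes")
```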

Independent vs. Dependent variable
Independent
Comes first in time
Can be more than one variable
Dependent
What you are measuring

Polling
A March 9, 2016 Quinnipiac poll found the following results, with a margin of error of ±3.7 percentage points at the 95% confidence level.
Who is really ahead?
What’s the MOE for women? White males?
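Both questions can be worked through with the textbook margin-of-error formula, MOE ≈ z·√(p(1−p)/n). The sketch below rests on several assumptions:

- The poll is treated as a simple random sample with the conservative p = 0.5.
- The sample size is inferred from the ±3.7 figure, because Quinnipiac's actual sample size and weighting are not given here.
- The subgroup size and the candidate percentages are hypothetical, since the poll results are not reproduced here.

For a near 50/50 two-way race, the margin on the *gap* between candidates is about twice the reported MOE.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% MOE (as a proportion) for a simple random sample of size n.

    p=0.5 gives the largest (most conservative) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)


def implied_sample_size(moe, p=0.5, z=1.96):
    """Invert the formula: the sample size a given MOE implies."""
    return (z * math.sqrt(p * (1 - p)) / moe) ** 2


def lead_margin(p1, p2, n, z=1.96):
    """MOE for the gap between two candidates measured in the same poll."""
    return z * math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)


n_total = implied_sample_size(0.037)  # roughly 700 respondents
print(f"A ±3.7-point MOE implies about {n_total:.0f} respondents")

# Hypothetical subgroup: if women were about half the sample.
print(f"MOE for ~{n_total / 2:.0f} women: ±{margin_of_error(n_total / 2):.1%}")

# Hypothetical race: 46% vs. 42%.
gap, gap_moe = 0.46 - 0.42, lead_margin(0.46, 0.42, round(n_total))
print("Lead is outside the margin" if gap > gap_moe else "Too close to call")
```

Halving the sample for a subgroup raises the MOE by a factor of about √2. That is why subgroup results such as "white males" deserve extra caution.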

CORRELATION
AKA: Pearson’s r or coefficient of correlation
● Ranges from -1 to +1
● If both variables move in the same direction → positive relationship
● If the variables move in opposite directions → negative relationship
[Figure: scale from -1 (strong relationship) through 0 (weak) to +1 (strong relationship)]
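Pearson's r is the covariance of the two variables divided by the product of their spreads, which is what keeps it between -1 and +1. Below is a minimal from-scratch Python sketch with toy data. Python 3.10+ also ships `statistics.correlation`, and Excel has `CORREL`; either should give the same number.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # same direction: r close to +1
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # opposite directions: r close to -1
```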

ANOVA
What it assumes:
Normal distribution
Independence of errors
Outliers removed*
Equal variance
(*but journalists love those!)
What it measures:
Whether the difference between the groups is greater than the difference within the groups

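ANOVA needs a hypothesis
Null hypothesis: the treatment has no impact
$$F = \frac{\text{treatment variance} + \text{random variance}}{\text{random variance}}$$
If the treatment has no impact, the treatment variance is close to zero, so F is close to 1.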

What you’re looking for:
The F statistic is 0 or greater (if it’s negative, you’ve made a mistake); an F close to 1 suggests the treatment had little impact
If F > F crit, reject the null hypothesis (the treatment had an impact)
If F < F crit, you can’t rule out the null hypothesis
The p value
If the p value is less than alpha (.05), the result is statistically significant (unlikely to be due to chance alone)
If the p value is greater than alpha, the result is not statistically significant
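The F ratio above can be computed directly for a one-way ANOVA. The numerator is the between-group mean square (treatment plus random variance). The denominator is the within-group mean square (random variance only). Below is a minimal Python sketch with hypothetical toy data.

The critical value 7.71 is the standard 5% value for 1 and 4 degrees of freedom, taken from an F table. For p values, tools such as Excel's "Anova: Single Factor" or SciPy's `scipy.stats.f_oneway` report F and the p value directly.

```python
def one_way_anova_f(*groups):
    """F = between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: how far each value sits from its own group mean (random variance).
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def decide(f_stat, f_crit):
    """Apply the slide's decision rule."""
    if f_stat > f_crit:
        return "reject the null hypothesis"
    return "can't rule out the null hypothesis"


f_stat = one_way_anova_f([1, 2, 3], [4, 5, 6])  # hypothetical toy data
# 7.71 is the 5% critical value for 1 and 4 degrees of freedom, from an F table.
print(f_stat, decide(f_stat, 7.71))
```

Here F = 13.5, well above 7.71, so the toy groups differ. Identical groups give F = 0.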

In Massachusetts, are there more
suicides in local jails or in the
prison system?

What you still don’t know
What accounts for the difference?
For that you need a t-test, regression or other tool.

‘HOW TO CHOOSE’ MADE EASY
THE 2 MOST ESSENTIAL QUESTIONS:
1. DO YOU HAVE CATEGORICAL OR CONTINUOUS DATA IN THE VARIABLES?
2. WHAT IS YOUR INDEPENDENT VARIABLE AND DEPENDENT VARIABLE?
INDEPENDENT    DEPENDENT      STATISTICS
Categorical    Categorical    CROSS-TAB
Continuous     Continuous     LINEAR REGRESSION / MULTIPLE REGRESSION
Categorical    Continuous     ANALYSIS OF VARIANCE / ANOVA
Continuous     Categorical    LOGISTIC REGRESSION
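The decision table above is a simple lookup on the two answers. Below is a minimal Python sketch. The function name `choose_statistic` is hypothetical, and the table contents are copied from the slide.

```python
# The slide's decision table, keyed by (independent, dependent) data type.
CHOOSER = {
    ("categorical", "categorical"): "cross-tab",
    ("continuous", "continuous"): "linear regression / multiple regression",
    ("categorical", "continuous"): "analysis of variance (ANOVA)",
    ("continuous", "categorical"): "logistic regression",
}


def choose_statistic(independent, dependent):
    """Look up the technique for an independent/dependent variable pair."""
    key = (independent.strip().lower(), dependent.strip().lower())
    if key not in CHOOSER:
        raise ValueError("each variable must be 'categorical' or 'continuous'")
    return CHOOSER[key]


# Example: race (categorical) and police response time (continuous).
print(choose_statistic("Categorical", "Continuous"))
```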

IT’S A FINE DAY FOR LINEAR REGRESSION!
Image by Paul Wesley

Linear Regression
I. Does the data fit the 1st assumption: is there a linear relationship?
1. Scatter plot
2. Trendline
3. Create a new variable
II. The last assumption: the data should approximate a bell curve (normal distribution).
1. Data Analysis ToolPak - Descriptive Statistics
   1. Tick Summary Statistics
   2. Tick Confidence Level for Mean >> 95%
2. Mean and median should be close to each other
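The "mean and median should be close" check can be expressed as the gap between them, measured in standard deviations. Below is a minimal Python sketch. The 0.1 tolerance is an arbitrary, hypothetical cutoff, not a formal normality test, and the two data sets are invented. Excel's Descriptive Statistics output also reports Skewness, which gives a second view of the same question.

```python
import statistics


def normality_check(values, tolerance=0.1):
    """Rough bell-curve check: is the mean close to the median?

    The gap is measured in standard deviations; the 0.1 tolerance is an
    arbitrary, hypothetical cutoff, not a formal test.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    gap = (mean - median) / statistics.stdev(values)
    return gap, abs(gap) <= tolerance


# Hypothetical data: roughly symmetric vs. skewed by one big value.
symmetric = [10, 12, 13, 14, 15, 16, 18]
skewed = [10, 11, 11, 12, 12, 13, 60]

print(normality_check(symmetric))  # gap of zero: passes
print(normality_check(skewed))     # mean pulled above the median: fails
```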

X vs. Y
Source: http://www.gradeamathhelp.com/x-axis-and-y-axis.html

Source: Indian Journal of Dermatology
https://tinyurl.com/ydad546c

Linear Regression
Conditions met? Run the Regression from the Data Analysis ToolPak:
Y Range - Dependent
X Range - Independent
Turn on LABELS
CONFIDENCE LEVEL 95%
NEW WORKSHEET - REGRESSION
RESIDUALS
ADJUSTED R SQUARE: normally 0 to 1.0. The closer it gets to 1, the better the line fits the data.
SIGNIFICANCE F: the p value for the whole model; below .05, the relationship is statistically significant.
THE RESIDUAL STORY - Sort the residuals to find the cases the line misses most!
THE LINEAR REGRESSION IS JUST THE BEGINNING OF THE REPORTING
Conrad Carlberg - Statistical Analysis
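The core numbers from the regression output can be reproduced by hand for one independent variable: the line, R Square, Adjusted R Square and the residuals. Below is a minimal least-squares sketch in Python with hypothetical toy data. It does not compute Significance F, which needs the F distribution; `scipy.stats.linregress` reports a p value if needed. Sorting the residuals by size is the "residual story": the cases the line misses most are often where the reporting starts.

```python
def simple_linear_regression(xs, ys):
    """Least-squares line y = intercept + slope * x, plus fit statistics."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length lists with at least three values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("both x and y values must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual = actual minus predicted: how far each case sits from the line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    r_square = 1 - ss_res / ss_tot
    # One independent variable, so k = 1 in 1 - (1 - R^2)(n - 1)/(n - k - 1).
    adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)
    return slope, intercept, r_square, adj_r_square, residuals


xs = [1, 2, 3, 4, 5]  # hypothetical independent variable
ys = [2, 4, 5, 4, 5]  # hypothetical dependent variable
slope, intercept, r2, adj_r2, residuals = simple_linear_regression(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f}x, R^2 = {r2:.2f}, adjusted R^2 = {adj_r2:.2f}")

# "The residual story": sort cases by how far they miss the line.
for i, r in sorted(enumerate(residuals), key=lambda item: abs(item[1]), reverse=True):
    print(f"case {i}: residual {r:+.2f}")
```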

Thank you!
Jodi Upton: jodi.upton@gmail.com and @jodiupton
Crina Boros: crinaboros@gmail.com
Special thanks to: Jennifer LaFleur, Center for Investigative Reporting/Reveal
Steve Doig, Arizona State University
