2. What is ANOVA
• ANOVA is a statistical test based on comparing the variance between groups to the variance within groups.
• It is conducted using an F-test, based on the F-distribution.
• It involves partitioning the total variance into group (between-group) variance and error (within-group) variance, taking their ratio to compute the F-statistic, and comparing that against the critical F for the appropriate degrees of freedom to decide whether to reject the null hypothesis.
• Error variance is calculated as the pooled mean of the variances of all the groups.
• Group variance is calculated as the variance of the group means about the grand mean, multiplied by the number of observations per group.
• F statistic = Group variance / Error variance.
• Df error = N - k (N observations, k groups); Df groups = k - 1.
• Look up the probability in an F-table to decide whether to reject the null hypothesis.
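The partitioning above can be sketched in plain Python. This is a minimal illustration with made-up numbers, not part of the slides, showing how the between-group and within-group mean squares combine into F:

```python
from statistics import mean

# Hypothetical example data: three groups of four observations (illustration only).
groups = [
    [4.0, 5.0, 6.0, 5.5],
    [6.5, 7.0, 8.0, 7.5],
    [5.0, 5.5, 6.5, 6.0],
]

k = len(groups)                      # number of groups
N = sum(len(g) for g in groups)      # total observations
grand_mean = mean(x for g in groups for x in g)

# Between-group ("group") sum of squares and mean square
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
df_between = k - 1
ms_between = ss_between / df_between

# Within-group ("error") sum of squares and mean square
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
df_within = N - k
ms_within = ss_within / df_within

F = ms_between / ms_within
print(F)  # 9.16 for this data
```

The p-value would then come from the F-distribution with (df_between, df_within) degrees of freedom, which is what SPSS looks up for you.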
3. • Don’t worry if you don’t understand the above jargon.
You don’t need it for practical analysis.
• Simply put, it provides inferences about differences in the means of two or more groups by analyzing their variance, and it is more conservative than running multiple t-tests.
4. Types of ANOVA
• One way ANOVA
• Factorial ANOVA → 2-way, 3-way, …
• Repeated measures ANOVA
• Mixed Models ANOVA
• ANCOVA
• MANOVA
• For the sake of simplicity, we will stick to one-way, factorial, RM, and MM.
5. One way ANOVA on SPSS
Assumptions:
1. A dependent variable that is continuous and an independent variable
that is categorical
2. Independent samples/groups (i.e., independence of observations)
3. Random sample of data from the population
4. Normal distribution (approximately) of the dependent variable
for each group (i.e., for each level of the factor)
1. Non-normal population distributions, especially those that are thick-tailed or
heavily skewed, considerably reduce the power of the test
2. Among moderate or large samples, a violation of normality may yield fairly
accurate p values
5. Homogeneity of variances (Levene’s test)
1. When this assumption is violated and the sample sizes differ among groups,
the p value for the overall F test is not trustworthy. These conditions warrant
using alternative statistics that do not assume equal variances among
populations, such as the Brown-Forsythe or Welch statistics.
2. When this assumption is violated, the results may not be trustworthy for post
hoc tests. When variances are unequal, post hoc tests that do not assume
equal variances should be used (e.g., Dunnett’s C).
6. How to on SPSS
• Data should be set up as two columns: a group column
and a variable column. Group should be coded
numerically.
• Ideally, the residuals should be tested for normality,
but most researchers just test the variable itself.
• There are no firm guidelines on which to do → we’ll go
for the easier approach and test the variables for normality.
• Split the file by group → test using Shapiro-Wilk.
• Most statisticians are of the opinion that ANOVA is
quite robust against departures from normality, so the
alpha level may be kept at 0.01 for normality testing.
• Go to Analyze → Compare Means → One-Way ANOVA.
• Put the test variable in the “Dependent List” and the
grouping variable in “Factor” → click the “Post Hoc” tab →
tick either Tukey or Bonferroni (under “Equal Variances
Assumed”) and either Games-Howell or Dunnett’s C (under
“Equal Variances Not Assumed”) → Continue.
8. • Click the “Options” tab → check all as shown below.
• “Homogeneity of variance test” is just Levene’s test for
equality of variances (an assumption for ANOVA).
• Brown-Forsythe and Welch are just corrections for
when the variances are found unequal by Levene’s test.
• Click Continue → OK.
9. Output
• P-value for Levene’s test → > 0.05 means the variances are equal.
• P-value for ANOVA → < 0.05 means some group mean differs from the rest.
• F-statistic
• Group variance
• Error variance
10. Since the variances of the groups were equal,
no correction is needed. But if Levene’s
test had been significant, then either of
these p-values should be used instead of
the traditional ANOVA p-value.
ANOVA only tells you whether there is
any difference among the groups; it does
not tell you which groups differ
significantly. That is why post hocs
are done.
Post hoc for equal variances
Post hoc for unequal
variances
11. 2-way ANOVA on SPSS
• Two-way ANOVA is used when there are two grouping variables
that may each influence the variance of the dependent variable,
possibly in interaction with each other. For example, SBP measured
across 4 age groups, where Sex may be a confounder.
• Factorial nomenclature: an X*Y factorial has two factors,
with X and Y levels respectively. Our example is a 2*4 factorial.
• What it tells you:
1. Does Age influence SBP (independent of Sex) ? (main effect)
2. Does Sex influence SBP (independent of Age) ? (main effect)
3. Age*Sex interaction → whether the trend of change of SBP
over the age groups varies between males and females.
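The interaction question is just a "difference of differences" on the cell means. A tiny illustration in Python with hypothetical SBP numbers (2 sexes x 2 age groups; not the deck's data):

```python
from statistics import mean

# Hypothetical SBP readings (illustration only): sex x age-group cells.
sbp = {
    ("M", "young"): [118, 120, 122],
    ("M", "old"):   [135, 138, 140],
    ("F", "young"): [112, 115, 113],
    ("F", "old"):   [128, 130, 131],
}

cell_means = {cell: mean(vals) for cell, vals in sbp.items()}

# Simple effect of Age within each Sex.
age_effect_m = cell_means[("M", "old")] - cell_means[("M", "young")]
age_effect_f = cell_means[("F", "old")] - cell_means[("F", "young")]

# Interaction: does the Age effect differ between the sexes?
interaction = age_effect_m - age_effect_f
print(age_effect_m, age_effect_f, interaction)
```

If the two age effects were identical the interaction would be zero; the factorial ANOVA tests whether the observed difference of differences is larger than chance would explain.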
12. • Assumptions similar to ANOVA.
• Data organized in three columns: Age group
(numerically coded), Sex (numerically coded), SBP.
13. • Go to Analyze → General Linear Model → Univariate.
• Transfer the variable of interest into “Dependent
Variable” and the grouping variables into “Fixed Factors”.
• “Plots” tab → put the group with more levels as
“Horizontal Axis” and the other as “Separate Lines” → Add →
Continue.
14. • “Post Hoc” tab → transfer the grouping variables to the “Post Hoc
Tests for” box → check either Bonferroni, Tukey, SNK or Scheffe →
Continue. (These are post hocs for observed means, same as one-way
ANOVA.)
• In the “Options” tab → transfer the grouping variables and interaction
term to “Display Means for”, check “Compare main effects”, and
set the post hoc to Bonferroni → check the “Display” options as shown
below → Continue → OK. (Here the post hoc is for marginal means.)
16. P-value for the main effects of Age/Sex → interpret the same as ANOVA, while
accounting for the influence of the other grouping variable.
P-value for Age*Sex → whether the trend/direction of change in SBP over
the age groups differed between males and females → look at the plot for easier
interpretation.
Partial eta squared → how much variance in SBP is explained by the given
factor.
Adjusted R squared → how much variance in SBP is explained by the whole
model.
Effect size
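Partial eta squared is easy to compute by hand from the sums of squares SPSS prints. A minimal sketch with hypothetical SS values (not from the deck's output):

```python
# Hypothetical sums of squares (illustration only).
ss_age = 520.0     # SS for the Age main effect
ss_sex = 140.0     # SS for the Sex main effect
ss_error = 900.0   # residual (error) SS

def partial_eta_sq(ss_effect, ss_error):
    """Partial eta squared = SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

print(partial_eta_sq(ss_age, ss_error))  # variance in SBP explained by Age
print(partial_eta_sq(ss_sex, ss_error))  # variance in SBP explained by Sex
```

Note it is "partial" because each effect is measured against only its own SS plus error, so the values across factors need not sum to the model's R squared.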
17. Estimated Marginal Means
These are the estimated marginal means: the
descriptives that you report in factorial
ANOVA (debatable; some prefer reporting
raw means; estimated means matter more
for ANCOVA). They are adjusted for
variance due to the Sex variable.
Pairwise comparisons of
marginal means after
Bonferroni correction.
18. These are the estimated
marginal means: the
descriptives that you
report in factorial ANOVA
(debatable; some prefer
reporting raw means;
estimated means matter more
for ANCOVA). They are
adjusted for variance due
to the Age variable.
Pairwise comparisons of
marginal means: only two
levels, so no correction
required.
19. Post hocs for observed
means
Plot of estimated marginal
means of SBP at various
Age levels vs. the other
grouping variable – Sex.
20. Repeated Measures ANOVA
• Repeated measures ANOVA is the equivalent of one-way ANOVA when
there are correlations between the levels of the grouping variable.
The most common application is a variable measured over time in
the same patients. As the paired t-test is to the independent-samples
t-test, RM ANOVA is to one-way ANOVA.
• Assumptions same as ANOVA (except independence: the samples are
paired, not independent).
• An extra assumption is sphericity → the variances of the
differences between successive variable levels should be roughly
equal. Tested by Mauchly’s test of sphericity, akin to Levene’s
test in one-way ANOVA.
• Just as the Brown-Forsythe/Welch corrections are used when
Levene’s test is violated, the Greenhouse-Geisser or Huynh-Feldt
corrections are used when Mauchly’s test is violated.
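The sphericity idea can be made concrete: for each pair of conditions, compute the difference score per subject and look at its variance. A minimal sketch with hypothetical repeated measures (5 subjects x 3 time points; not the deck's data):

```python
from statistics import variance

# Hypothetical repeated measures (illustration only): rows = subjects,
# columns = time points T1..T3.
data = [
    [10, 12, 15],
    [11, 14, 17],
    [9, 11, 13],
    [12, 13, 16],
    [10, 13, 16],
]

# Sphericity (informally): the variances of the pairwise difference
# scores between conditions should be roughly equal.
pairs = [(0, 1), (0, 2), (1, 2)]
diff_vars = {
    (i, j): variance([row[j] - row[i] for row in data]) for i, j in pairs
}
print(diff_vars)
```

Mauchly's test formalizes the comparison of these difference-score variances; when they diverge badly, the Greenhouse-Geisser or Huynh-Feldt epsilon shrinks the degrees of freedom to compensate.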
21. How to do RMANOVA in SPSS
• Import data into SPSS.
• Time points/paired data should be in separate columns.
• Analyze → General Linear Model → Repeated Measures → enter the
no. of time points.
22. • Specify the time points → add a plot → set a contrast (mostly
simple/difference) → in “Options”, set “Display Means for” and a post hoc.
23. • Set “Display” of descriptives and
homogeneity tests → run the test.
24. Results of RMANOVA
Descriptive Statistics (a)
      Mean   Std. Deviation   N
T1 14.80 3.468 15
T2 14.40 2.849 15
T3 14.27 2.890 15
T4 24.93 2.658 15
T5 26.07 3.127 15
T6 42.13 6.589 15
T7 39.73 7.255 15
T8 38.07 6.239 15
T9 77.00 9.150 15
T10 76.07 7.796 15
T11 75.67 7.118 15
T12 105.53 7.140 15
T13 106.53 8.667 15
T14 126.07 9.067 15
T15 127.20 8.073 15
a. Gp = Exp
Mauchly's Test of Sphericity (b,c)
Measure: MEASURE_1
Within-Subjects Effect: Time. Mauchly's W = .000; Approx. Chi-Square = 160.619; df = 104; Sig. = .002
Epsilon (a): Greenhouse-Geisser = .406; Huynh-Feldt = .717; Lower-bound = .071
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Gp = Exp
c. Design: Intercept; Within Subjects Design: Time
Significant Mauchly’s
25. A contrast is basically a comparison without compensating for the elevated alpha
error through post hoc corrections. It is akin to multiple paired t-tests.
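A paired t statistic, the building block of such contrasts, is a one-liner on the difference scores. A minimal sketch with hypothetical paired measurements (not the deck's data); the p-value would come from a t table with n - 1 degrees of freedom:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements (illustration only).
t1 = [14, 15, 13, 16, 14, 15]
t2 = [16, 17, 13, 18, 15, 17]

d = [b - a for a, b in zip(t1, t2)]                # per-subject differences
t_stat = mean(d) / (stdev(d) / sqrt(len(d)))       # paired t statistic
df = len(d) - 1
print(t_stat, df)
```

Running many such uncorrected comparisons is exactly why the alpha error inflates, which is what post hoc corrections guard against.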
27. Mixed models ANOVA with within and
between subjects factors
• This model is used when we have within subjects
factor (correlated grouping variable) and between
subjects factor (uncorrelated grouping variable), for
the same dependent variable.
• For example, comparing SBP between the sexes (male/female)
when measured over time points before and after an
intervention: Sex is the between-subjects variable and
Time is the within-subjects variable.
• If we are interested only in changes in SBP over time,
separately in males and females, we can run two
separate RM ANOVAs.
• But if we want to see whether the change in SBP over
time was different between males and females, we
need a mixed models ANOVA.
28. How to on SPSS
• Data needs to be arranged in separate columns for
group, variable 1, variable 2 and so on.
• Descriptives and normality can be examined by
splitting the file using grouping variable and
conducting tests as previously taught.
• Assumptions same as RMANOVA.
29. • Go to Analyze → General Linear Model → Repeated Measures.
• Within-subjects factor entry is the same as RM ANOVA.
• In “Between-Subjects Factor(s)”, insert the grouping variable.
• “Plots” → within-subjects variable as “Horizontal Axis”, grouping
variable as “Separate Lines” → Add → Continue.
• “Post Hoc” → here it is for the between-subjects variable; since the
only one here is the 2-level variable Sex, we won’t be using it.
32. If GGε > 0.75, use the Huynh-Feldt
correction; otherwise use the Greenhouse-Geisser
correction for the within-subjects factor.
If p-value < 0.05, use
correction else sphericity
assumed in W-S factor.
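The two callouts above form a small decision rule, which can be written out explicitly. A sketch of that rule of thumb (the function name and the example values are illustrative, not SPSS output from the deck):

```python
def sphericity_correction(mauchly_p, gg_epsilon, alpha=0.05):
    """Pick which within-subjects p-value to report, per the slides' rule."""
    if mauchly_p >= alpha:
        return "sphericity assumed"          # Mauchly's not significant
    if gg_epsilon > 0.75:
        return "Huynh-Feldt"                 # milder correction suffices
    return "Greenhouse-Geisser"              # stronger correction needed

# E.g. Mauchly's p = .002 and GG epsilon = .406, as in the earlier output:
print(sphericity_correction(0.002, 0.406))
```
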
P-value for the overall RMANOVA of all
subjects → unimportant here.
P-value for the Time*Group interaction →
shows the difference in the trend of change of
SBP over time between the groups.
33. Shows pairwise differences without correction.
Shows the overall difference between groups → this and the interaction term p-value
are the most important for reporting.
Ignore Levene’s test here.
34. Skip the estimated marginal means for both the within- and between-subjects
variables: they are unimportant here, and SPSS gives no explanation of how their
standard errors are calculated. The results are almost the same as RM ANOVA for
the within-subjects variable and one-way ANOVA for the between-subjects variable.
The EMM for the interaction effect
gives the means at each combination
of factor levels, which helps to
analyze the interaction effect.
Profile plot of EMM of Dep var
over time in both the groups
helps to analyze the interaction
effect.
35. Non-parametric tests
• Factorial and mixed-models ANOVA don’t have non-
parametric equivalents.
• For one-way ANOVA → Kruskal-Wallis test.
• For repeated measures ANOVA → Friedman’s
test.
• For post hocs of Kruskal-Wallis → Mann-
Whitney U tests with manual Bonferroni
correction.
• For post hocs of Friedman’s → Wilcoxon
signed-rank tests with manual Bonferroni
correction.
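A manual Bonferroni correction is simple arithmetic: either multiply each p-value by the number of comparisons, or divide alpha by it. A minimal sketch with hypothetical p-values (illustration only, not output from the deck):

```python
# Hypothetical p-values from three pairwise Mann-Whitney U tests.
p_values = [0.012, 0.030, 0.200]
m = len(p_values)                                # number of comparisons

# Option 1: adjust the p-values upward (cap at 1.0).
adjusted = [min(1.0, p * m) for p in p_values]

# Option 2: shrink the significance threshold instead.
alpha_per_test = 0.05 / m

print(adjusted, alpha_per_test)
```

Both options are equivalent: a comparison is significant exactly when its adjusted p-value is below 0.05, i.e. when its raw p-value is below alpha_per_test.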
36. How to on SPSS
• For all of these, go to Analyze → Nonparametric
Tests.
Mann-Whitney U test
Kruskal-Wallis test
Wilcoxon signed-rank test
Friedman’s test
• Beyond this, you’ll know what to do. It’s quite easy.
Interpretation → just look at the p-value, nothing else.
Editor’s notes
The mean will be the same as the descriptive, but the standard error will be the pooled standard error of both groups combined. It is unclear why SPSS does this.