1. Statistical tests for categorical data
Dr. S. A. Rizwan, M.D.
Public Health Specialist
SBCM, Joint Program – Riyadh
Ministry of Health, Kingdom of Saudi Arabia
2. Learning objectives
Demystifying statistics! SBCM, Joint Program – Riyadh
• Examine the relationship between categorical variables
• Construct a contingency table for two categorical variables
• Describe the approach to statistical testing of categorical variables
3. Revise: Categorical variables
• Categorical (qualitative)
  • Nominal (no order)
    • Dichotomous, binary, binomial
    • Polychotomous
  • Ordinal (ordered)
• Answers “what?”
• Qualitative data is categorised
5. Revise: Prerequisites for a test
• How many variables are there?
• What is the nature of the dependent and independent variables?
• How many categories are there in the categorical variable?
• Does the continuous variable follow a normal distribution?
• Is there any pairing in the data/variables?
7. Statistical tests: Bivariate
For unpaired data
• If the assumptions for chi-square are met
  • Chi-square test (≥ 2 levels)
• If the assumptions for chi-square are NOT met
  • Fisher’s exact test (≥ 2 levels)
For paired data
• If the groups are paired
  • McNemar test (if 2 levels)
  • Repeated-measures (RM) logistic regression (if > 2 levels)
  • Interrater reliability analysis
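The bivariate choice above can be written as a small decision function. This is a minimal sketch, not a complete test-selection tool: the name `suggest_bivariate_test` and its parameters are hypothetical, "chi-square assumptions met" is taken to mean that no more than 20% of expected cell counts are below 5 (the rule given for Fisher’s exact test), and interrater agreement is left out because it answers a different question from association.

```python
def suggest_bivariate_test(paired: bool, levels: int, chi_square_ok: bool = True) -> str:
    """Hypothetical helper mirroring the bivariate decision table.

    paired        -- True if observations are paired or matched (same subjects, twins, matched pairs)
    levels        -- number of categories in the outcome variable
    chi_square_ok -- True if no more than 20% of expected cell counts are below 5
    """
    if levels < 2:
        raise ValueError("a categorical variable needs at least 2 levels")
    if paired:
        # Paired binary outcome: McNemar; more than 2 levels needs a model-based approach
        return "McNemar test" if levels == 2 else "repeated-measures logistic regression"
    # Unpaired data: chi-square when its assumptions hold, otherwise Fisher's exact test
    return "chi-square test" if chi_square_ok else "Fisher's exact test"


print(suggest_bivariate_test(paired=False, levels=2, chi_square_ok=False))  # Fisher's exact test
print(suggest_bivariate_test(paired=True, levels=2))                        # McNemar test
```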
8. Statistical tests: Multivariate
For unpaired data
• If the dependent variable (DV) is binary and there is more than one independent variable (IV)
  • Binary logistic regression
• If the DV is polychotomous and there is > 1 IV
  • Multinomial logistic regression
• If the DV is ordinal and there is > 1 IV
  • Ordinal regression
For matched data
• If the groups are matched
  • Conditional logistic regression
• If there are repeated measurements
  • Repeated-measures (RM) logistic regression
10. Statistical tests: Special
For an ordered categorical variable
• Chi-square test for trend
         Passed   Failed   Total
R1          100       78     178
R2          175      173     348
R3           42       59     101
Total       317      310     627
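The slide gives only the table. A minimal sketch of the chi-square test for trend (the Cochran-Armitage form, 1 df) on these counts is shown below. It assumes equally spaced scores 1, 2 and 3 for R1 to R3 (the slide does not state any scores), and some software uses a slightly different variance term, so results elsewhere may differ in the later decimals.

```python
import math

# Counts from the table; the scores 1, 2, 3 for R1-R3 are an assumption (equal spacing)
passed = [100, 175, 42]
totals = [178, 348, 101]
scores = [1, 2, 3]


def chi_square_trend(successes, totals, scores):
    """Chi-square test for linear trend in proportions across ordered groups (1 df)."""
    n = sum(totals)
    r = sum(successes)
    p_bar = r / n  # overall proportion passing
    sum_nx = sum(t * x for t, x in zip(totals, scores))
    sum_nx2 = sum(t * x * x for t, x in zip(totals, scores))
    sum_rx = sum(s * x for s, x in zip(successes, scores))
    numerator = (sum_rx - r * sum_nx / n) ** 2
    denominator = p_bar * (1 - p_bar) * (sum_nx2 - sum_nx ** 2 / n)
    stat = numerator / denominator
    # Upper-tail probability of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = chi_square_trend(passed, totals, scores)
print(f"chi-square for trend = {stat:.2f}, p = {p:.3f}")
```

Unlike the ordinary chi-square test on the 3 × 2 table (2 df), the trend test spends its single degree of freedom on a linear change in the proportion passing across the ordered rows.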
12. Contingency table
• Used in bivariate situations
• Use counts, not percentages
• No one-sided tests
• Each subject counted only once
• Explain significant findings
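To illustrate the rules above (raw counts, each subject counted once), here is a minimal sketch that cross-tabulates subject-level records into a contingency table. The records reproduce the counts of Thought exercise 1; the labels "perfume", "water", "restless" and "calm" and the helper name `contingency_table` are illustrative assumptions.

```python
from collections import Counter

# Hypothetical subject-level records: one (group, outcome) tuple per rat
records = (
    [("perfume", "restless")] * 1 + [("perfume", "calm")] * 8
    + [("water", "restless")] * 4 + [("water", "calm")] * 5
)


def contingency_table(records, row_levels, col_levels):
    """Cross-tabulate raw counts; each record (subject) contributes exactly one count."""
    counts = Counter(records)
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


rows, cols = ["perfume", "water"], ["restless", "calm"]
table = contingency_table(records, rows, cols)
for name, row in zip(rows, table):
    print(name, row, "total", sum(row))
```

The grand total of the table equals the number of records, which is a quick check that no subject was counted twice.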
13. Some selected topics
• Covered in other classes:
  • Chi-square test
  • Cochran-Mantel-Haenszel test
  • Regression
• In this class we will cover the basics of:
  • Fisher’s exact test
  • McNemar test
  • Interrater reliability analysis (agreement statistics)
14. Thought exercise 1
• In a study, a researcher tested a perfume on 9 rats and used water as the control on 9 other rats. In the perfume group, 1 rat showed restlessness, whereas in the control group, 4 rats showed restlessness. Determine whether there is an association between perfume and restlessness.
15. Thought exercise 2
• 22 pairs of twins were enrolled in the study. In each pair, one twin smoked and the other did not. The twins were followed to see which twin died first. In 17 pairs the smoking twin died first, and in 5 pairs the non-smoking twin died first.
16. Thought exercise 3
• All 100 pathological slides were examined by 2 pathologists. They were asked to classify the disease as mild, moderate or severe. Pathologist 1 classified 60, 30 and 10 slides, and pathologist 2 classified 50, 30 and 20 slides, as mild, moderate and severe, respectively. Both pathologists agreed that 44 were mild, 20 were moderate and 6 were severe, and disagreed on the remaining slides. Calculate the agreement between the two pathologists.
17. Fisher’s exact test
• Used in place of the chi-square test for independence when the cell counts are sparse
  • Sparse: more than 20% of the cells have expected frequencies < 5
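The 20% rule depends on expected counts, not observed ones. A minimal sketch that computes expected counts under independence (row total × column total / grand total) and applies the rule to the Thought exercise 1 table is shown below; the function names are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_is_unreliable(table, threshold=5.0, max_fraction=0.20):
    """True when more than 20% of cells have an expected count below 5."""
    expected = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in expected if e < threshold)
    return small / len(expected) > max_fraction


# Thought exercise 1: rows = perfume, water; columns = restless, not restless
observed = [[1, 8], [4, 5]]
print(expected_counts(observed))           # [[2.5, 6.5], [2.5, 6.5]]
print(chi_square_is_unreliable(observed))  # True -> use Fisher's exact test
```

Two of the four cells (50%) have expected counts of 2.5, so the chi-square approximation is not trustworthy here.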
19. Fisher’s exact test
• There are 6 possible tables with the observed marginal totals (row totals 9 and 9; column totals 5 and 13)
• The two-sided p-value is calculated by summing the probabilities of all tables whose probability is less than or equal to that of the observed table
20. Fisher’s exact test
• The observed table (Table II) has probability = 0.132
• P-value for Fisher’s exact test = Pr(Table II) + Pr(Table V) + Pr(Table I) + Pr(Table VI)
  = 0.1324 + 0.1324 + 0.0147 + 0.0147 ≈ 0.294
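The enumeration described for Fisher’s exact test can be sketched in a few lines of Python. This is a minimal 2 × 2 implementation using exact fractions, assuming that Tables I to VI correspond to a top-left cell (perfume and restless) of 0 to 5, so the observed table (Table II) has a top-left count of 1. If SciPy is available, `scipy.stats.fisher_exact` can serve as a cross-check.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Enumerates every table with the same margins and sums the probabilities
    of those no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, given the margins
        return Fraction(comb(row1, x) * comb(row2, col1 - x), comb(n, col1))

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    tables = {x: prob(x) for x in range(lo, hi + 1)}
    p_value = sum(p for p in tables.values() if p <= p_observed)
    return tables, p_observed, p_value


tables, p_obs, p_val = fisher_exact_2x2(1, 8, 4, 5)
for x, p in tables.items():
    print(f"top-left = {x}: probability = {float(p):.4f}")
print(f"observed probability = {float(p_obs):.3f}, p-value = {float(p_val):.3f}")
```

Exact fractions avoid floating-point ties when comparing each table's probability with the observed one; here Tables II and V have identical probabilities, so a rounding error could otherwise drop one of them from the sum.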
21. McNemar test
• The McNemar test is used when data are paired and the outcome of interest is a proportion
• Pair-matched data can come from:
  • Case-control studies where each case has a matching control (matched on age, gender, race, etc.)
  • Twin studies: the matched pairs are twins
  • Before-after data: the outcome is the presence (+) or absence (-) of some characteristic measured on the same individual at two time points
22. McNemar test: matched case-control
• a: number of case-control pairs where both are exposed
• b: number of case-control pairs where the case is exposed and the control is unexposed
• c: number of case-control pairs where the case is unexposed and the control is exposed
• d: number of case-control pairs where both are unexposed
• The counts in the table for a case-control study are numbers of pairs, not numbers of individuals.
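Because the cells count pairs, each matched pair lands in exactly one of a, b, c or d. A minimal sketch, assuming each pair is recorded as a `(case_exposed, control_exposed)` tuple of booleans; the helper name and the ten example pairs are hypothetical illustration, not study data.

```python
from collections import Counter


def mcnemar_cells(pairs):
    """Count matched pairs (not individuals) into the a, b, c, d cells.

    Each pair is (case_exposed, control_exposed) as booleans.
    """
    counts = Counter(pairs)
    return {
        "a": counts[(True, True)],    # both exposed
        "b": counts[(True, False)],   # case exposed, control unexposed
        "c": counts[(False, True)],   # case unexposed, control exposed
        "d": counts[(False, False)],  # both unexposed
    }


# Hypothetical example: 10 matched case-control pairs
pairs = [(True, True)] * 2 + [(True, False)] * 5 + [(False, True)] * 1 + [(False, False)] * 2
cells = mcnemar_cells(pairs)
print(cells)  # {'a': 2, 'b': 5, 'c': 1, 'd': 2}
```

The four cells sum to the number of pairs (10), not the number of individuals (20).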
23. McNemar test: before-after study
• a: number of subjects with the characteristic present both before and after treatment
• b: number of subjects in whom the characteristic is present before but not after treatment
• c: number of subjects in whom the characteristic is present after but not before treatment
• d: number of subjects with the characteristic absent both before and after treatment
24. McNemar test
• Calculated using only the counts in the discordant ‘b’ and ‘c’ cells of the table: McNemar χ² = (b − c)² / (b + c)
• Under the null hypothesis, the statistic follows a chi-square distribution with 1 degree of freedom
• For a test with alpha = 0.05, the critical value for the McNemar statistic is 3.84
26. McNemar test
• For the twin data (b = 17, c = 5), McNemar χ² = (17 − 5)² / (17 + 5) ≈ 6.55, which exceeds the critical value of 3.84 for a chi-square distribution with 1 df; p = 0.01
• Conclusion: a significantly different proportion of smoking twins died first compared with their non-smoking twins, indicating a different risk of death associated with smoking (p = 0.01)
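The twin calculation can be reproduced with a short function. This is a minimal sketch of the McNemar chi-square statistic from the discordant cells b and c, with the p-value taken from the chi-square distribution with 1 df. The uncorrected statistic is the default here because it matches p = 0.01; the optional continuity-corrected variant is included for comparison and gives a larger p-value.

```python
import math


def mcnemar(b, c, continuity=False):
    """McNemar chi-square statistic (1 df) and p-value from the discordant cells b and c."""
    if b + c == 0:
        raise ValueError("no discordant pairs: the McNemar statistic is undefined")
    diff = abs(b - c)
    if continuity:
        # Continuity-corrected variant, floored at zero
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (b + c)
    # Upper-tail probability of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Thought exercise 2: smoking twin died first in 17 pairs, non-smoking twin in 5
stat, p = mcnemar(17, 5)
print(f"McNemar = {stat:.2f}, p = {p:.4f}, significant at 0.05: {stat > 3.84}")
```

Concordant pairs (cells a and d) carry no information about which member of the pair differs, which is why they do not appear in the formula.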
27. Agreement statistics
• Many types of agreement statistics exist, depending on:
  • Data type
  • Type of repetition
  • Internal consistency
28. Agreement statistics
• Cohen’s kappa
  • Measures the agreement between two raters who each classify N items into C mutually exclusive categories
  • Used when responses are categorical
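Cohen’s kappa compares observed agreement with the agreement expected by chance from each rater’s category totals: κ = (p_o − p_e) / (1 − p_e). A minimal sketch applied to Thought exercise 3 follows; the function name and argument layout are assumptions for illustration.

```python
def cohens_kappa(agreements, marginals_1, marginals_2, n):
    """Unweighted Cohen's kappa from the diagonal counts and each rater's category totals."""
    p_observed = sum(agreements) / n
    # Chance agreement: sum over categories of (rater 1 share) * (rater 2 share)
    p_expected = sum(m1 * m2 for m1, m2 in zip(marginals_1, marginals_2)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected), p_observed, p_expected


# Thought exercise 3: categories ordered mild, moderate, severe
kappa, po, pe = cohens_kappa(
    agreements=[44, 20, 6],
    marginals_1=[60, 30, 10],  # pathologist 1
    marginals_2=[50, 30, 20],  # pathologist 2
    n=100,
)
print(f"observed agreement = {po:.2f}, chance agreement = {pe:.2f}, kappa = {kappa:.3f}")
# observed agreement = 0.70, chance agreement = 0.41, kappa = 0.492
```

This unweighted kappa treats a mild-versus-severe disagreement the same as mild-versus-moderate; for ordered categories such as these, weighted kappa gives partial credit to near misses.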
31. Advanced learning
• Chi square test for trend
• Special cases of logistic regression
• Repeated measures logistic regression
• Weighted kappa
• Other measures of agreement analysis
32. Take home messages
• Many approaches are available for analysing categorical data
• Choose a method appropriate for your problem
• Check that the assumptions of the method are valid
• Draw conclusions based on the results of the test