Statistics pres 3.31.2014

ADLT 673: Teaching as Scholarship in Medical Education
Monday, March 31, 2014
An Overview of Quantitative Data Analysis

Outline of Today’s Class
 Analytic Methods
 Summary Measures
 Hypothesis Testing
 Statistical Methodologies
 Group Discussion
 Sample Size Determination
 Group Discussion
 Additional Resources

Analytic Methods: Summary Measures
 Representative Measures
 Reflect the most “typical” or “average” data value.
 Continuous Measurements:
 Mean (Average), Median and Mode
 Categorical Measurements:
 Frequencies and Proportions

 Measures of Variability
 Reflect how much the values differ from one another.
 Continuous Measurements:
 Standard deviation, range, interquartile range
 Categorical Measurements:
 None that are meaningful (sorry!)
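
The summary measures listed above can be computed with Python's standard library. This is a minimal sketch: the exam scores and pass/fail outcomes are invented for illustration, and the quartiles use the library's default method, which may differ slightly from other software.

```python
import statistics
from collections import Counter

# Hypothetical exam scores, invented for illustration only.
scores = [72, 85, 85, 90, 64, 78, 85, 91, 70, 88]

# Representative measures for continuous data
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of variability for continuous data
sd = statistics.stdev(scores)                  # sample SD (n - 1 denominator)
value_range = max(scores) - min(scores)
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1                                  # interquartile range

# Representative measures for categorical data: frequencies and proportions
outcomes = ["pass", "pass", "fail", "pass", "fail", "pass"]  # hypothetical
frequencies = Counter(outcomes)
proportions = {k: v / len(outcomes) for k, v in frequencies.items()}

print(f"mean={mean:.1f} median={median} mode={mode}")
print(f"sd={sd:.1f} range={value_range} IQR={iqr:.1f}")
print(frequencies, proportions)
```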

[Figure: example histograms of “normally” distributed data and “skewed” data]

 Measures of Association
 Continuous Measures: Correlation Coefficient (ρ): -1 ≤ ρ ≤ 1
 Correlations close to 1 indicate two measurements are highly
predictive and “track” with one another.
 Correlations close to -1 indicate two measurements are highly
predictive and have inverse relationship.
 Correlations close to 0 indicate little association.
 Categorical Measures: Odds Ratio (OR): 0 < OR < ∞
 OR greater than 1 indicates outcome (e.g., passed test) more likely
in test group than in control.
 OR less than 1 indicates outcome less likely in test group than in
control.
 OR ≈ 1 indicates little difference in outcomes between groups.
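
Both measures of association can be sketched in a few lines of Python. The helper names `pearson_r` and `odds_ratio` and all data values are hypothetical; the sample correlation computed here is an estimate of the population coefficient ρ.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def odds_ratio(test_yes, test_no, control_yes, control_no):
    """Odds of the outcome in the test group divided by odds in the control group."""
    return (test_yes / test_no) / (control_yes / control_no)

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
exam = [52, 60, 61, 70, 75]
r = pearson_r(hours, exam)  # positive: scores "track" with hours

# Hypothetical 2x2 table: 40 of 50 passed in the test group, 30 of 50 in control
or_value = odds_ratio(40, 10, 30, 20)  # greater than 1: passing more likely in test group

print(f"r = {r:.2f}, OR = {or_value:.2f}")
```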

Analytic Methods: Hypothesis Testing
 Most commonly accepted format of providing
quantitative evidence.
 Consists of 5 Steps:
 Translate research question into a set of testable hypotheses.
 Select most appropriate statistical test for your hypotheses.
 Collect your data.
 Calculate test statistic and/or p-value.
 Make Decision.

 Translating Research Question into Testable Hypotheses
 Identify parameter: population Mean (μ), proportion (p) or
difference (e.g., μ1-μ2).
 Identify statements made about that parameter.
 Should be in the form of: <, ≤, >, ≥, = or ≠
 Write research question in symbolic form, and find its opposite.
 Opposite of “<” is “≥”
 Opposite of “>” is “≤”
 Opposite of “=” is “≠”

 Example:
 Does an active learning curriculum improve the proportion of
students passing their board examinations compared to
students receiving the standard curriculum?
 Parameter: proportion passing board exams → p
 Statement: p_active is greater than p_standard
 Symbolic Form: p_active > p_standard or p_active – p_standard > 0
 Opposite of Symbolic Form: p_active ≤ p_standard or p_active – p_standard ≤ 0

 Testable Hypotheses:
 Null Hypothesis: Statement that parameter (or difference) is equal
to zero.
 Any statement in symbolic form with a ≤, ≥ or = is automatically the
null (note: we replace ≤ or ≥ with =).
 Alternative Hypothesis: Statement that parameter (or difference) is
somehow different from zero.
 Any statement in symbolic form with a <, > or ≠ is automatically the
alternative.
 Example:
 p_active – p_standard > 0 → becomes the alternative (HA)
 p_active – p_standard ≤ 0 → becomes the null (H0)

 Make Decision
 Based on statistical methodology you use, you get a p-value.
 Probability of observing outcomes at least as extreme as the
data you actually observed, given the null hypothesis is true.
 Plain English: If your intervention was ineffective, the p-value is the
probability of observing results at least as extreme as what you observed.
 If this probability is high, then your results are consistent with the null
hypothesis, and you fail to reject the null (no evidence that the
intervention worked).
 If this probability is low, then your results do not seem to match the
null hypothesis, and you reject the null (intervention likely
worked).
 In practice: we compare p-value to significance level (α = 0.05).
 If p-value ≥ 0.05, we fail to reject the null.
 If p-value < 0.05, we reject the null.
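
The decision step can be illustrated with the active-learning example. The sketch below assumes a one-sided two-proportion z-test with a pooled standard error, which is one common choice for a directional hypothesis about two proportions; the pass counts and the function name `two_proportion_z_test` are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(pass_active, n_active, pass_standard, n_standard):
    """One-sided test of H0: p_active - p_standard <= 0 vs. HA: p_active - p_standard > 0."""
    p_active = pass_active / n_active
    p_standard = pass_standard / n_standard
    # Pooled proportion: the best estimate of a common p under the null
    pooled = (pass_active + pass_standard) / (n_active + n_standard)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_active + 1 / n_standard))
    z = (p_active - p_standard) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return z, p_value

# Hypothetical data: 90 of 100 pass with active learning, 78 of 100 with standard
z, p_value = two_proportion_z_test(90, 100, 78, 100)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}: {decision}")
```

With these invented counts the p-value falls below 0.05, so the null is rejected; with smaller groups or a smaller difference the same code would report a failure to reject.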

Analytic Methods: Continuous Data
Columns give the # of Measurements; rows give the # of Samples.
# of Samples | Single | Pre/Post | Repeated Measures
1 Sample | t-test | Paired t-test | Repeated Measures ANOVA (RMA) / Linear Mixed Model (LMM)*
2 Samples | Two-sample t-test | RMA / LMM* | RMA / LMM*
“k” Samples | Analysis of Variance (ANOVA) | RMA / LMM* | RMA / LMM*
Adjusting for Covariates: Multiple Linear Regression*, Analysis of Covariance (ANCOVA)*, Linear Mixed Models*
*Will likely require statistical assistance

Analytic Methods: Categorical Data
Columns give the # of Measurements; rows give the # of Samples.
# of Samples | Single | Pre/Post | Repeated Measures
1 Sample | z-test | McNemar’s Test | Generalized Linear Mixed Models (GLMM)*
2 Samples | Chi-square Test | GLMM* | GLMM*
“k” Samples | Chi-square Test | GLMM* | GLMM*
Adjusting for Covariates: Multiple Logistic Regression*, Generalized Linear Mixed Models*
*Will likely require statistical assistance
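
The two method tables (continuous and categorical) can be encoded as a simple lookup, which may be handy when working through the discussion questions. This is only a transcription of the slides; the function name `suggest_test` and the key spellings are hypothetical.

```python
# Methods marked with * on the slides will likely require statistical assistance.
RMA_LMM = "Repeated Measures ANOVA / Linear Mixed Model*"
GLMM = "Generalized Linear Mixed Model*"

# Keys: (number of samples, number of measurements per subject)
CONTINUOUS = {
    ("1", "single"): "t-test",
    ("1", "pre/post"): "Paired t-test",
    ("1", "repeated"): RMA_LMM,
    ("2", "single"): "Two-sample t-test",
    ("2", "pre/post"): RMA_LMM,
    ("2", "repeated"): RMA_LMM,
    ("k", "single"): "Analysis of Variance (ANOVA)",
    ("k", "pre/post"): RMA_LMM,
    ("k", "repeated"): RMA_LMM,
}

CATEGORICAL = {
    ("1", "single"): "z-test",
    ("1", "pre/post"): "McNemar's Test",
    ("1", "repeated"): GLMM,
    ("2", "single"): "Chi-square Test",
    ("2", "pre/post"): GLMM,
    ("2", "repeated"): GLMM,
    ("k", "single"): "Chi-square Test",
    ("k", "pre/post"): GLMM,
    ("k", "repeated"): GLMM,
}

def suggest_test(outcome, samples, measurements):
    """Look up the method the slides list for a given study design."""
    table = {"continuous": CONTINUOUS, "categorical": CATEGORICAL}[outcome]
    return table[(samples, measurements)]

print(suggest_test("continuous", "1", "pre/post"))   # Paired t-test
print(suggest_test("categorical", "2", "single"))    # Chi-square Test
```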

Analytic Methods: Group Discussion
 Please break into groups by table
 For the next 10-15 minutes, take turns discussing what
analytic approaches are appropriate for your proposed study.
 What are your null and alternative hypotheses?
 Is your outcome continuous or categorical?
 How many groups and measurements?
 If your study is qualitative, discuss how statistical
methodologies could be used (e.g. data summary,
association).

Sample Size Determination
 As a general rule, larger sample sizes:
 Lead to more representative samples
 Lead to better estimation of parameters (e.g., representative
measures)
 Provide estimators with lower variability
[Figure: three panels labeled N=9, N=36 and N=100]

Averages over 10,000 Simulations
Sample Size | Sample Mean | Sample Std. Dev. | Standard Error*
9 | 204.4 | 36.5 | 12.3
16 | 204.3 | 37.1 | 9.5
25 | 204.2 | 37.2 | 7.8
36 | 204.1 | 37.5 | 6.5
49 | 204.1 | 37.6 | 5.5
64 | 204.2 | 37.7 | 4.9
81 | 204.1 | 37.7 | 4.2
100 | 204.1 | 37.7 | 3.9
1000 | 204.1 | 37.7 | 1.2
*SE: describes variability in the estimator, not in the sample data
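
A simulation in the same spirit can be sketched as follows. The slides do not state the population that was sampled, so this sketch assumes a normal population with mean 204.1 and standard deviation 37.7 (read off the large-sample rows of the table) and uses fewer repetitions to keep the run short; the numbers it prints will not match the table exactly.

```python
import math
import random
import statistics

def simulate(n, reps=2000, mu=204.1, sigma=37.7, seed=0):
    """Average the sample mean, sample SD and estimated SE over many samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means, sds, ses = [], [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s = statistics.stdev(sample)
        means.append(statistics.fmean(sample))
        sds.append(s)
        ses.append(s / math.sqrt(n))  # standard error of the mean
    return statistics.fmean(means), statistics.fmean(sds), statistics.fmean(ses)

results = {n: simulate(n) for n in (9, 36, 100)}
for n, (avg_mean, avg_sd, avg_se) in results.items():
    print(f"N={n:4d}  mean={avg_mean:6.1f}  sd={avg_sd:5.1f}  se={avg_se:5.1f}")
```

The sample standard deviation stays roughly constant as N grows, while the standard error shrinks in proportion to 1/√N, which is the pattern the table shows.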

 Possible Decisions
 Power = 1 - β
Decision | True State: H0 is “True” | True State: HA is True
Reject H0 | Type I Error (α) | Correct Decision
Fail to Reject H0 | Correct Decision | Type II Error (β)

 Determinants of Required Sample Size
 Significance Level (α): probability of rejecting H0 when it is
true.
 Power (1-β): probability of rejecting H0 when it is false.
 These values are selected during design phase
 α = 5%
 1-β = 80% (sometimes 90%).

 Measure of variability (usually standard deviation) inherent
in study population.
 As data become more variable…
 Standard error of Test statistic increases…
 p-value increases…
 Ability to reject H0 decreases…
 Power decreases.
 Controlling variability:
 Better measurement methodology
 Homogeneous samples

 Effect Size: smallest difference or change in outcome that you are
hoping to find
 As difference you want to observe decreases…
 Test statistic decreases…
 p-value increases…
 Ability to reject H0 decreases…
 Power decreases.
 Considerations:
 Clinical significance
 Clinical possibility (larger differences are easier to detect but harder
to achieve)

 Calculating Required Sample Size
 Equations exist (involving α, β, variability and effect size) for
simple analytic methods (t-test, chi-square, etc.).
 Advanced methods require professional assistance.
 Where do you find variability and effect size?
 Previous literature of similar populations
 Pilot study
 Guess-timates
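
As one example of such an equation, the sketch below uses the standard normal-approximation formula for comparing two group means with a two-sided test: n per group = 2 × ((z for α/2 + z for power) × SD / effect)². The function name `n_per_group` is hypothetical, and exact t-based calculations give slightly larger answers.

```python
import math
from statistics import NormalDist

def n_per_group(sd, effect, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)

# Hypothetical inputs: detect a difference of half a standard deviation
print(n_per_group(sd=1.0, effect=0.5))              # 80% power
print(n_per_group(sd=1.0, effect=0.5, power=0.90))  # 90% power
print(n_per_group(sd=1.0, effect=0.25))             # smaller effect, larger n
```

Halving the effect size roughly quadruples the required sample size, and more variable data (larger SD) has the same effect, which matches the determinants listed earlier.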

 What if required sample size is too large?
 Consider a different outcome
 Continuous measures generally require smaller sample sizes than
categorical measures
 Consider multiple sections or sites
 Will require more sophisticated analytic methods
 Reconfigure study as a “pilot”
 Emphasis switches from “hypothesis testing” to “estimation” and
“data summary”
 Goal is to provide data summaries and estimate confidence intervals
 Summaries can be used to power larger study
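
For a pilot study, estimation might look like the sketch below, which computes a normal-approximation (Wald) confidence interval for a proportion. The counts and the function name `proportion_ci` are hypothetical, and with small pilot samples other interval methods may be preferable.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Point estimate and Wald confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1] because a proportion cannot fall outside that range
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Hypothetical pilot: 18 of 24 students passed
p_hat, lower, upper = proportion_ci(18, 24)
print(f"estimate = {p_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The width of the interval also gives a rough sense of how uncertain the pilot estimate is when it is later used to power a larger study.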

Sample Size Determination: Group Discussions
 Please break into groups by table.
 For the next 10-15 minutes, take turns discussing:
 Whether you will be able to power your study.
 Where to find information to perform power analysis.
 Your options if you are unable to adequately power your study.

Additional Resources
 VCU Department of Biostatistics
 18 full-time faculty
 Can assist with: study design, sample size
determination, interim and final analyses, dissemination
 Grant funding (or prospects of funding) usually required.
 BIOS 516 Biostatistical Consulting: graduate students available
for FREE consultations
 Contact Russ Boyle (boyle@vcu.edu) and provide a protocol.

 VCU Center for Clinical and Translational Research
 Research Incubator: study design, sample size determination,
and other resources (e.g. grant writing)
 Contact: Pam Dillon (pmdillon@vcu.edu)
 Biomedical Informatics: data management and storage (e.g.
REDCap)
 Support requested online:
(http://www.cctr.vcu.edu/informatics/index.html)

 Textbooks (i.e., shameless plug):
 Statistical Research Methods: A Guide for Non-Statisticians
 Sabo and Boone, Springer, 2013
 Available on the web ($45-$65):
 http://www.springer.com/statistics//life+sciences,+medicine+%26+health/book/978-1-4614-8707-4
 http://www.amazon.ca/Statistical-Research-Methods-Guide-Non-Statisticians/dp/1461487072
