2. Testing Hypotheses
Independent variable = the cause
Dependent variable = the effect
The researcher controls or manipulates the independent variable (the treatment).
The dependent variable is what is measured, often called the assessment (knowledge, skills, or attitudes).
3. A Simple Hypothesis: The treatment (independent variable) improves students' performance on the assessment (dependent variable).
Three possible major problems related to causality:
1. The assessment was not measured well
2. The treatment was not manipulated well
3. Something other than the treatment caused change in the assessment (internal validity).
4. Construct and Internal Validity
• Construct Validity:
  Am I measuring what I think I am measuring?
  Am I implementing what I think I am implementing?
• Internal Validity: Did the treatment cause the outcome?
5. A Simple Hypothesis: The treatment (independent variable) improves students' performance on the assessment (dependent variable).
Three possible major problems related to causality:
1. The assessment was not measured well (reliability and construct validity)
2. The treatment was not manipulated well (construct validity)
3. Something other than the treatment caused change in the assessment (internal validity).
6. Some notes on evaluating construct and internal validity
A study does not have absolute validity or absolutely no validity
The level of validity relates to the confidence in the conclusions
Construct and internal validity are measured on a continuum
Construct validity does not imply internal validity (and vice versa)
When a hypothesis is supported, it does not necessarily mean that the study has either construct or internal validity
7. What is meant by “construct”?
• A concept, model, or schematic idea
• A construct is the global notion of the measure, such as:
  • Student motivation
  • Intelligence
  • Student learning
  • Student anxiety
• The specific method of measuring a construct is called the operational definition
• For any construct, researchers can choose many possible operational definitions
8. To Improve Construct Validity of Measures
• Measure learning directly (clear operational definitions; learning is not the same as enjoyment or perceived learning)
• Measure student learning through student learning objectives (ensure these are aligned with assessments)
• Use Established Scales to Measure Student Attitudes and Personality (Don’t reinvent the wheel; Tests in Print)
9. Good Measurement is Important To Improve Construct Validity
• Know How To Score the Measure (make sure you’ve established this before data collection; know what is reasonable; IOTT; rubrics; training; IRR)
• Determine Whether to Use Graded or Ungraded Measures (pros and cons of both)
• Minimize Participant and Researcher Expectancies
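The scoring advice on this slide mentions rubrics, training, and IRR. Assuming IRR stands for inter-rater reliability, one commonly used index for two raters is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is illustrative only: the function name and the rubric scores are hypothetical, and other indices may suit a given rubric better.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores from two trained raters on four essays.
scores_a = ["pass", "pass", "fail", "fail"]
scores_b = ["pass", "fail", "fail", "fail"]
kappa = cohens_kappa(scores_a, scores_b)
print(f"kappa = {kappa:.2f}")
```

Computing this on a small pilot batch before full data collection is one way to check whether rater training has worked.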
10. To Improve Construct Validity
• Determine Whether to Use Multiple Operational Definitions (can use multiple measures)
• Use a Retention Measure to Investigate Long-term Effects (but treat long-term results with caution, since other influences may have intervened)
11. Good Differences between Conditions Improve Construct Validity
The treatment (intervention) needs to be manipulated well to ensure construct validity
The only difference between conditions should be the treatment
Other variables that are different between conditions are confounds
To determine construct validity, treatments need specific operational definitions
Anything that can affect the results and cause a difference between students in treatment and control conditions needs to be documented
12. Potential problems in using different sections of a class
Construct validity of the treatment is questionable in any design that compares one section of a class with another
Classes are a social space, and the students and instructors are interdependent
Students can ask different questions
The class may have a different “tone”
Splitting a class into two groups can minimize this concern; if students in a split class can be randomly assigned to a condition, internal validity will increase
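Random assignment within a split class can be as simple as shuffling the roster and dealing students into conditions. A minimal sketch, assuming a hypothetical 20-student roster and two conditions; the function name, the condition labels, and the seed are illustrative choices, not part of the original slides.

```python
import random


def randomly_assign(students, conditions=("control", "treatment"), seed=None):
    """Shuffle the roster, then deal students round-robin into conditions."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(students)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, student in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(student)
    return groups


# Hypothetical roster for one class section.
roster = [f"student_{i:02d}" for i in range(1, 21)]
groups = randomly_assign(roster, seed=2024)
for condition, members in groups.items():
    print(condition, len(members))
```

Dealing round-robin after the shuffle keeps group sizes within one student of each other, which helps when the class is small.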
13. Different Types of Comparison in Research Design
Between Participants
• How comparison works: Students in one condition are compared to students in another condition (control vs. treatment, or multiple treatments)
• Strengths: No carryover effects from multiple treatments; no instrumentation or testing effects from multiple assessments
• Weaknesses: Selection bias without random assignment; many differences if groups are separate (e.g., two separate classes); lower statistical power
• Improve internal validity by: Random assignment; adding covariates
Within Participants: Multiple Treatments
• How comparison works: All students are in both control and treatment conditions
• Strengths: No selection bias; greater statistical power
• Weaknesses: Instrumentation and testing effects; carryover effects
• Improve internal validity by: Counterbalancing
Within Participants: Multiple Measures
• How comparison works: Students receive both pre-test (control) and post-test (treatment)
• Strengths: No selection bias; greater statistical power
• Weaknesses: Instrumentation and testing effects; other confounds that occur between assessments
• Improve internal validity by: Increasing the number of assessments; adding a separate no-treatment control condition; using alternative measures for assessment
14. External Validity
Can results from the sample used in the study generalize to other groups or populations?
Generally, it is impossible in classroom studies to get a sample that will generalize to all students.
The researcher should report demographic characteristics.
How realistic is the situation? In a classroom, if the treatment works, external validity is higher.
16. Common Practical Problems in SoTL Research
• Researchers who think they need to measure everything
• Researchers who do not have many students: low statistical power
• Researchers who only have a single class: limits to type of design
• Difficulties in random assignment
• Difficulties in determining whether the treatment is potent enough to have an effect (see power above)
• Concerns about conducting an ethical study in a classroom or training situation
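The low-power problem can be estimated before collecting data. The rough sketch below assumes a two-group comparison of means with a two-sided test and uses the normal approximation, so an exact t-test calculation would give slightly larger numbers. The function name and the effect sizes (standardized mean differences) are hypothetical.

```python
import math
from statistics import NormalDist


def students_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate students needed per group to detect a standardized effect."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes: a weaker treatment needs many more students.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {students_per_group(d)} students per group")
```

A treatment that is not potent enough to produce a sizeable effect may need more students than a single class can supply, which is one reason the within-participants designs on later slides are attractive for small classes.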
17. Simple Correlation
Don’t Use
• Want to make statement about causality
• Have low number of students
Use
• Have single group of students that cannot be divided
• Have only one session in which to collect data
Additional Options:
• Correlate many variables at the same time
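For a simple correlation design, the analysis often reduces to a Pearson correlation between two measures taken in the same session. A minimal sketch with hypothetical data (weekly study hours and quiz scores); the function name and both variables are assumptions for illustration, and a correlation on its own says nothing about causality.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data from a single class session.
study_hours = [2, 4, 5, 7, 9]
quiz_scores = [55, 60, 70, 72, 88]
r = pearson_r(study_hours, quiz_scores)
print(f"r = {r:.2f}")
```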
18. One-Group, Post-Test Only
Don’t Use
• Want to make statement about causality
• Want to make comparison to another group
Use
• Desired focus is on describing treatment and not assessment
• Cannot have pre-test or control group
• Want single group of students that cannot be divided
19. Two-Group, Post-Test Only
Don’t Use
• Have low number of students
• Groups are very different
• Have different assessments for each condition
Use
• Concerned about carryover effects
• Concerned about testing and instrumentation effects
• Have multiple groups
• Have only one session to collect data
Additional Options:
• Use random assignment to improve internal validity
• Add post-test to assess long-term change
• Add additional conditions
• Use covariates to improve internal validity and power
20. One-Group, Pre-test, Post-test
Don’t Use
• Items other than treatment occur between assessments
• First assessment affects second
• Students likely to change between assessments with no treatment
Use
• Have low number of students
• Have single group that cannot be divided
• Cannot have control condition
Additional Options:
• Add post-test to assess long-term change
• Use alternative measures to minimize testing and instrumentation effects
21. Two-Group, Pre-test/Post-test
Don’t Use
• Have single group of students that cannot be divided
Use
• Have multiple groups
Additional Options:
• Use random assignment to improve internal validity
• Add post-test to assess long-term change
• Use alternative measures to minimize testing and instrumentation effects
• Add additional conditions
• Use covariates to improve internal validity and power
22. Within Participants Design
Don’t Use
• Early treatments affect later treatments
• Early assessments affect later assessments
Use
• Have low number of students
• Have single group that cannot be divided
Additional Options:
• Add additional treatments
• Counterbalance conditions to improve internal validity
• Include pre-test to assess students before any treatment
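Counterbalancing means that different students receive the treatments in different orders, so that order effects are spread evenly across conditions. The sketch below is one simple way to do this, assuming full counterbalancing (every possible order is used) and a hypothetical six-student roster; the function name and treatment labels are illustrative only.

```python
import random
from itertools import permutations


def counterbalance(students, treatments, seed=None):
    """Assign each student one treatment order, cycling through all orders."""
    orders = list(permutations(treatments))  # every possible treatment order
    shuffled = list(students)
    random.Random(seed).shuffle(shuffled)    # randomize who gets which order
    return {
        student: orders[index % len(orders)]
        for index, student in enumerate(shuffled)
    }


# Hypothetical roster and treatment labels.
roster = [f"student_{i}" for i in range(1, 7)]
schedule = counterbalance(roster, ("treatment_A", "treatment_B"), seed=7)
for student, order in sorted(schedule.items()):
    print(student, "->", " then ".join(order))
```

The number of orders grows factorially with the number of treatments, so full counterbalancing is practical only for two or three treatments in a small class.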
23. Crossover Design
Don’t Use
• First assessment, by itself, affects second
• Have single group of students that cannot be divided
Use
• Have low number of students
• Have multiple groups
Additional Options:
• Include pre-test to assess before treatment
• Add post-test to examine long-term change
• Use random assignment to improve internal validity
• Use alternative measures to minimize testing and instrumentation effects
24. Interrupted Time-Series Design
Don’t Use
• Have only one session to collect data
• Early assessments affect later assessments
Use
• Have low number of students
• Have single group that cannot be divided
• Want to determine long-term effects
Additional Options:
• Add control condition to improve internal validity
• Add additional treatment condition, with treatment at different time to improve internal validity
25. More Complex Designs
• Use Multiple Treatments to Investigate Interactions (Interactions)
• Use Moderators to Determine When Treatment Has Effect (Concept of ATI)
• Use Mediators to Investigate How Treatment Has Effect (Mixed Method?)
26. Remember!
• Each design has advantages and disadvantages
• Often, there is no clear right way, although some designs will be better than others
• There is no single ideal study that eliminates all potential problems and all alternative hypotheses
• No one study can answer all of your questions!
Editor's notes
Many SoTL studies take the form: If I change teaching in X, what will be the impact on outcome Y? The change may be in curriculum, instructional format, technology, time, etc. The outcome may be attitudes, motivation, enjoyment, or learning (factual information, skills, problem solving, retention, etc.).
Discuss causality as key consideration
Use existing measures if one matches your operational definition. Consider using an instrument used in similar studies. Increases body of knowledge and connectivity to prior research
Scoring and Pygmalion effect: Look at reported data that are not reasonable
Creating multiple opportunities to find differences, but within hypotheses. Note methods hypothesized to increase retention, and thus availability for use and transfer; ABAB designs in special ed for behavior change investigation
Refers to Treatment Validity
Sample size and diversity issues, random sampling… Look at studies in the aggregate. Describe your sample so that it can be contextualized in the broader literature. What works in laboratory might not work in real classrooms. Is this an authentic context?